How to pass an argument to a user-defined function for mapPartitions in Spark?

Posted by ttp71kqs on 2021-06-02 in Hadoop


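In Spark, you can pass a user-defined function to mapPartitions. My question is how to pass an argument to that function. For example, I currently have something like the following, which I call with rdd.mapPartitions(userdefinedFunc).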

import htsjdk.samtools.SAMRecord

def userdefinedFunc(iter: Iterator[(Long, Array[SAMRecord])]) : Iterator[(Long, Long)] = 
{
  val res = scala.collection.mutable.ArrayBuffer.empty[(Long, Long)]

  // Process the partition here, appending (Long, Long) pairs to res

  res.iterator
}
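To make the contract concrete: mapPartitions hands your function one Iterator per partition and expects an Iterator back, which is why a method of shape Iterator[T] => Iterator[U] can be passed by name. The sketch below imitates that locally without Spark, so everything in it beyond the signature shape is an assumption for illustration: FakeRecord is a hypothetical stand-in for SAMRecord, mapPartitionsLocal is a hypothetical helper (not Spark's RDD.mapPartitions), and the body (counting records per key) is made up.

```scala
// Hypothetical stand-in for htsjdk's SAMRecord so the sketch compiles without htsjdk.
final case class FakeRecord(name: String)

object MapPartitionsShape {
  // Hypothetical local imitation of RDD.mapPartitions: calls f once per partition
  // and concatenates the results. This is NOT Spark's implementation.
  def mapPartitionsLocal[T, U](partitions: Seq[Seq[T]])(f: Iterator[T] => Iterator[U]): Seq[U] =
    partitions.flatMap(p => f(p.iterator).toList)

  // Same shape as the question's userdefinedFunc: one Iterator in, one Iterator out.
  def userdefinedFunc(iter: Iterator[(Long, Array[FakeRecord])]): Iterator[(Long, Long)] = {
    val res = scala.collection.mutable.ArrayBuffer.empty[(Long, Long)]
    // Illustrative body only: emit each key with the number of records it carries.
    iter.foreach { case (key, records) => res += ((key, records.length.toLong)) }
    res.iterator
  }

  def main(args: Array[String]): Unit = {
    val partitions = Seq(
      Seq((1L, Array(FakeRecord("a"), FakeRecord("b")))),
      Seq((2L, Array(FakeRecord("c"))))
    )
    // Passing the method name works because the expected type is a function type,
    // so Scala eta-expands userdefinedFunc automatically.
    val out = mapPartitionsLocal(partitions)(userdefinedFunc)
    println(out) // List((1,2), (2,1))
  }
}
```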

However, I also want to pass a constant as an extra parameter to this user-defined function, for example:

def userdefinedFunc(iter: Iterator[(Long, Array[SAMRecord])], someConstant: Long) : 
 Iterator[(Long, Long)] = 
{
  val res = scala.collection.mutable.ArrayBuffer.empty[(Long, Long)]

  // Code here

  res.iterator
}

Now, how can I call this with mapPartitions? Using rdd.mapPartitions(userdefinedFunc(someConstant)) does not work.

Tags: java, hadoop, yarn, scala, apache-spark

Source: https://stackoverflow.com/questions/31614953/how-to-pass-an-argument-to-a-user-defined-function-for-mappartitions-in-spark

1 answer


pengsaosao1#

Use a curried function, for example:

def userdefinedFunc(someConstant: Long)(iter: Iterator[(Long, Array[SAMRecord])]): Iterator[(Long, Long)]

Then userdefinedFunc(someConstant) is a function of type Iterator[(Long, Array[SAMRecord])] => Iterator[(Long, Long)], which can be passed to mapPartitions.
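A self-contained sketch of the curried approach, runnable without Spark. StubRecord, mapPartitionsLocal, userdefinedFunc2 and the function bodies are hypothetical stand-ins chosen for illustration; only the curried signature shape comes from the answer. For comparison it also shows the other common option, wrapping the question's two-argument version in a lambda.

```scala
// Hypothetical stand-in for htsjdk's SAMRecord.
final case class StubRecord(name: String)

object CurryingDemo {
  // Hypothetical local imitation of RDD.mapPartitions (one call of f per partition).
  def mapPartitionsLocal[T, U](partitions: Seq[Seq[T]])(f: Iterator[T] => Iterator[U]): Seq[U] =
    partitions.flatMap(p => f(p.iterator).toList)

  // Curried: the constant goes in the first parameter list, the partition iterator in the second.
  // Illustrative body: record count per key plus the constant.
  def userdefinedFunc(someConstant: Long)(iter: Iterator[(Long, Array[StubRecord])]): Iterator[(Long, Long)] =
    iter.map { case (key, records) => (key, records.length.toLong + someConstant) }

  // Hypothetical two-argument version mirroring the question, same illustrative body.
  def userdefinedFunc2(iter: Iterator[(Long, Array[StubRecord])], someConstant: Long): Iterator[(Long, Long)] =
    iter.map { case (key, records) => (key, records.length.toLong + someConstant) }

  def main(args: Array[String]): Unit = {
    val partitions = Seq(
      Seq((1L, Array(StubRecord("a"), StubRecord("b")))),
      Seq((2L, Array(StubRecord("c"))))
    )
    val someConstant = 100L
    // Applying only the first parameter list yields a function over the partition iterator.
    val f: Iterator[(Long, Array[StubRecord])] => Iterator[(Long, Long)] = userdefinedFunc(someConstant)
    val viaCurrying = mapPartitionsLocal(partitions)(f)
    // Equivalent without currying: a lambda that captures someConstant.
    val viaLambda = mapPartitionsLocal(partitions)(iter => userdefinedFunc2(iter, someConstant))
    println(viaCurrying)              // List((1,102), (2,101))
    println(viaCurrying == viaLambda) // true
  }
}
```

With a real RDD the corresponding calls would be rdd.mapPartitions(userdefinedFunc(someConstant)) or rdd.mapPartitions(iter => userdefinedFunc(iter, someConstant)). In either case Spark serializes the function and ships it to the executors, so the captured Long travels with it. Defining userdefinedFunc in a top-level object, rather than as a method of a non-serializable class, avoids pulling the enclosing instance into the closure.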

Answered 2021-06-02
