scala广播+自定义项

nue99wik  于 2021-05-29  发布在  Spark
关注(0)|答案(1)|浏览(526)

我试图广播一个列表,并将广播变量传递给udf(scala代码在单独的文件中)。但是面对问题。

val Lookup_BroadCast = SC.broadcast(lookup_data)

使用3个参数创建自定义项

val Call_Sub_Pgm = udf(foo(_: String, Lookup_BroadCast: org.apache.spark.broadcast.Broadcast[List[String]], Trace: String))

使用“withcolumn”调用udf

Out_DF = Out_DF.withColumn("Col-1", Call_Sub_Pgm(col(Col-1),Lookup_BroadCast,lit(Trace)))

我得到上述代码的编译错误-“找到广播变量,必需的sql列”
如果我从上面删除“lookup\u broadcast”变量

Out_DF = Out_DF.withColumn("Col-1", Call_Sub_Pgm(col(Col-1),Lookup_BroadCast,lit(Trace)))

然后我得到以下错误:

java.lang.ClassCastException: org.spark.masking.ExtractData$$anonfun$7 cannot be cast to scala.Function0
xuo3flqw

xuo3flqw1#

可以为函数创建可序列化 Package 类,在构造函数中使用广播:

class Wrapper(Lookup_BroadCast: Broadcast[List[String]]) extends Serializable {
  def foo(v: String, s: String): String = {
      // usage example
    Lookup_BroadCast.value.head
  }
}

用法如下:

val wrapper = new Wrapper(Lookup_BroadCast)
val Call_Sub_Pgm = udf(wrapper.foo(_: String, _: String))

相关问题