Using the locate function on a DataFrame without a UDF (Spark Scala)

Posted by wqnecbli on 2022-12-13 in Scala


I am curious why the following does not work on a DataFrame in Spark Scala:

df.withColumn("answer", locate(df("search_string"), col("hit_songs"), pos=1))

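It works with a UDF, but not as written above. The issue seems to be Column versus String, which feels awkward. How can I convert a column to a string so that it can be passed to locate, which requires a String?
My understanding is that df("search_string") produces a String.
However, the error I get is: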

command-679436134936072:15: error: type mismatch;
 found   : org.apache.spark.sql.Column
 required: String
df.withColumn("answer", locate(df("search_string"), col("hit_songs"), pos=1))

scala

Source: https://stackoverflow.com/questions/72089887/locate-function-usage-on-dataframe-without-using-udf-spark-scala

1 Answer


thtygnil1#

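Understanding the problem
I am not sure which version of Spark you are using, but the locate method has the same function signature in both Spark 3.3.1 (the latest version at the time of writing) and Spark 2.4.5 (the version running in my local Spark shell).
The signature is: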

def locate(substr: String, str: Column, pos: Int): Column

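So substr cannot be a Column; it has to be a String. In your example you are using df("search_string"). That actually calls the DataFrame's apply method, which has the following signature and returns a Column, not a String: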

def apply(colName: String): Column

Since locate expects a String for its first argument and you are passing a Column, the type mismatch error makes sense.
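The String restriction above belongs to the Scala wrapper in org.apache.spark.sql.functions. As a minimal sketch (not part of the original answer), the same lookup can be written as a SQL expression through expr, where both arguments are column references and no UDF is involved. The sample rows and the local SparkSession are assumptions made for illustration; the column names search_string and hit_songs come from the question.

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.expr

// Local session for illustration only; in spark-shell or a notebook `spark` already exists.
val spark = SparkSession.builder().master("local[*]").appName("locate-demo").getOrCreate()
import spark.implicits._

// Hypothetical sample data using the column names from the question.
val df = Seq(
  ("with", "potatoes with meat"),
  ("notInThere", "tasty food")
).toDF("search_string", "hit_songs")

// Inside a SQL expression, both locate arguments are resolved as columns.
// locate is 1-based and yields 0 when the substring is not found.
val result = df.withColumn("answer", expr("locate(search_string, hit_songs, 1)"))
result.show(false)
```

With this data the answer column should hold 10 for the first row and 0 for the second, because locate counts positions from 1 rather than 0.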

Attempting to fix your problem

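If I understand correctly, you want to find the position of a substring taken from one column inside the string of another column, without using a UDF. You can do this with map on a Dataset, along these lines: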

import spark.implicits._

case class MyTest (A:String, B: String)

val df = Seq(
  MyTest("with", "potatoes with meat"),
  MyTest("with", "pasta with cream"),
  MyTest("food", "tasty food"),
  MyTest("notInThere", "don't forget some nice drinks")
).toDF("A", "B").as[MyTest]

val output = df.map{
  case MyTest(a,b) => (a, b, b indexOf a)
}
output.show(false)
+----------+-----------------------------+---+
|_1        |_2                           |_3 |
+----------+-----------------------------+---+
|with      |potatoes with meat           |9  |
|with      |pasta with cream             |6  |
|food      |tasty food                   |6  |
|notInThere|don't forget some nice drinks|-1 |
+----------+-----------------------------+---+

Once you are inside a map operation on a strongly typed Dataset, you have the whole Scala language at your disposal.
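One thing to keep in mind when swapping locate for indexOf inside map: String.indexOf is 0-based and returns -1 when nothing is found, whereas Spark's locate is 1-based and returns 0. The helper below, locateLike, is a hypothetical name introduced here only to sketch the conversion; it is not a Spark function.

```scala
// Hypothetical helper: shifts indexOf's 0-based result to locate's 1-based convention.
// A miss (-1 from indexOf) becomes 0, matching what locate reports for "not found".
def locateLike(substr: String, str: String): Int = str.indexOf(substr) + 1

println(locateLike("with", "potatoes with meat"))                    // 10
println(locateLike("notInThere", "don't forget some nice drinks"))   // 0
```

Inside the map above, (a, b, locateLike(a, b)) would then give positions in the same convention as locate.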
Hope this helps!
