I am trying to register my custom function (UDF) so I can use it in my Spark SQL query, but I cannot register it; I get the error below.
val squared = (s: Column) => {
concat(substring(s,4,2),year(to_date(from_unixtime(unix_timestamp(s,"dd-MM-yyyy")))))
}
squared: org.apache.spark.sql.Column => org.apache.spark.sql.Column = <function1>
scala> sqlContext.udf.register("dc",squared)
java.lang.UnsupportedOperationException: Schema for type org.apache.spark.sql.Column is not supported
at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:733)
at org.apache.spark.sql.catalyst.ScalaReflection$.schemaFor(ScalaReflection.scala:671)
at org.apache.spark.sql.UDFRegistration.register(UDFRegistration.scala:143)
... 48 elided
I tried changing the Column parameter to String, but then I get the following error.
val squared = (s: String) => {
| concat(substring(s,4,2),year(to_date(from_unixtime(unix_timestamp(s,"dd-MM-yyyy")))))
| }
<console>:28: error: type mismatch;
found : String
required: org.apache.spark.sql.Column
concat(substring(s,4,2),year(to_date(from_unixtime(unix_timestamp(s,"dd-MM-yyyy")))))
Can someone please guide me on how I should implement this?
1 Answer
None of the Spark functions from the org.apache.spark.sql.functions package can be used inside a UDF: they build Column expressions, while a UDF body operates on plain runtime values. You can get the same result with ordinary Scala code instead of the built-in Spark functions.
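A minimal sketch of that rewrite using `java.time` (the `dd-MM-yyyy` format and the `squared`/`dc` names are taken from the question; the registration call assumes a `sqlContext` is in scope, as in the original REPL session):

```scala
import java.time.LocalDate
import java.time.format.DateTimeFormatter

// Plain-Scala equivalent of the Column expression
// concat(substring(s, 4, 2), year(to_date(from_unixtime(unix_timestamp(s, "dd-MM-yyyy")))))
val fmt = DateTimeFormatter.ofPattern("dd-MM-yyyy")

val squared = (s: String) => {
  // Spark SQL's substring(s, 4, 2) is 1-based: characters 4-5, i.e. the "MM" part
  val month = s.substring(3, 5)
  val year  = LocalDate.parse(s, fmt).getYear
  month + year.toString
}

// This registers now, because the function maps String => String
// instead of Column => Column:
// sqlContext.udf.register("dc", squared)
```

For example, `squared("25-08-2016")` returns `"082016"`. The key point is that the UDF body receives each cell's value as a plain `String` at runtime, so it must be written with ordinary Scala, not with Column-building functions.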