I am trying to write a function in Azure Databricks. I want to use spark.sql inside the function, but it seems I cannot use it on the worker nodes.
def SEL_ID(value, index):
    # some processing on value here
    ans = spark.sql(f"SELECT id FROM table WHERE bin = {index}")
    return ans

spark.udf.register("SEL_ID", SEL_ID)
I get the following error: PicklingError: Could not serialize object: Exception: It appears that you are attempting to reference SparkContext from a broadcast variable, action, or transformation. SparkContext can only be used on the driver, not in code that it run on workers. For more information, see SPARK-5063.
Is there any way I can work around this? I am using the above function to look up values from another table.
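For context, a common way around SPARK-5063 is to avoid calling spark.sql inside the UDF altogether and express the per-row lookup as a join built on the driver instead. The sketch below is only an illustration, not from the original post: the input DataFrame name df and its source are assumptions, and the table name table and column bin are taken from the question's SQL.

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()

# Hypothetical input DataFrame containing a "bin" column; replace with the real source.
df = spark.table("source_table")

# The lookup the original UDF tried to run once per row, now run once on the driver.
lookup = spark.sql("SELECT bin, id FROM table")

# Joining on the driver avoids referencing SparkContext from worker-side code.
result = df.join(lookup, on="bin", how="left")
result.show()

Whether this fits depends on what "some processing on value" does; if that logic can be expressed with built-in DataFrame functions, the UDF (and the error) can be avoided entirely.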