picklingerror:无法序列化对象：typeerror:无法pickle'\u thread.\u local'对象

oxcyiej7 于 2021-07-14 发布在 Spark

关注(0)|答案(0)|浏览(219)

**结束。**此问题需要详细的调试信息。它目前不接受答案。
**想改进这个问题吗？**更新问题，使其成为堆栈溢出的主题。

7天前关门了。
改进这个问题
我在使用pandas udf和使用spark data frame调用时遇到这个错误。以下代码仅供参考。。还附上了样本数据集。我正在尝试应用rule engine包将规则应用到我的数据集，以便使用udf和spark dataframe过滤数据。
newdata.csv包含以下内容：
customerid，customername，datetime，amount，折扣，会员1001，arun，12-02-2020 01:012465.22,10%，true 1005，barath，13-07-2020 12:158399.34,5%，true 1003，charle，18-07-2020 20:101234.88,3%，false 1004，john，10-07-2020 13:101690,1%，true


# importing Dependencies

import rule_engine
import csv
import pandas as pd
import findspark
from pyspark.sql import *
from pyspark.sql.functions import col, pandas_udf
from pyspark.sql.types import LongType

findspark.init()

# Creating rules using Rule engine

rule = rule_engine.Rule('CustomerId>1001')

spark = SparkSession.builder.appName("practice").getOrCreate()
spark.conf.set("spark.sql.execution.arrow.pyspark.enabled","true")

# Declare the function and create the UDF

@pandas_udf("string")
def datafilter(CustomerId:pd.Series) ->pd.Series:
    #load=json.loads(CustomerId.to_json())
    return pd.Series(rule.filter(CustomerId))

# Create a Spark DataFrame

sp_df = spark.createDataFrame(pd.read_csv('NewData.csv'))
sp_df.show()

# Execute function as a Spark vectorized UDF

df2=sp_df.select(datafilter("CustomerId"))
df2.show()

来源：https://stackoverflow.com/questions/67120345/picklingerror-could-not-serialize-object-typeerror-cannot-pickle-thread-lo

暂无答案！

目前还没有任何答案，快来回答吧！

我来回答

picklingerror:无法序列化对象：typeerror:无法pickle'\u thread.\u local'对象

暂无答案！

相关问题

热门标签

最新问答