pyudf运行时错误：返回的列数与指定的架构不匹配

biswetbf 于 2021-05-27 发布在 Spark

关注(0)|答案(0)|浏览(274)

我有Pandas自定义项定义如下

schema2 = StructType([   StructField('sensorid', IntegerType(), True),
    StructField('confidence', DoubleType(), True)]) 

@pandas_udf(schema2,  PandasUDFType.GROUPED_MAP)   
def PreProcess(Indf):   
    confidence=1  
    sensor=Indf.iloc[0,0]   
    df = pd.DataFrame(columns=['sensorid','confidence'])  
    df['sensorid']=[sensor]   
    df['confidence']=[0]   
    return df

然后我将一个带有3列的sparkDataframe传递到这个udf中

results.groupby("sensorid").apply(PreProcess)

results:
+--------+---------------+---------------+
|sensorid|sensortimestamp|calculatedvalue|
+--------+---------------+---------------+
|  397332|     1596518086|          -39.0|
|  397332|     1596525586|          -31.0|

但我一直有个错误：

RuntimeError: Number of columns of the returned pandas.DataFrame doesn't match specified schema.Expected: 3 Actual: 4

我可以告诉什么错误是试图说，但我不明白这个错误是如何弹出。我想我正在返回结构中指定的Dataframe的正确的2列

python apache-spark pandas

来源：https://stackoverflow.com/questions/63403001/pyspark-pandas-udf-runtimeerror-number-of-columns-of-the-returned-doesnt-match

暂无答案！

目前还没有任何答案，快来回答吧！

我来回答

pyudf运行时错误：返回的列数与指定的架构不匹配

暂无答案！

相关问题

热门标签

最新问答