I am trying to convert a pandas DataFrame to a PySpark DataFrame and I am getting the following pyarrow-related error:
import pandas as pd
import numpy as np

# 1,000,000 rows x 10 random float columns
data = np.random.rand(1000000, 10)
pdf = pd.DataFrame(data, columns=list("abcdefghij"))

# `spark` is the SparkSession provided by the PySpark shell/notebook
df = spark.createDataFrame(pdf)
/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/session.py:714: UserWarning: createDataFrame attempted Arrow optimization because 'spark.sql.execution.arrow.enabled' is set to true; however, failed by the reason below:
'JavaPackage' object is not callable
Attempting non-optimization as 'spark.sql.execution.arrow.fallback.enabled' is set to true.
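To see what is actually failing, it may help to confirm which versions the driver picks up and to disable the Arrow fallback so the conversion raises the full Java-side exception instead of silently retrying without Arrow. A minimal debugging sketch, reusing the `spark` and `pdf` objects from above and the Spark 2.x config keys shown in the warning:

import pyspark
import pyarrow

print(pyspark.__version__)  # should match the cluster's Spark release
print(pyarrow.__version__)  # must be a version this Spark release supports

# Fail loudly instead of falling back, to expose the underlying stack trace
spark.conf.set("spark.sql.execution.arrow.fallback.enabled", "false")
df = spark.createDataFrame(pdf)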
I have tried several pyarrow versions (0.10.0, 0.14.1, 0.15.1, and others) with the same result. How can I debug this?
1 Answer
I ran into the same problem. Upgrading the cluster to the emr-5.30.1 release and changing the arrow version to 0.14.1 solved it for me.
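For context, the 'JavaPackage' object is not callable failure usually means the JVM-side Arrow classes that PySpark calls into are missing from, or incompatible with, the cluster's Spark classpath, so aligning the EMR release with a pyarrow version it supports is the typical fix. A short verification sketch once the versions are aligned (assuming the same `spark` and `pdf` as in the question, and that pyarrow was pinned on every node, e.g. with pip install pyarrow==0.14.1):

# With matching versions, the Arrow path should engage without warnings
spark.conf.set("spark.sql.execution.arrow.enabled", "true")
spark.conf.set("spark.sql.execution.arrow.fallback.enabled", "false")
df = spark.createDataFrame(pdf)  # raises if Arrow is still broken
print(df.count())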