PySpark pandas/PyArrow error on AWS EMR: 'JavaPackage' object is not callable

rdlzhqv9, posted 2021-05-27 in Spark

I'm trying to convert a pandas DataFrame to a PySpark DataFrame with the code below, and it fails with a PyArrow-related error:

import pandas as pd
import numpy as np

data = np.random.rand(1000000, 10)
pdf = pd.DataFrame(data, columns=list("abcdefghij"))
df = spark.createDataFrame(pdf)

The last call emits the following warning and falls back to the non-Arrow conversion:

/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/session.py:714: UserWarning: createDataFrame attempted Arrow optimization because 'spark.sql.execution.arrow.enabled' is set to true; however, failed by the reason below:
  'JavaPackage' object is not callable
Attempting non-optimization as 'spark.sql.execution.arrow.fallback.enabled' is set to true.

I've tried several PyArrow versions (0.10.0, 0.14.1, 0.15.1, and others), but the result is always the same. How can I debug this?
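For reference, the "'JavaPackage' object is not callable" message usually means py4j could not find the expected Arrow classes on the JVM side, so a version or classpath mismatch between pyarrow and the EMR Spark build is the first thing to check rather than the Python install itself. Below is a minimal debugging sketch, assuming a live SparkSession named spark and the pdf DataFrame from the question; the config keys are the Spark 2.x names that appear in the warning above.

# Debugging sketch (assumes `spark` and `pdf` already exist).
import pyarrow
import pyspark

# Versions the driver actually sees; they must match what the executors have
# and what the EMR Spark build was built against.
print("pyspark:", pyspark.__version__)
print("pyarrow:", pyarrow.__version__)

# Arrow-related settings currently in effect.
print(spark.conf.get("spark.sql.execution.arrow.enabled", "unset"))
print(spark.conf.get("spark.sql.execution.arrow.fallback.enabled", "unset"))

# Turn the Arrow path off to confirm the plain (slow) conversion still works;
# if it does, the problem is confined to the Arrow integration.
spark.conf.set("spark.sql.execution.arrow.enabled", "false")
df = spark.createDataFrame(pdf)
df.show(1)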

vc9ivgsu #1

I ran into the same problem. Switching the cluster to the emr-5.30.1 release and pinning the Arrow version to 0.14.1 resolved it for me.
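To confirm the fix after matching the versions, a quick check along these lines can help. This is only a sketch; it assumes pyarrow 0.14.1 is installed on the driver and on every executor (for example via an EMR bootstrap action), and that a SparkSession named spark is available.

import numpy as np
import pandas as pd

# Re-enable the Arrow path and repeat the conversion from the question.
spark.conf.set("spark.sql.execution.arrow.enabled", "true")

pdf = pd.DataFrame(np.random.rand(1000, 3), columns=list("abc"))
df = spark.createDataFrame(pdf)   # should now take the Arrow fast path
df.show(3)                        # no UserWarning above means Arrow did not fall back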
