Spark error when saving to Parquet: IndexError: pop from an empty deque

pgccezyw posted on 2021-05-27 in Spark

I'm running a Spark cluster on YARN using the AWS EMR service and I'm getting the following error:

ERROR:root:Exception while sending command.
Traceback (most recent call last):
  File "/mnt/yarn/usercache/hadoop/appcache/application_1594292341949_0004/container_1594292341949_0004_01_000001/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1159, in send_command
    raise Py4JNetworkError("Answer from Java side is empty")
py4j.protocol.Py4JNetworkError: Answer from Java side is empty

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/mnt/yarn/usercache/hadoop/appcache/application_1594292341949_0004/container_1594292341949_0004_01_000001/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 985, in send_command
    response = connection.send_command(command)
  File "/mnt/yarn/usercache/hadoop/appcache/application_1594292341949_0004/container_1594292341949_0004_01_000001/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1164, in send_command
    "Error while receiving", e, proto.ERROR_ON_RECEIVE)
py4j.protocol.Py4JNetworkError: Error while receiving
Traceback (most recent call last):
  File "process_ecommerce.py", line 131, in <module>
    cfg["partitions"]["info"]
  File "/mnt/yarn/usercache/hadoop/appcache/application_1594292341949_0004/container_1594292341949_0004_01_000001/__pyfiles__/spark_utils.py", line 10, in save_dataframe
    .parquet(path)
  File "/mnt/yarn/usercache/hadoop/appcache/application_1594292341949_0004/container_1594292341949_0004_01_000001/pyspark.zip/pyspark/sql/readwriter.py", line 844, in parquet
  File "/mnt/yarn/usercache/hadoop/appcache/application_1594292341949_0004/container_1594292341949_0004_01_000001/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
  File "/mnt/yarn/usercache/hadoop/appcache/application_1594292341949_0004/container_1594292341949_0004_01_000001/pyspark.zip/pyspark/sql/utils.py", line 63, in deco
  File "/mnt/yarn/usercache/hadoop/appcache/application_1594292341949_0004/container_1594292341949_0004_01_000001/py4j-0.10.7-src.zip/py4j/protocol.py", line 336, in get_return_value
py4j.protocol.Py4JError: An error occurred while calling o343.parquet
ERROR:py4j.java_gateway:An error occurred while trying to connect to the Java server (127.0.0.1:36063)
Traceback (most recent call last):
  File "/mnt/yarn/usercache/hadoop/appcache/application_1594292341949_0004/container_1594292341949_0004_01_000001/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 929, in _get_connection
    connection = self.deque.pop()
IndexError: pop from an empty deque
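
For context, the save_dataframe helper in spark_utils.py at the top of the application stack is not shown in the post. A minimal sketch of what such a helper might look like, assuming it simply partitions and writes the DataFrame (the argument names and write mode are assumptions, not the actual code):

def save_dataframe(df, path, partitions):
    # Hypothetical reconstruction -- the real spark_utils.save_dataframe is not shown.
    # 'partitions' would be the column list passed in as cfg["partitions"]["info"].
    (df.write
        .mode("overwrite")          # assumed write mode
        .partitionBy(*partitions)   # assumed partitioning columns
        .parquet(path))             # the call where the traceback originates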

The cluster I'm running has 1 master node and 20 worker nodes of type r5.2xlarge, each with 8 vCPUs and 64 GB of memory. The Spark configuration is:
20 GB executor memory
30 GB executor memory overhead
8 cores per executor
1 CPU per task
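
Expressed in code, these settings correspond roughly to the following SparkSession configuration (a sketch based on the figures above; the actual job may set them through spark-submit or the EMR cluster configuration instead):

from pyspark.sql import SparkSession

# Hypothetical reconstruction of the configuration described in the question.
spark = (SparkSession.builder
    .appName("process_ecommerce")                      # assumed application name
    .config("spark.executor.memory", "20g")            # 20 GB executor memory
    .config("spark.executor.memoryOverhead", "30g")    # 30 GB executor memory overhead
    .config("spark.executor.cores", "8")               # 8 cores per executor
    .config("spark.task.cpus", "1")                    # 1 CPU per task
    .getOrCreate())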
How can I resolve this error?

No answers yet.

