I am facing this problem: my Spark streaming job keeps failing after running for a few days, with the following error:
AM Container for appattempt_1610108774021_0354_000001 exited with exitCode: -104
Failing this attempt.Diagnostics: Container [pid=31537,containerID=container_1610108774021_0354_01_000001] is running beyond physical memory limits. Current usage: 5.8 GB of 5.5 GB physical memory used; 8.0 GB of 27.3 GB virtual memory used. Killing container.
Dump of the process-tree for container_1610108774021_0354_01_000001 :
|- PID PPID PGRPID SESSID CMD_NAME USER_MODE_TIME(MILLIS) SYSTEM_TIME(MILLIS) VMEM_USAGE(BYTES) RSSMEM_USAGE(PAGES) FULL_CMD_LINE
|- 31742 31537 31537 31537 (java) 1583676 58530 8499392512 1507368 /usr/lib/jvm/java-openjdk/bin/java -server -Xmx5078m -
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
spark-submit command:
spark-submit --name DWH-CDC-commonJob --deploy-mode cluster --master yarn --conf spark.sql.shuffle.partitions=10 --conf spark.eventLog.enabled=false --conf spark.sql.caseSensitive=true --conf spark.driver.memory=5078M --class com.aos.Loader --jars file:////home/hadoop/lib/* --executor-memory 5000M --conf "spark.alert.duration=4" --conf spark.dynamicAllocation.enabled=false --num-executors 3 --files /home/hadoop/log4j.properties,/home/hadoop/application.conf --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" streams_2.11-1.0.jar application.conf
I have already tried increasing spark.executor.memoryOverhead, but the job failed again after a few days. I would like to know how to arrive at a value that lets it run without interruption, or whether there is some other configuration I am missing. Spark 2.4, AWS EMR 5.23, Scala 2.11.12, two data nodes (4 vCPUs, 16 GB RAM each).
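For reference, the killed container in the diagnostics is the AM, which in cluster mode also hosts the driver: 5078 MB of heap (-Xmx5078m) plus the default driver overhead of max(384 MB, 10% of driver memory) comes to roughly the 5.5 GB container limit shown above, so any off-heap or native growth pushes it over. Below is a minimal sketch of how the submit could be adjusted, assuming the driver/AM container is the one being killed; the sizes are illustrative placeholders, not a recommendation:

# Sketch only: make the driver overhead explicit instead of relying on the 10% default,
# and keep heap + overhead within what YARN can allocate per container on a 16 GB node.
spark-submit --name DWH-CDC-commonJob --deploy-mode cluster --master yarn \
  --conf spark.sql.shuffle.partitions=10 \
  --conf spark.eventLog.enabled=false \
  --conf spark.sql.caseSensitive=true \
  --conf spark.driver.memory=4g \
  --conf spark.driver.memoryOverhead=1536m \
  --conf spark.executor.memoryOverhead=1g \
  --class com.aos.Loader \
  --jars file:////home/hadoop/lib/* \
  --executor-memory 5000M \
  --conf "spark.alert.duration=4" \
  --conf spark.dynamicAllocation.enabled=false \
  --num-executors 3 \
  --files /home/hadoop/log4j.properties,/home/hadoop/application.conf \
  --conf "spark.driver.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" \
  --conf "spark.executor.extraJavaOptions=-Dlog4j.configuration=file:log4j.properties" \
  streams_2.11-1.0.jar application.conf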
1 Answer
The NM (NodeManager) is killing your job; check the HDFS and disk usage on the nodes. Are you processing a large volume of data? Also check whether you are flooding HDFS with container logs, and whether this is happening under dynamic or static allocation.
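A minimal sketch of the checks suggested above, to run on each node; the log paths assume typical EMR/YARN defaults and may differ on this cluster:

df -h                                          # local disk usage on the node
du -sh /var/log/hadoop-yarn/containers/*       # size of local YARN container logs (assumed default log dir)
hdfs dfsadmin -report                          # HDFS capacity and per-datanode usage
yarn application -status application_1610108774021_0354   # diagnostics for the failed application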