Spark tasks not starting to execute

Posted by 4uqofj5v on 2021-05-22 in Spark


I am running a job in spark-shell with the following options:

--num-executors 15 
--driver-memory 15G 
--executor-memory 7G 
--executor-cores 8 
--conf spark.yarn.executor.memoryOverhead=2G 
--conf spark.sql.shuffle.partitions=500 
--conf spark.sql.autoBroadcastJoinThreshold=-1 
--conf spark.executor.memoryOverhead=800

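The job is stuck and does not start. The code performs a cross join with filter conditions on a large dataset of 270m rows. I have increased the partitions of the large table (270m) and the small table (100000) to 16000, and I have converted the small table to a broadcast variable.
I have added the Spark UI screenshots for the job.
So do I need to decrease the partitions or increase the executors? Any ideas?
Thanks for your help.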
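
The job's code is not shown in the post. Purely as an illustration of the kind of operation described, here is a minimal sketch of a filtered cross join with a broadcast hint. The DataFrames, column names, row counts, and join condition are hypothetical stand-ins, not the actual tables.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{broadcast, col}

// In spark-shell a session already exists, so getOrCreate() simply returns it.
val spark = SparkSession.builder()
  .appName("broadcast-cross-join-sketch")
  .master("local[*]")
  .getOrCreate()

// Hypothetical stand-ins for the two tables (tiny sizes, illustrative column names).
val large: DataFrame = spark.range(0, 1000).toDF("big_id")   // plays the role of the 270m-row table
val small: DataFrame = spark.range(0, 10).toDF("small_id")   // plays the role of the 100000-row table

// A join on a non-equality condition behaves like a cross join followed by a filter.
// The broadcast() hint asks Spark to ship `small` to every executor,
// even when spark.sql.autoBroadcastJoinThreshold is set to -1.
val joined: DataFrame =
  large.join(broadcast(small), col("big_id") % 10 === col("small_id"))

joined.explain()           // print the physical plan to see which join strategy was chosen
println(joined.count())
```

Checking the output of `explain()` is one way to confirm whether the broadcast hint was actually honored before running the job at full scale.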
![spark ui 1][1] ![spark ui 2][2] ![spark ui 3 after 10 hours][3]
Status: Tasks: 7341/16936 (16624 failed)
Checking the container error logs:

RM Home
NodeManager
Tools
Failed while trying to construct the redirect url to the log server. Log Server url may not be configured
java.lang.Exception: Unknown container. Container either has not started or has already completed or doesn't belong to this node at all.

![spark ui 1 at 50 percent complete][4] ![spark ui 2 at 50 percent complete][5]

[1]: https://i.stack.imgur.com/nqcys.png
[2]: https://i.stack.imgur.com/s2vwl.png
[3]: https://i.stack.imgur.com/81fun.png
[4]: https://i.stack.imgur.com/h5mta.png
[5]: https://i.stack.imgur.com/ydfkf.png

scala apache-spark spark-ui task

Source: https://stackoverflow.com/questions/64358713/spark-tasks-not-starting-to-execute

1 Answer


ufj5ltwl1#

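It would be helpful if you could mention your cluster configuration.
But since broadcasting a small table of 1000 rows works while 100000 does not, you probably need to adjust your memory configuration.
Based on your configuration, I assume you have 15 * 7 = 105 GB of executor memory in total.
You could try --num-executors 7 --executor-memory 15G. This gives each executor more memory to hold the broadcast variable. Please adjust --executor-cores accordingly for proper utilization.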
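
To make the trade-off concrete, here is a small sketch in plain Scala that compares the two executor layouts. The `Layout` case class is a hypothetical helper, not part of Spark. The figures ignore memoryOverhead and assume one task per core (the default spark.task.cpus=1). Keeping 8 cores in the suggested layout is also an assumption, since the core count should be tuned separately.

```scala
// Hypothetical helper for comparing executor layouts; not a Spark API.
final case class Layout(numExecutors: Int, executorMemoryGb: Int, executorCores: Int) {
  def totalMemoryGb: Int = numExecutors * executorMemoryGb          // heap summed over all executors
  def totalTaskSlots: Int = numExecutors * executorCores            // tasks that can run at once
  def memoryPerSlotGb: Double = executorMemoryGb.toDouble / executorCores
}

// Layout from the question: 15 executors, 7G each, 8 cores each.
val current = Layout(numExecutors = 15, executorMemoryGb = 7, executorCores = 8)

// Layout suggested above: 7 executors, 15G each (cores left at 8 as an assumption).
val suggested = Layout(numExecutors = 7, executorMemoryGb = 15, executorCores = 8)

println(s"current:   ${current.totalMemoryGb} GB total, ${current.totalTaskSlots} task slots, " +
  s"${current.memoryPerSlotGb} GB per slot")
println(s"suggested: ${suggested.totalMemoryGb} GB total, ${suggested.totalTaskSlots} task slots, " +
  s"${suggested.memoryPerSlotGb} GB per slot")
```

Total executor memory stays at 105 GB in both layouts. What changes is that each executor holds one copy of the broadcast data in a larger heap, and fewer tasks compete for that heap at the same time.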

Answered 2021-05-23
