java.lang.OutOfMemoryError on a simple rdd.count() operation

fzwojiic · posted 2021-05-30 in Hadoop

I'm having a lot of trouble getting a simple count operation to work on about 55 files in HDFS, roughly 1B records in total. Both spark-shell and PySpark fail with OOM errors. I'm using YARN, MapR, Spark 1.3.1, and HDFS 2.4.1 (it fails in local mode as well). I've tried following the tuning and configuration advice, throwing more and more memory at the executors. My configuration is:

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        .setMaster("yarn-client")
        .setAppName("pyspark-testing")
        .set("spark.executor.memory", "6g")
        .set("spark.driver.memory", "6g")
        .set("spark.executor.instances", 20)
        .set("spark.yarn.executor.memoryOverhead", "1024")
        .set("spark.yarn.driver.memoryOverhead", "1024")
        .set("spark.yarn.am.memoryOverhead", "1024")
        )
sc = SparkContext(conf=conf)
sc.textFile('/data/on/hdfs/*.csv').count()  # fails every time

The job is split into 893 tasks, and after about 50 of them complete successfully, many begin to fail. I see ExecutorLostFailure in the application's stderr. When digging through the executor logs, I see errors like the following:

15/06/24 16:54:07 ERROR util.Utils: Uncaught exception in thread stdout writer for /work/analytics2/analytics/python/envs/santon/bin/python
java.lang.OutOfMemoryError: Java heap space
    at java.nio.HeapCharBuffer.<init>(HeapCharBuffer.java:57)
    at java.nio.CharBuffer.allocate(CharBuffer.java:331)
    at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:792)
    at org.apache.hadoop.io.Text.decode(Text.java:406)
    at org.apache.hadoop.io.Text.decode(Text.java:383)
    at org.apache.hadoop.io.Text.toString(Text.java:281)
    at org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:558)
    at org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:558)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:379)
    at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply$mcV$sp(PythonRDD.scala:242)
    at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:204)
    at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:204)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1550)
    at org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:203)
15/06/24 16:54:07 ERROR util.SparkUncaughtExceptionHandler: Uncaught exception in thread Thread[stdout writer for /work/analytics2/analytics/python/envs/santon/bin/python,5,main]
java.lang.OutOfMemoryError: Java heap space
    at java.nio.HeapCharBuffer.<init>(HeapCharBuffer.java:57)
    at java.nio.CharBuffer.allocate(CharBuffer.java:331)
    at java.nio.charset.CharsetDecoder.decode(CharsetDecoder.java:792)
    at org.apache.hadoop.io.Text.decode(Text.java:406)
    at org.apache.hadoop.io.Text.decode(Text.java:383)
    at org.apache.hadoop.io.Text.toString(Text.java:281)
    at org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:558)
    at org.apache.spark.SparkContext$$anonfun$textFile$1.apply(SparkContext.scala:558)
    at scala.collection.Iterator$$anon$11.next(Iterator.scala:328)
    at org.apache.spark.api.python.PythonRDD$.writeIteratorToStream(PythonRDD.scala:379)
    at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply$mcV$sp(PythonRDD.scala:242)
    at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:204)
    at org.apache.spark.api.python.PythonRDD$WriterThread$$anonfun$run$1.apply(PythonRDD.scala:204)
    at org.apache.spark.util.Utils$.logUncaughtExceptions(Utils.scala:1550)
    at org.apache.spark.api.python.PythonRDD$WriterThread.run(PythonRDD.scala:203)
15/06/24 16:54:07 ERROR executor.CoarseGrainedExecutorBackend: RECEIVED SIGNAL 15: SIGTERM

And in stdout:


# java.lang.OutOfMemoryError: Java heap space
# -XX:OnOutOfMemoryError="kill %p"
#   Executing /bin/sh -c "kill 16490"...

Overall, I think I understand OOM errors and how to troubleshoot them, but I'm conceptually stuck here. This is just a simple count. I don't understand how the Java heap could possibly overflow when the executors are being given ~3g heaps. Has anyone run into this before, or have any suggestions? Is there something going on under the hood that would shed light on the issue?
Update:
I've also noticed that when I specify the parallelism explicitly (e.g. sc.textFile(..., 1000)) instead of relying on the default number of tasks (893), the created job has 920 tasks, all of which complete without error except the last one, which hangs indefinitely. This seems very strange!
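For reference, a minimal sketch of that run (assuming PySpark's rdd.getNumPartitions(); minPartitions is only a hint to the underlying InputFormat, so the actual partition count can differ from what was requested):

rdd = sc.textFile('/data/on/hdfs/*.csv', minPartitions=1000)
print(rdd.getNumPartitions())  # reported as 920 tasks in the run described above
rdd.count()                    # every task but the last completed; the last hung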

bkhjykvo1#

It turns out the problem I was having was actually related to a single corrupted file. Even running a simple cat or wc -l on it would cause the terminal to hang.
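One way to track down which input is the culprit is to count each file individually rather than globbing them all at once. A minimal sketch, assuming the same /data/on/hdfs layout as the question (the hdfs dfs -ls parsing here is illustrative, not robust):

import subprocess

# List the individual CSV paths; the last whitespace-separated column of
# each `hdfs dfs -ls` output line is the file path.
out = subprocess.check_output(["hdfs", "dfs", "-ls", "/data/on/hdfs"]).decode()
paths = [line.split()[-1] for line in out.splitlines() if line.endswith(".csv")]

# Count each file on its own so the corrupted one identifies itself by
# failing or hanging, instead of taking the whole globbed job down.
for path in paths:
    try:
        print(path, sc.textFile(path).count())
    except Exception as exc:
        print("FAILED:", path, repr(exc))

A file that merely hangs (as described above) won't raise an exception; the last path printed before the stall is the suspect.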

tf7tbtn22#

Try increasing the Java heap size from the console, like this:

export JAVA_OPTS="-Xms512m -Xmx5g"

You can change the values according to your data and memory size: -Xms sets the minimum heap size and -Xmx the maximum. Hope it helps.
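Note that for the Spark JVMs themselves, heap size is normally controlled through Spark's own properties rather than JAVA_OPTS. A minimal sketch mirroring the question's config (values are illustrative):

from pyspark import SparkConf, SparkContext

conf = (SparkConf()
        # Equivalent of -Xmx for each executor JVM:
        .set("spark.executor.memory", "6g")
        # In yarn-client mode the driver JVM is already running by the time
        # SparkConf is read, so driver memory should instead be passed on the
        # command line, e.g. `spark-submit --driver-memory 6g`.
        .set("spark.driver.memory", "6g"))
sc = SparkContext(conf=conf)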
