Spark Java heap space issue: ExecutorLostFailure, container exited with status 143

Posted by cbeh67ev on 2021-05-29 in Hadoop


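I am reading strings that are each more than 100k bytes long and splitting them into columns by fixed width. There are close to 16k columns, all cut out of that string according to their widths.
When I write the result to Parquet, I use the code below: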

val rdd1 = spark.sparkContext.textFile("file1")

// Cut one fixed-width line into its column values.
def SubstrSting(line: String, colLength: Seq[Int]): Seq[String] = {
  var now = 0  // start offset of the current column
  val collector = new Array[String](colLength.length)
  for (k <- 0 until colLength.length) {
    collector(k) = line.substring(now, now + colLength(k))
    now = now + colLength(k)
  }
  collector.toSeq
}

val StringArray = rdd1.map(SubstrSting(_, ColLengthSeq))

Here ColLengthSeq is read from a separate schema file that holds the column lengths.

import spark.implicits._  // needed for toDF and the $"..." column syntax

StringArray.toDF("StringCol").select((0 until ColCount).map(j => $"StringCol"(j) as column_seq(j)): _*).write.mode("overwrite").parquet("c"\home\")
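
For reference, the width-based split can be sketched without Spark as follows. This is only a minimal sketch, not the poster's exact function: the names FixedWidth, offsets and split are made up for illustration, and it assumes every line is at least as long as the sum of the widths.

```scala
object FixedWidth {
  // Start offset of each column: running sum of the widths, without the final total.
  def offsets(widths: Seq[Int]): Seq[Int] =
    widths.scanLeft(0)(_ + _).init

  // Cut one line into columns. ASSUMPTION: line.length >= widths.sum.
  def split(line: String, widths: Seq[Int]): Seq[String] =
    offsets(widths).zip(widths).map { case (start, width) =>
      line.substring(start, start + width)
    }
}

object FixedWidthDemo extends App {
  // Three columns of widths 3, 2 and 5 cut out of a 10-character line.
  println(FixedWidth.split("ABCDEFGHIJ", Seq(3, 2, 5)))
}
```

Computing the offsets once from the widths avoids the mutable running counter, and the same offsets can be reused for every line because the widths do not change between records.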

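In the write call, ColCount = 16000 and column_seq is a Seq[String] holding the 16k column names.
I am running this with 16 GB of executor memory and 20 executors.
The input file is 4 GB.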
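
As a rough sanity check on these numbers, the sketch below estimates how many records the file can hold and how much per-object overhead the split adds to each record. The 40 bytes of overhead per String is an assumed ballpark for a 64-bit JVM, not a measured value, and the object name SizeEstimate is made up for illustration.

```scala
object SizeEstimate {
  val fileBytes: Long = 4L * 1024 * 1024 * 1024 // 4 GB input file (from the question)
  val lineBytes: Long = 100000L                 // "more than 100k bytes" per line, used as a lower bound
  val columns: Long = 16000L                    // about 16k columns per record
  val assumedStringOverhead: Long = 40L         // ASSUMPTION: bytes of object overhead per String

  // Upper bound on the record count, since each line is at least lineBytes long.
  val maxRecords: Long = fileBytes / lineBytes

  // Overhead added per record by holding every column as its own String.
  val overheadPerRecord: Long = columns * assumedStringOverhead
}

object SizeEstimateDemo extends App {
  println(s"records (upper bound): ${SizeEstimate.maxRecords}")
  println(s"assumed overhead per record (bytes): ${SizeEstimate.overheadPerRecord}")
}
```

Under these assumptions each record carries roughly 640 KB of object overhead on top of its character data, so a record's in-memory form can be several times its on-disk size.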
The error I get is:

Lost task 113.0 in stage 0.0 (TID 461, gsta32512.foo.com): ExecutorLostFailure (executor 28 exited caused by one of the running tasks) Reason: 
Container marked as failed: 
container_e05_1472185459203_255575_01_000183 on host: gsta32512.foo.com. Exit status: 143. Diagnostics: 
Container killed on request. Exit code is 143
Container exited with a non-zero exit code 143
Killed by external signal
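
Exit status 143 follows the common Unix convention of 128 plus the signal number, which makes it signal 15 (SIGTERM) and matches the "Killed by external signal" line. A small sketch of that decoding, where the helper name signalFromExitCode is hypothetical:

```scala
object ExitCodes {
  // Convention: a process killed by signal N is reported with exit code 128 + N.
  def signalFromExitCode(code: Int): Option[Int] =
    if (code > 128 && code < 256) Some(code - 128) else None
}

object ExitCodesDemo extends App {
  println(ExitCodes.signalFromExitCode(143)) // Some(15), i.e. SIGTERM
}
```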

When I look at the status in the UI, I see:

java.lang.OutOfMemoryError: Java heap space

java.lang.OutOfMemoryError: GC overhead limit exceeded

Please advise on performance tuning and parameter optimization for the code above.

hadoop yarn scala apache-spark

Source: https://stackoverflow.com/questions/51118204/spark-java-heap-space-issue-executorlostfailure-container-exited-with-stat
