PySpark - java.lang.OutOfMemoryError: Java heap space when writing a CSV file

2lpgd968 · asked 2021-05-27 in Spark

When I try to write a CSV file with the following code:

    DF.coalesce(1).write \
        .option("header", "false") \
        .option("sep", ",") \
        .option("escape", '"') \
        .option("ignoreTrailingWhiteSpace", "false") \
        .option("ignoreLeadingWhiteSpace", "false") \
        .mode("overwrite") \
        .csv(filename)

I get the error below:

    ileFormatWriter.scala:169)
    at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
    at org.apache.spark.scheduler.Task.run(Task.scala:121)
    at org.apache.spark.executor.Executor$TaskRunner$$anonfun$10.apply(Executor.scala:408)
    at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1360)
    at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:414)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    ... 1 more
    Caused by: java.lang.OutOfMemoryError: Java heap space

Can anyone suggest a fix?


dwthyt8l #1

For me, adding the Spark configuration below fixed the problem:

    spark = (SparkSession.builder
             .master("local[*]")
             .config("spark.driver.memory", "15g")
             .appName("sl-app")
             .getOrCreate())
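With master local[*], executors run inside the single driver JVM, so spark.driver.memory is the heap the CSV write actually uses; it has to be set before that JVM starts (on the builder as above, or via spark-submit's --driver-memory), not changed on a live session. A small follow-up sketch, using the spark session built above, to confirm the setting was picked up (getConf().get is standard SparkConf API):

    # Verify the driver heap setting actually took effect before the expensive write.
    print(spark.sparkContext.getConf().get("spark.driver.memory"))  # expect '15g'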

6yjfywim #2

Try increasing the executor memory in your spark-submit command,
like this:

    spark-submit \
      --class org.apache.spark.examples.SparkPi \
      --master spark://207.184.161.138:7077 \
      --executor-memory 20G \
      --total-executor-cores 100 \
      /path/to/examples.jar \
      1000
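
If you start the session from Python rather than through spark-submit, the same setting can be passed on the builder. A sketch, reusing the example master URL from above (the app name is made up); note that spark.executor.memory only matters on a real cluster, since in local mode executors share the driver JVM:

    from pyspark.sql import SparkSession

    # Sketch: equivalent of --executor-memory 20G, set when the session is created.
    spark = (SparkSession.builder
             .master("spark://207.184.161.138:7077")  # example cluster URL from above
             .config("spark.executor.memory", "20g")
             .appName("csv-writer")                   # hypothetical app name
             .getOrCreate())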
