Converting a DataFrame to CSV throws an error in PySpark

yx2lnoni · posted 2021-05-27 in Spark

I have a huge DataFrame with about 7 GB of records. I tried both getting the DataFrame's count and writing it out as a CSV, and both fail with the error below. Is there another way to download the DataFrame as a single file, without ending up with multiple partitions?

  print(df.count())
  df.coalesce(1).write.option("header", "true").csv('/user/ABC/Output.csv')

Error:

  java.io.IOException: Stream is corrupted
      at net.jpountz.lz4.LZ4BlockInputStream.refill(LZ4BlockInputStream.java:202)
      at net.jpountz.lz4.LZ4BlockInputStream.refill(LZ4BlockInputStream.java:228)
      at net.jpountz.lz4.LZ4BlockInputStream.read(LZ4BlockInputStream.java:157)
      at org.apache.spark.io.ReadAheadInputStream$1.run(ReadAheadInputStream.java:168)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at java.lang.Thread.run(Thread.java:748)
  20/05/26 18:15:44 ERROR scheduler.TaskSetManager: Task 8 in stage 360.0 failed 1 times; aborting job
  [Stage 360:=======> (8 + 1) / 60]
  Py4JJavaError: An error occurred while calling o18867.count.
  : org.apache.spark.SparkException: Job aborted due to stage failure: Task 8 in stage 360.0 failed 1 times, most recent failure: Lost task 8.0 in stage 360.0 (TID 13986, localhost, executor driver): java.io.IOException: Stream is corrupted
      at net.jpountz.lz4.LZ4BlockInputStream.refill(LZ4BlockInputStream.java:202)
      at net.jpountz.lz4.LZ4BlockInputStream.refill(LZ4BlockInputStream.java:228)
      at net.jpountz.lz4.LZ4BlockInputStream.read(LZ4BlockInputStream.java:157)
      at org.apache.spark.io.ReadAheadInputStream$1.run(ReadAheadInputStream.java:168)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at java.lang.Thread.run(Thread.java:748)
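For reference, an untested sketch of what could be tried next. The trace points at LZ4BlockInputStream during a shuffle read, so one guess is that the default LZ4 shuffle codec is producing the corrupted stream; spark.io.compression.codec is the Spark setting that selects that codec. The app name and input path below are placeholders, not anything the error message dictates, and writing without coalesce(1) avoids funneling all 7 GB through a single task:

  from pyspark.sql import SparkSession

  # Assumption: the corrupted stream comes from the LZ4 shuffle codec.
  # spark.io.compression.codec selects the codec used for shuffle and
  # broadcast data; "snappy" is an alternative to the default "lz4".
  spark = (
      SparkSession.builder
      .appName("csv-export")                            # placeholder name
      .config("spark.io.compression.codec", "snappy")
      .getOrCreate()
  )

  df = spark.read.parquet("/user/ABC/input")            # placeholder source

  # No coalesce(1): each existing partition writes its own part file in
  # parallel, so no single task has to pull the full 7 GB through one stream.
  (df.write
     .option("header", "true")
     .mode("overwrite")
     .csv("/user/ABC/Output"))   # writes a directory of part-*.csv files

If a single local file is still needed, the part files can be merged outside Spark, e.g. with hdfs dfs -getmerge /user/ABC/Output Output.csv; note that with header=true every part file carries its own header row, so it may be simpler to write without the header option and prepend the header once after merging.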

No answers yet.

