Hadoop Spark SQL insert fails

jmo0nnb3  posted on 2022-12-17 in Hadoop

I am trying to insert about 13 million rows into a new table, but I am getting the following error:

  22/12/09 19:33:56 ERROR Utils: Aborting task
  java.lang.AssertionError: assertion failed: Created file counter 11 is beyond max value 10
      at scala.Predef$.assert(Predef.scala:223)
      at org.apache.spark.sql.execution.datasources.DynamicPartitionDataWriter.$anonfun$increaseCreatedFileAndCheck$1(FileFormatDataWriter.scala:191)
      at scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.java:23)
      at scala.Option.foreach(Option.scala:407)
      at org.apache.spark.sql.execution.datasources.DynamicPartitionDataWriter.increaseCreatedFileAndCheck(FileFormatDataWriter.scala:188)
      at org.apache.spark.sql.execution.datasources.DynamicPartitionDataWriter.write(FileFormatDataWriter.scala:277)
      at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:280)
      at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1473)
      at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:288)
      at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$15(FileFormatWriter.scala:211)
      at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
      at org.apache.spark.scheduler.Task.run(Task.scala:131)
      at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:498)
      at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
      at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:501)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at java.lang.Thread.run(Thread.java:748)
  22/12/09 19:33:57 ERROR FileFormatWriter: Job job_202212091917352650741377131539872_0020 aborted.
  22/12/09 19:33:57 ERROR Executor: Exception in task 0.1 in stage 20.0 (TID 26337)
  org.apache.spark.SparkException: Task failed while writing rows.
      at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:298)
      at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$write$15(FileFormatWriter.scala:211)
      at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:90)
      at org.apache.spark.scheduler.Task.run(Task.scala:131)
      at org.apache.spark.executor.Executor$TaskRunner.$anonfun$run$3(Executor.scala:498)
      at org.apache.spark.util.Utils$.tryWithSafeFinally(Utils.scala:1439)
      at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:501)
      at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
      at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
      at java.lang.Thread.run(Thread.java:748)
  Caused by: java.lang.AssertionError: assertion failed: Created file counter 11 is beyond max value 10
      at scala.Predef$.assert(Predef.scala:223)
      at org.apache.spark.sql.execution.datasources.DynamicPartitionDataWriter.$anonfun$increaseCreatedFileAndCheck$1(FileFormatDataWriter.scala:191)
      at scala.runtime.java8.JFunction1$mcVI$sp.apply(JFunction1$mcVI$sp.java:23)
      at scala.Option.foreach(Option.scala:407)
      at org.apache.spark.sql.execution.datasources.DynamicPartitionDataWriter.increaseCreatedFileAndCheck(FileFormatDataWriter.scala:188)
      at org.apache.spark.sql.execution.datasources.DynamicPartitionDataWriter.write(FileFormatDataWriter.scala:277)
      at org.apache.spark.sql.execution.datasources.FileFormatWriter$.$anonfun$executeTask$1(FileFormatWriter.scala:280)
      at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1473)
      at org.apache.spark.sql.execution.datasources.FileFormatWriter$.executeTask(FileFormatWriter.scala:288)

The insert operation looks like this:

  insert overwrite table fake_table_txt partition(partition_name)
  select id, name, type, description from ( inner query )

I am a Hadoop beginner, and I do not know what is causing this. Can anyone give me some guidance?


gmxoilav1#

After some struggling, I was told that increasing the "files created per task" property would do the trick:

  set spark.sql.maxCreatedFilesPerTask = 15;

The previous default was 10.
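
For reference, a minimal sketch of the full sequence in a spark-sql session, assuming the property named above is supported by your Spark distribution (it does not appear to be a stock Apache Spark setting); the table, columns, and inner query placeholder are taken from the question as-is:

  -- raise the limit on how many files a single task may create
  -- (property name as given in this answer; assumed to be distribution-specific)
  set spark.sql.maxCreatedFilesPerTask = 15;

  -- rerun the dynamic-partition insert from the question
  insert overwrite table fake_table_txt partition(partition_name)
  select id, name, type, description
  from ( inner query );

If raising the limit is undesirable, a commonly suggested alternative for this kind of error is to add DISTRIBUTE BY partition_name to the select, so each task handles fewer distinct partition values and therefore creates fewer files; the fix that worked here, though, was simply increasing the limit.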
