I'm running the following code to do a spatial join on geometry fields:
val coverage = DimCoverageReader.apply(spark, params)
coverage.createOrReplaceTempView("dim_coverage")

val uniqueGeometries = spark.table(params.UniqueGeometriesTable)
uniqueGeometries.createOrReplaceTempView("unique_geometries")

spark.sql(
  """select a.*, b.lac, b.cell_id
    |from unique_geometries as a, dim_coverage as b
    |where ST_Intersects(ST_GeomFromWKT(a.geo_wkt), ST_GeomFromWKT(b.geo_wkt))
    |""".stripMargin)
The resulting DataFrame is later saved to an ORC table:
import org.apache.spark.sql.SaveMode

Stage(spark, params).write
  .format("orc")
  .mode(SaveMode.Overwrite)
  .saveAsTable(params.IntersectGeometriesTable)
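To tell whether the failure comes from the join itself or only from the ORC writer, the join result can also be materialized on its own first. A debugging sketch (not part of the original job), reusing the Stage helper above:

// Force the spatial join to evaluate without the ORC writer in the
// loop, to see whether the task failure originates in the join or in
// the write.
val joined = Stage(spark, params)
joined.cache()
println(s"join produced ${joined.count()} rows")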
During execution I hit this error: org.apache.spark.SparkException: Task failed while writing rows
0/10/30 17:37:19 ERROR Executor: Exception in task 205.0 in stage 4.0 (TID 1219)
org.apache.spark.SparkException: Task failed while writing rows
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:270)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:189)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$write$1$$anonfun$apply$mcV$sp$1.apply(FileFormatWriter.scala:188)
at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:87)
at org.apache.spark.scheduler.Task.run(Task.scala:108)
at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:338)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
at java.lang.Thread.run(Thread.java:748)
Caused by: java.lang.IllegalArgumentException: Column has wrong number of index entries found: 320 expected: 800
at org.apache.hadoop.hive.ql.io.orc.WriterImpl$TreeWriter.writeStripe(WriterImpl.java:803)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl$StructTreeWriter.writeStripe(WriterImpl.java:1742)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl.flushStripe(WriterImpl.java:2133)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl.checkMemory(WriterImpl.java:352)
at org.apache.hadoop.hive.ql.io.orc.MemoryManager.notifyWriters(MemoryManager.java:168)
at org.apache.hadoop.hive.ql.io.orc.MemoryManager.addedRow(MemoryManager.java:157)
at org.apache.hadoop.hive.ql.io.orc.WriterImpl.addRow(WriterImpl.java:2413)
at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.write(OrcOutputFormat.java:76)
at org.apache.hadoop.hive.ql.io.orc.OrcOutputFormat$OrcRecordWriter.write(OrcOutputFormat.java:55)
at org.apache.spark.sql.hive.orc.OrcOutputWriter.write(OrcFileFormat.scala:248)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$SingleDirectoryWriteTask.execute(FileFormatWriter.scala:325)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:256)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$$anonfun$org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask$3.apply(FileFormatWriter.scala:254)
at org.apache.spark.util.Utils$.tryWithSafeFinallyAndFailureCallbacks(Utils.scala:1371)
at org.apache.spark.sql.execution.datasources.FileFormatWriter$.org$apache$spark$sql$execution$datasources$FileFormatWriter$$executeTask(FileFormatWriter.scala:259)
... 8 more
What is the root cause of this problem?
1 Answer
If the same thing happens with format('parquet'), my guess is that you have some kind of struct type or format issue. Could you add a printSchema of your df?
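A minimal sketch of both checks; Stage(spark, params) stands in for the joined DataFrame from the question, and the "_parquet_test" table name is just a placeholder:

import org.apache.spark.sql.SaveMode

val df = Stage(spark, params)  // the joined DataFrame from the question

// 1. Look for struct or other unexpected column types that the Hive
//    ORC writer might mishandle.
df.printSchema()

// 2. Re-run the same write with Parquet; if it succeeds, the problem is
//    specific to the ORC writer rather than the data or the join.
df.write
  .format("parquet")
  .mode(SaveMode.Overwrite)
  .saveAsTable(params.IntersectGeometriesTable + "_parquet_test")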