I ran into a problem writing a partitioned table to HDFS with Spark 2.3. I suspect it comes from the partitionOverwriteMode setting. Could that be the cause, and how can I fix it while keeping overwrite mode?
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
df \
.repartition(3) \
.write \
.format("orc") \
.partitionBy("data_date_part") \
.mode("overwrite") \
.option("compression", "zlib") \
.option("path", table_path + "/" + table_name) \
.saveAsTable(scheme + "." + table_name)
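For context, a minimal local sketch of what the partitionOverwriteMode setting is documented to do (the SparkSession, demo path, and demo data below are hypothetical and only illustrate the config; the question's cluster, schema, and table names are not reproduced):

from pyspark.sql import SparkSession

# Hypothetical local session and path, for illustration only.
spark = SparkSession.builder.master("local[2]").appName("pom-demo").getOrCreate()
demo_path = "/tmp/partition_overwrite_demo"

# Seed two partitions.
spark.createDataFrame(
    [("2023-01-01", 1), ("2023-01-02", 2)],
    ["data_date_part", "value"]
).write.format("orc").partitionBy("data_date_part").mode("overwrite").save(demo_path)

# With "dynamic", an overwrite containing only 2023-01-02 rewrites just that
# partition; with the default "static", the whole path is cleared first and
# the 2023-01-01 partition would be lost.
spark.conf.set("spark.sql.sources.partitionOverwriteMode", "dynamic")
spark.createDataFrame(
    [("2023-01-02", 20)], ["data_date_part", "value"]
).write.format("orc").partitionBy("data_date_part").mode("overwrite").save(demo_path)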
Error:
Caused by: org.apache.spark.SparkException: Job aborted due to stage failure: Task 8 in stage 3.0 failed 4 times, most recent failure: Lost task 8.3 in stage 3.0 Task failed while writing rows.
Caused by: org.apache.hadoop.ipc.RemoteException(org.apache.hadoop.fs.FileAlreadyExistsException): file/ for client already exists