Why does writing into the Hive staging directory take more time when using the insertInto function in Spark?

Posted by q9yhzks0 on 2021-05-31 in Hadoop

I am running Spark code that writes into a Hive partitioned table:

import org.apache.spark.sql.SaveMode

df.write.mode(SaveMode.Overwrite).format("orc").insertInto("s**000h.test")

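Internally, all executors write the data into the Hive staging area (.hive-staging_hive_2020-03-30_13-47-16_727_5670185411499574661-1). This takes considerably more time than when I write the data explicitly to an HDFS directory, as below:

df.write.mode(mode).format("orc").partitionBy("dept_id").save(tempPath)

For 900 partitions, the time difference is about 1 hour.
Can you explain this behavior?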

hadoop hive apache-spark apache-spark-sql

Source: https://stackoverflow.com/questions/60932561/why-it-takes-more-time-to-write-into-hive-stage-directory-using-insertinto-funct

No answers yet.