The use case is to load a local file into HDFS. Below are two approaches to doing this; please suggest which one is more efficient.
Approach 1: Using the HDFS put command
hadoop fs -put /local/filepath/file.parquet /user/table_nm/
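For reference, -put accepts several local sources in one invocation when the destination is a directory; a sketch with illustrative extra file names:

# Copy two local Parquet files into the table directory in one command,
# then list the directory to confirm they arrived.
hadoop fs -put /local/filepath/file1.parquet /local/filepath/file2.parquet /user/table_nm/
hadoop fs -ls /user/table_nm/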
Approach 2: Using Spark
spark.read.parquet("/local/filepath/file.parquet").createOrReplaceTempView("temp")
spark.sql("insert into table table_nm select * from temp")
Note:
- The source file can be in any format.
- No transformations are needed when loading the file.
- table_nm is a Hive external table pointing to /user/table_nm/.
1 Answer
ffx8fchx1:
Assuming they are already locally built .parquet files, using -put will be faster, since there is no overhead of starting a Spark application.
If there are many files, there is simply less work to do via put.
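For the many-files case, a single put with a shell glob keeps it to one command and one client JVM startup (paths illustrative):

# One invocation copies every matching local Parquet file into the table directory.
hadoop fs -put /local/filepath/*.parquet /user/table_nm/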