What is the efficient way to save a DataFrame into a Hive table?

Posted by lhcgjxsq on 2021-06-26 in Hive

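We are migrating from Greenplum to HDFS. Data arrives in Greenplum from the source tables through a large ETL process, and from Greenplum we simply dump it into HDFS using Spark. So I am trying to read a GP table and load it into a Hive table on HDFS using Spark.
I have a DataFrame read from a GP table as follows: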

val yearDF    = spark.read.format("jdbc").option("url", connectionUrl)
                            .option("dbtable", s"(${execQuery}) as year2017")
                            .option("user", devUserName)
                            .option("password", devPassword)
                            .option("numPartitions",10)
                            .load()
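
A side note on the read itself: as far as I know, `numPartitions` on its own does not split a JDBC read; Spark only parallelizes it when `partitionColumn`, `lowerBound` and `upperBound` are supplied as well. The following is a minimal sketch under that assumption. The object name, the helper names and the choice of partition column are hypothetical; the column must appear in the output of `execQuery`, and the bounds only set the stride between partitions, they do not filter rows.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object JdbcReadSketch {
  // Builds the JDBC options for a partitioned read.
  // partitionColumn is a hypothetical numeric column returned by the query.
  def jdbcOptions(url: String, query: String, user: String, password: String,
                  partitionColumn: String, lowerBound: Long, upperBound: Long,
                  numPartitions: Int): Map[String, String] = Map(
    "url"             -> url,
    "dbtable"         -> s"($query) as year2017",
    "user"            -> user,
    "password"        -> password,
    "partitionColumn" -> partitionColumn,
    "lowerBound"      -> lowerBound.toString,
    "upperBound"      -> upperBound.toString,
    "numPartitions"   -> numPartitions.toString
  )

  // Issues the read; Spark opens up to numPartitions concurrent connections.
  def readPartitioned(spark: SparkSession, opts: Map[String, String]): DataFrame =
    spark.read.format("jdbc").options(opts).load()
}
```

Keeping the options in a plain `Map` makes them easy to inspect or log before any connection to Greenplum is opened.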

There are different options for saving a DataFrame into a Hive table.
First method:

yearDF.write.mode("overwrite").partitionBy("source_system_name","period_year","period_num").saveAsTable("schemaName.tableName")
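
For reference, the first method can be sketched as a small helper. This is only an illustration, not a recommendation: the object and function names are hypothetical, and the `repartition` call is an added assumption that clustering rows by the partition columns before the write reduces the number of small files per Hive partition. Note that `saveAsTable` in overwrite mode replaces the whole table, not just the partitions present in the DataFrame.

```scala
import org.apache.spark.sql.{DataFrame, SaveMode}
import org.apache.spark.sql.functions.col

object SaveAsTableSketch {
  // Partition columns taken from the question.
  val partitionCols: Seq[String] =
    Seq("source_system_name", "period_year", "period_num")

  // Clusters rows by the partition columns, then overwrites the target table.
  def writePartitioned(df: DataFrame, table: String): Unit =
    df.repartition(partitionCols.map(col): _*) // assumption: fewer files per partition
      .write
      .mode(SaveMode.Overwrite)                // replaces the entire table
      .partitionBy(partitionCols: _*)
      .saveAsTable(table)
}
```

A call would look like `SaveAsTableSketch.writePartitioned(yearDF, "schemaName.tableName")`.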

Second method:

yearDF.createOrReplaceTempView("yearData")
 spark.sql("insert into schema.table partition(source_system_name, period_year, period_num) select * from yearData")
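
The second method depends on a few details that the one-liner hides, so here is a hedged sketch of it. It assumes the target table already exists, that all three partitions are dynamic (which, to my knowledge, requires `hive.exec.dynamic.partition.mode=nonstrict`), and that columns are matched by position, so the partition columns have to come last in the SELECT. The object and helper names are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object HiveInsertSketch {
  // Partition columns taken from the question.
  val partitionCols: Seq[String] =
    Seq("source_system_name", "period_year", "period_num")

  // Moves the partition columns to the end, keeping the other columns in order.
  def partitionColsLast(columns: Seq[String], partCols: Seq[String]): Seq[String] =
    columns.filterNot(partCols.contains) ++ partCols

  // Builds the INSERT statement with an explicit column list instead of SELECT *.
  def insertStatement(table: String, view: String,
                      columns: Seq[String], partCols: Seq[String]): String = {
    val ordered = partitionColsLast(columns, partCols)
    s"INSERT INTO TABLE $table PARTITION (${partCols.mkString(", ")}) " +
      s"SELECT ${ordered.mkString(", ")} FROM $view"
  }

  // Enables dynamic partitioning for the session and runs the insert.
  def insertWithDynamicPartitions(spark: SparkSession, df: DataFrame, table: String): Unit = {
    spark.sql("SET hive.exec.dynamic.partition=true")
    spark.sql("SET hive.exec.dynamic.partition.mode=nonstrict")
    df.createOrReplaceTempView("yearData")
    spark.sql(insertStatement(table, "yearData", df.columns.toSeq, partitionCols))
  }
}
```

Unlike `saveAsTable` with overwrite, this appends to an existing table and leaves the table definition (format, location, properties) as it was created in Hive.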

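What are the pros and cons of the above approaches? We have huge tables in production, and loading data into Hive usually takes a lot of time. Can anyone tell me which approach is the efficient and recommended way to save data from a DataFrame into a Hive table?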

Hive apache-spark apache-spark-sql

Source: https://stackoverflow.com/questions/52172032/what-is-the-efficient-way-to-save-a-dataframe-into-a-hive-table
