I have a DataFrame that I am trying to save as a Hive table. I have tried every approach I can think of, but I cannot save it as a table in HDP 3.0. I am using the code below.
import org.apache.spark.sql.SparkSession

val sparkSession = SparkSession.builder()
.appName("appname")
.config("hive.exec.dynamic.partition", "true")
.config("hive.exec.dynamic.partition.mode", "nonstrict")
.config("spark.sql.sources.maxConcurrentWrites","1")
.config("hive.support.concurrency", "true")
.config("parquet.compression", "SNAPPY")
.config("parquet.enable.dictionary", "false")
.config("spark.sql.parquet.compression.codec", "snappy")
.config("hive.mapred.mode", "nonstrict")
.config("spark.sql.hive.hiveserver2.jdbc.url","url")
.enableHiveSupport()
.getOrCreate()
val hive = com.hortonworks.spark.sql.hive.llap.HiveWarehouseBuilder.session(sparkSession).build()
hive.setDatabase("testdb")
val d1 = "select id,name,salary,city from testdb.test"
val d2=hive.executeQuery(d1)
d2.show() // shows the Hive table's data
d2.write.format("orc").mode("append").saveAsTable("testdb.test_2")
//test_2 table is already created in testdb database.
//but here getting error 'testdb' not found
d2.write.format("orc").mode("append").saveAsTable("default.test_2")
//if am using default then its not giving any error but saving data in
spark metadata not as a hive table.
//same code is working fine in cloudera and am am getting data
in hive table but in hdp it's giving error.
I have also tried the save() method, but with save() I cannot use bucketBy. Can anyone suggest how to save this DataFrame directly to a Hive table in HDP 3.0?
1 Answer
If you set the default database of the Hive connection to testdb via hive.setDatabase("testdb"), then you should reference all of its tables directly, without the database prefix: test_2 instead of testdb.test_2.
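As a minimal sketch of that suggestion, reusing the hive session from the question and writing back through the Hive Warehouse Connector rather than Spark's saveAsTable (the connector format string and the "table" write option follow Hortonworks' documented HWC write API for HDP 3.0, but verify them against your HWC version):

// Set the connection's default database once, then refer to tables
// without the "testdb." prefix, as the answer suggests.
hive.setDatabase("testdb")
val result = hive.executeQuery("select id,name,salary,city from test")
// Write via the HWC connector class so the data lands in the Hive
// catalog instead of Spark's own metadata.
result.write
  .format("com.hortonworks.spark.sql.hive.llap.HiveWarehouseConnector")
  .mode("append")
  .option("table", "test_2")
  .save()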