Writing a DataFrame to a Hive table with Java in Apache Spark

gblwokeq · posted 2021-06-26 in Hive

I am trying to accomplish a simple task, "write a DataFrame to a Hive table"; the code below is written in Java. I am using the Cloudera VM without any changes.

import org.apache.spark.SparkContext;
import org.apache.spark.sql.Dataset;
import org.apache.spark.sql.Row;
import org.apache.spark.sql.SQLContext;
import org.apache.spark.sql.SparkSession;

public static void main(String[] args) {
    String master = "local[*]";

    SparkSession sparkSession = SparkSession
            .builder().appName(JsonToHive.class.getName())
            //.config("spark.sql.warehouse.dir", "hdfs://localhost:50070/user/hive/warehouse/")
            .enableHiveSupport().master(master).getOrCreate();

    SparkContext context = sparkSession.sparkContext();
    context.setLogLevel("ERROR");

    SQLContext sqlCtx = sparkSession.sqlContext();
    // SQLContext.jsonFile(...) was removed after Spark 1.x; use the DataFrameReader instead
    Dataset<Row> rowDataset = sparkSession.read().json("employees.json");
    rowDataset.printSchema();
    rowDataset.registerTempTable("employeesData");

    Dataset<Row> firstRow = sqlCtx.sql("select employee.firstName, employee.addresses from employeesData");
    firstRow.show();

    sparkSession.catalog().listTables().select("*").show();

    // the default SaveMode is ErrorIfExists, so this fails once the table exists
    firstRow.write().saveAsTable("default.employee");
    sparkSession.close();

}

I have already created the managed table in Hive using HQL:

CREATE TABLE employee (firstName STRING, lastName STRING, addresses ARRAY<STRUCT<street:STRING, city:STRING, state:STRING>>) STORED AS PARQUET;

I am reading a simple JSON file, "employees.json", which contains the following data:

{"employee":{"firstName":"Neil","lastName":"Irani","addresses":[{"street":"36th","city":"NYC","state":"Ny"},{"street":"37th","city":"NYC","state":"Ny"},{"street":"38th","city":"NYC","state":"Ny"}]}}

It complains that "Table default.employee already exists" and does not append the content. How can I append content to the Hive table?
If I set mode("append"), it does not complain, but it does not write the content either:

firstRow.write().mode("append").saveAsTable("default.employee");

Any help would be appreciated... Thanks.

+-------------+--------+-----------+---------+-----------+
|         name|database|description|tableType|isTemporary|
+-------------+--------+-----------+---------+-----------+
|     employee| default|       null|  MANAGED|      false|
|employeesdata|    null|       null|TEMPORARY|       true|
+-------------+--------+-----------+---------+-----------+

UPDATE

/usr/lib/hive/conf/hive-site.xml was not on the classpath, so Spark was not reading the table metadata; after adding it to the classpath it worked fine. I hit this problem because I was running from IntelliJ; in production the Spark conf folder is linked to hive-site.xml.
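For reference, the piece of hive-site.xml that lets Spark find the Hive metastore is typically the hive.metastore.uris property. A minimal sketch is below; the thrift host and port are placeholders for your environment, not values taken from the question.

```xml
<!-- hive-site.xml (sketch): point Spark at the Hive metastore.
     thrift://localhost:9083 is a placeholder; use your metastore's host/port. -->
<configuration>
  <property>
    <name>hive.metastore.uris</name>
    <value>thrift://localhost:9083</value>
  </property>
</configuration>
```

Without this file on the classpath, Spark falls back to a local Derby-backed metastore, which is why the managed table was not visible.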

mrfwxfqh · answer #1

It looks like you should use insertInto(String tableName) instead of saveAsTable(String tableName).

firstRow.write().mode("append").insertInto("default.employee");
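One caveat worth adding: unlike saveAsTable, insertInto resolves columns by position, not by name, so the DataFrame's column order must match the Hive table definition. A hedged sketch (fragment, not a complete program; it assumes a DataFrame holding all three columns from the DDL in the question — note that the question's firstRow only selects firstName and addresses, which would misalign against the three-column table):

```java
// insertInto() matches columns by POSITION, not by name, so align the
// DataFrame with the Hive DDL order (firstName, lastName, addresses)
// before writing. Assumes all three columns exist in the DataFrame.
Dataset<Row> aligned = rowDataset.select("firstName", "lastName", "addresses");
aligned.write().mode("append").insertInto("default.employee");
```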
