将带有数组/结构和灵活模式的json文件加载到配置单元表中

siv3szwd  于 2021-06-26  发布在  Hive
关注(0)|答案(1)|浏览(269)

需要帮助将json文件加载到表中。以下是文件中一些json对象的示例:

{"asin": "0002000202", "title": "Black Berry, Sweet Juice: On Being Black and White in Canada", "price": 13.88, "imUrl": "http://ecx.images-amazon.com/images/I/51PQAYJ9EDL.jpg", "related": {"also_bought": ["0393333094"], "buy_after_viewing": ["0393333094", "1554685087"]}, "salesRank": {"Books": 3013713}, "categories": [["Books"]]}
{"asin": "0000041696", "title": "Arithmetic 2 A Beka Abeka 1994 Student Book (Traditional Arithmentic Series)", "price": 6.53, "imUrl": "http://ecx.images-amazon.com/images/I/41cGaan-BrL._SL500_.jpg", "related": {"also_viewed": ["B000KOYDUY", "B004GE1B7W", "B008SXBO88", "B001EH7Y02", "B000W7PN62", "B004H3G1X6", "B004WOEIXA", "B000AXWEEM", "0789478722", "B000MN2C56", "1402709269", "B001HHOKG0", "B000Y9TO1S", "1402711441", "0756609356", "0142400106", "1556616465", "0545021383", "B004LDD18A", "B000HZH18C", "1557996563", "B00CZTVUKI", "B001CXK8Y2", "B000QX6KY6"], "buy_after_viewing": ["B000KOYDUY", "B004GE1B7W", "B000LBXGRC", "0439827655"]}, "salesRank": {"Books": 2554321}, "categories": [["Books"]]}

如您所见,模式因对象而异。并非所有对象中都存在某些属性。还有结构和数组。
这是我的create table语句

create table amazon.products_test
(asin string,
title string,
description string,
brand string,
price float,
salesRank struct<category:string, rank:int> ,
imUrl string,
categories array<string>,
related struct<also_bought:string, also_viewed:string, buy_after_viewing:string, bought_together:string>)
ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe';

我的加载语句:

load data inpath '/user/amazon/products_test.json'
overwrite into table amazon.products_test;

在这里我试着询问

hive> select * FROM products_test;
OK
Failed with exception java.io.IOException:org.apache.hadoop.hive.serde2.SerDeException: java.io.IOException: Field name expected

我有正确的数据类型吗?有更好的serde吗?我需要添加tblproperty还是serdeproperty?

7ivaypg9

7ivaypg91#

我找到了答案。正如我所怀疑的,我需要使用不同的serde:

ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'

我看到一些论坛建议我可能需要使用这个serde,但我不知道如何实现和添加jar:
https://github.com/rcongiu/hive-json-serde
另外,我需要为salesrank使用map类型而不是struct

相关问题