My data is in this format:
[{"field1":"data1","field2":100,"field3":"more data1","field4":123.001}]
[{"field1":"data2","field2":200,"field3":"more data2","field4":123.002}]
[{"field1":"data3","field2":300,"field3":"more data3","field4":123.003}]
[{"field1":"data4","field2":400,"field3":"more data4","field4":123.004}]
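(Each line is an array containing exactly one object.) I want to create a Hive table over this data.
If the JSON were not wrapped in [], I could simply use the default JSON SerDe: ROW FORMAT SERDE 'org.apache.hive.hcatalog.data.JsonSerDe'.
The problem with the regex SerDe is that the order of the fields can change, which makes it hard to extract exact values.
How can I create a Hive table over data in this format?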
1 Answer
zc0qhyus #1
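You should be able to use
ARRAY<STRUCT
See the complex types section of the Hive language manual: https://cwiki.apache.org/confluence/display/hive/languagemanual+types
I would only recommend regex if every line always contains exactly one JSON object.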
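Since the question notes that the JsonSerDe works once the [] are gone, another option is to strip the brackets in a preprocessing step before the files reach the table. Below is a minimal Python sketch of that idea, not a tested Hive pipeline. The helper names `unwrap_line` and `unwrap_lines` are made up for illustration, and it assumes every non-empty line is a JSON array holding exactly one object. It parses the JSON instead of matching it with a regex, so a changing field order does not matter.

```python
import json


def unwrap_line(line: str) -> str:
    """Turn '[{...}]' into '{...}' by parsing the JSON, not by pattern matching."""
    records = json.loads(line)
    # Assumption: each line is an array with exactly one object, as in the question.
    if not (isinstance(records, list) and len(records) == 1
            and isinstance(records[0], dict)):
        raise ValueError(f"expected a one-object array, got: {line!r}")
    # Compact separators keep each record on a single line.
    return json.dumps(records[0], separators=(",", ":"))


def unwrap_lines(lines):
    """Yield one bare JSON object per non-empty input line."""
    for line in lines:
        line = line.strip()
        if line:
            yield unwrap_line(line)


# Sample input; the second line has its fields in a different order on purpose.
sample = [
    '[{"field1":"data1","field2":100,"field3":"more data1","field4":123.001}]',
    '[{"field4":123.002,"field1":"data2","field3":"more data2","field2":200}]',
]
cleaned = list(unwrap_lines(sample))
for row in cleaned:
    print(row)
```

Each output line is then a single JSON object, which is the shape the question says the JsonSerDe already handles. A line that does not match the assumed shape raises an error instead of being silently passed through.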