如何处理Hive表中的多行记录

qgelzfjb  于 2022-11-05  发布在  Hive
关注(0)|答案(1)|浏览(162)
Json File :

{ 
"buyer": { 
"legalBusinessName": "test1 Company","organisationIdentifications": [{ "type": "abcd",
"identification": "test.bb@tesr"
}, 
{ 
"type": "TXID","identification": "12345678"
}
]
},
"supplier": {
"legalBusinessName": "test Company",
"organisationIdentifications": [
{
"type":"abcd","identification": "test28@test"
}
]
},
"paymentRecommendationId": "1234-5678-9876-2212-123456",
"excludedRemittanceInformation": [],
"recommendedPaymentInstructions": [{
"executionDate": "2022-06-12",
"paymentMethod": "aaaa",
"remittanceInformation": {
"structured": [{
"referredDocumentInformation": [{
"type": "xxx",
"number": "12341234",
"relatedDate": "2022-06-12",
"paymentDueDate": "2022-06-12",
"referredDocumentAmount": {
"remittedAmount": 2600.5,
"duePayableAmount": 3000
}
}]
}]
}
}]
}

创建表语句:

CREATE EXTERNAL TABLE IF NOT EXISTS `test`.`test_rahul` 
(`buyer` STRUCT< `legalBusinessName`:STRING, `organisationIdentifications`:STRUCT<  `type`:STRING, `identification`:STRING>>, 
`supplier` STRUCT< `legalBusinessName`:STRING, `organisationIdentifications`:STRUCT<    `type`:STRING, `identification`:STRING>>,
`paymentRecommendationId` STRING, `recommendedPaymentInstructions` ARRAY< STRUCT<     `executionDate`:STRING, `paymentMethod`:STRING, 
`remittanceInformation`:STRUCT< `structured`:STRUCT<     `referredDocumentInformation`:STRUCT< `type`:STRING, 
`number`:STRING, `relatedDate`:STRING, `paymentDueDate`:STRING,     `referredDocumentAmount`:STRUCT< `remittedAmount`:DOUBLE, 
`duePayableAmount`:INT>>>>>>) 
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'  
WITH SERDEPROPERTIES ( "field.delim"=",","mapping.ts" = "number") 
STORED AS textFILE LOCATION '/user/hdfs/Jsontest/';

如果我在单行中为每个记录写入Jsonfile数据,那么它工作得很好,但是如果它在多行中,那么会出现下面的错误。
错误消息:
错误:java.io。IO异常:org.apache.hadoop.hive.serde2.SerDeException:行不是有效的JSON对象-JSON异常:JSONObject文本在2 [第1行第3个字符]处必须以'}'结尾(状态=,代码=0)
可以有人善意地建议。看起来我需要添加行/字段分隔符,但不能决定我应该添加什么,以便它可以处理多行也一样的Spark。即。oprtion(多行,真)

qltillow

qltillow1#

看起来Hive中的JSON serde不支持多行。您可能需要将JSON扁平化为单行,如下所示。

{ "buyer": { "a": "1", "b": "2" }, "c": "3" }
{ "buyer": { "a": "1", "b": "2" }, "c": "3" }
{ "buyer": { "a": "1", "b": "2" }, "c": "3" }
...

相关问题