Json File :
{
"buyer": {
"legalBusinessName": "test1 Company","organisationIdentifications": [{ "type": "abcd",
"identification": "test.bb@tesr"
},
{
"type": "TXID","identification": "12345678"
}
]
},
"supplier": {
"legalBusinessName": "test Company",
"organisationIdentifications": [
{
"type":"abcd","identification": "test28@test"
}
]
},
"paymentRecommendationId": "1234-5678-9876-2212-123456",
"excludedRemittanceInformation": [],
"recommendedPaymentInstructions": [{
"executionDate": "2022-06-12",
"paymentMethod": "aaaa",
"remittanceInformation": {
"structured": [{
"referredDocumentInformation": [{
"type": "xxx",
"number": "12341234",
"relatedDate": "2022-06-12",
"paymentDueDate": "2022-06-12",
"referredDocumentAmount": {
"remittedAmount": 2600.5,
"duePayableAmount": 3000
}
}]
}]
}
}]
}
创建表语句:
CREATE EXTERNAL TABLE IF NOT EXISTS `test`.`test_rahul`
(`buyer` STRUCT< `legalBusinessName`:STRING, `organisationIdentifications`:STRUCT< `type`:STRING, `identification`:STRING>>,
`supplier` STRUCT< `legalBusinessName`:STRING, `organisationIdentifications`:STRUCT< `type`:STRING, `identification`:STRING>>,
`paymentRecommendationId` STRING, `recommendedPaymentInstructions` ARRAY< STRUCT< `executionDate`:STRING, `paymentMethod`:STRING,
`remittanceInformation`:STRUCT< `structured`:STRUCT< `referredDocumentInformation`:STRUCT< `type`:STRING,
`number`:STRING, `relatedDate`:STRING, `paymentDueDate`:STRING, `referredDocumentAmount`:STRUCT< `remittedAmount`:DOUBLE,
`duePayableAmount`:INT>>>>>>)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ( "field.delim"=",","mapping.ts" = "number")
STORED AS textFILE LOCATION '/user/hdfs/Jsontest/';
如果我在单行中为每个记录写入Jsonfile数据,那么它工作得很好,但是如果它在多行中,那么会出现下面的错误。
错误消息:
错误:java.io。IO异常:org.apache.hadoop.hive.serde2.SerDeException:行不是有效的JSON对象-JSON异常:JSONObject文本在2 [第1行第3个字符]处必须以'}'结尾(状态=,代码=0)
可以有人善意地建议。看起来我需要添加行/字段分隔符,但不能决定我应该添加什么,以便它可以处理多行也一样的Spark。即。oprtion(多行,真)
1条答案
按热度按时间qltillow1#
看起来Hive中的JSON serde不支持多行。您可能需要将JSON扁平化为单行,如下所示。