下面是我的输入xml
<entity>
<link idType="ProviderId">AEY000977645</link>
<link idType="PAID">000977645</link>
<link idType="PID">AEY</link>
<message reason="Not Currently In TMS Database" status="Unmappable"/>
</entity>
我需要解析这些数据并使用hivexmlserde创建一个hive表,其中包含4列(providerid、paid、pid、message\u reason)。因为所有的值都在标记中,所以很难解析数据。下面是我的表ddl。
CREATE EXTERNAL TABLE xml_testing
(
provider_id String,
paid String,
pid String,
message_reason String
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
"column.xpath.provider_id"="/entity/link/@idType", --> not sure what to give here
"column.xpath.paid"="/entity/link/@idType", --> not sure what to give here
"column.xpath.pid"="/entity/link/@idType", --> not sure what to give here
"column.xpath.message_reason"="/entity/message/@reason"
)
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION '/input/'
TBLPROPERTIES (
"xmlinput.start"="<entity>",
"xmlinput.end"="</entity>"
);
有人能帮我吗?
1条答案
按热度按时间ijnw1ujt1#
按属性值选择元素。
ie/entity/link[@idtype='providerid']/text()