在配置单元中使用xmlserde进行xml文件复制时出现的问题

dfuffjeb  于 2021-06-02  发布在  Hadoop
关注(0)|答案(1)|浏览(269)

下面是我的输入xml

<entity>
  <link idType="ProviderId">AEY000977645</link>
  <link idType="PAID">000977645</link>
  <link idType="PID">AEY</link>
  <message reason="Not Currently In TMS Database" status="Unmappable"/>
 </entity>

我需要解析这些数据并使用hivexmlserde创建一个hive表,其中包含4列(providerid、paid、pid、message\u reason)。因为所有的值都在标记中,所以很难解析数据。下面是我的表ddl。

CREATE EXTERNAL TABLE xml_testing
(
         provider_id    String,
         paid           String,
         pid            String,
         message_reason String
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
WITH SERDEPROPERTIES (
"column.xpath.provider_id"="/entity/link/@idType", --> not sure what to give here  
"column.xpath.paid"="/entity/link/@idType", --> not sure what to give here  
"column.xpath.pid"="/entity/link/@idType", --> not sure what to give here
"column.xpath.message_reason"="/entity/message/@reason"  
)
STORED AS
INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
LOCATION '/input/'
TBLPROPERTIES (
"xmlinput.start"="<entity>",
"xmlinput.end"="</entity>"
);

有人能帮我吗?

ijnw1ujt

ijnw1ujt1#

按属性值选择元素。
ie/entity/link[@idtype='providerid']/text()

CREATE EXTERNAL TABLE xml_testing
        (
                 provider_id    String,
                 paid           String,
                 pid            String,
                 message_reason String
        )
        ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
        WITH SERDEPROPERTIES (
        "column.xpath.provider_id"="/entity/link[@idType='ProviderId']/text()",  
        "column.xpath.paid"="/entity/link[@idType='PAID']/text()",   
        "column.xpath.pid"="/entity/link[@idType='PID']/text()", 
        "column.xpath.message_reason"="/entity/message/@reason"  
        )
        STORED AS
        INPUTFORMAT 'com.ibm.spss.hive.serde2.xml.XmlInputFormat'
        OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.IgnoreKeyTextOutputFormat'
        LOCATION '/input/'
        TBLPROPERTIES (
        "xmlinput.start"="<entity>",
        "xmlinput.end"="</entity>"
        );

相关问题