How do I correctly set up an XML SerDe schema?

Posted by s8vozzvw on 2021-06-24 in Hive

I have the following XML:

<AssetCrossReferences Ordered="false">
    <AssetCrossReference AssetID="F7961393-01" Type="Primary Image"/>
    <AssetCrossReference AssetID="M0504-01" Type="Vendor Logo"/>
    <AssetCrossReference AssetID="F7961393-02" Type="Colour Photograph"/>
 </AssetCrossReferences><Specification Ordered="true">

I would like the final result to look like this:

AssetID:F7961393-01, Type:Primary Image
AssetID:M0504-01, Type:Vendor Logo
AssetID:F7961393-02, Type:Colour Photograph

How can I do this?

Answer #1 (mgdq6dx1)

Use a struct column:

create external table test 
(
   asset STRUCT<AssetID:STRING,Type:STRING>
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
with serdeproperties 
(
  "column.xpath.asset"="/AssetCrossReferences/AssetCrossReference"
)
stored as inputformat "com.ibm.spss.hive.serde2.xml.XmlInputFormat"
outputformat "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
location "file:///yourfilepath" 
tblproperties 
(
  "xmlinput.start"="<AssetCrossReferences",
  "xmlinput.end"="</AssetCrossReferences>"
);

Then query the table:

select * from test;
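
The question asks for one output row per AssetCrossReference element, while the table above exposes a single struct column per AssetCrossReferences record. One possible variation, which is not part of the original answer, is to declare the column as an array of structs and flatten it with LATERAL VIEW explode. This is only a sketch: the table name test_array, the column name assets, and the aliases exploded, a, and asset_line are hypothetical, and it assumes the SerDe maps every node matched by the XPath to one array element and each XML attribute to the struct field of the same name.

```sql
-- Hypothetical variant of the table above: the column is an array of structs,
-- assuming each node matched by the XPath becomes one array element.
create external table test_array
(
   assets ARRAY<STRUCT<AssetID:STRING,Type:STRING>>
)
ROW FORMAT SERDE 'com.ibm.spss.hive.serde2.xml.XmlSerDe'
with serdeproperties
(
  "column.xpath.assets"="/AssetCrossReferences/AssetCrossReference"
)
stored as inputformat "com.ibm.spss.hive.serde2.xml.XmlInputFormat"
outputformat "org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat"
location "file:///yourfilepath"
tblproperties
(
  "xmlinput.start"="<AssetCrossReferences",
  "xmlinput.end"="</AssetCrossReferences>"
);

-- explode() emits one row per array element; concat() formats each row
-- in the "AssetID:..., Type:..." layout requested in the question.
select concat('AssetID:', a.AssetID, ', Type:', a.Type) as asset_line
from test_array
lateral view explode(assets) exploded as a;
```

If the SerDe behaves as assumed, each AssetCrossReference element in the input should come back as its own row.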
