Parsing a nested XML file in Databricks with Spark/Scala

Posted by biswetbf on 2021-07-14 in Spark

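I am trying to parse a nested XML file that contains multiple row tags using Spark/Scala. After parsing, I have to load the data into a table. However, I am unable to convert the multiple row tags into a proper tabular format. I am using the spark-xml library on an Azure Databricks cluster. Can anyone help?
Below are a sample of the source file and the target schema. The original file is about 20 MB.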

<?xml version="1.0" encoding="UTF-8"?>
<on xmlns:xsi="http://xxxxxxxx.yyyyyyy" xsi:xxxxxLocation="on_sources_2.0.xsd" schv="2.0">
<header>
<cnt>On(r) TV: Sources</cnt>    
<ctd>2021-04-11</ctd>
<cgr>cgr 2021 Gracenote. All rights reserved.</cgr>
<st>2021-04-11T00:00:00</st>
<pd>xx</pd>
</header>
<sources>
<prgSvcs>
<prgSvc sid="00000" pid="0000">
<nm>FXX ind</nm>
<address>
<ct>mum</ct>
<state>mh</state>
<pcd>111x2</pcd>
<cty>ind</cty>
</address>
<type>Satellite</type>
<rshps>
<rshp type="HD Version of">0000</rshp>
</rshps>
<attrbs>
<attrb>test</attrb>
<attrb>test2</attrb>
</attrbs>
<tmzn>IST Observing</tmzn>
<clsgn>XXC</clsgn>
<edlags>test
<edlag>en</edlag>
</edlags>
<bcaslags>
<bcaslag>en</bcaslag>
</bcaslags>
<URL>www.xxxyyyy.com/</URL>
<images>
<image type="image/png" wdt="00" hgt="22" prmy="true" ctrgy="Logo">
<URI>i0/xxxxx/00000/s00000_h4_ba.png</URI>
</image>
<image type="image/png" wdt="180" hgt="000" prmy="true" ctrgy="Logo">
<URI>h5/xxxxx/00000/s00000_h5_aa.png</URI>
</image>
<image type="image/png" wdt="360" hgt="000" prmy="true" ctrgy="Logo">
<URI>h3/xxxxx/00000/s00000_h3_aa.png</URI>
</image>
<image type="image/png" wdt="90" hgt="00" prmy="true" ctrgy="Logo">
<URI>h4/xxxxx/00000/s00000_h4_aa.png</URI>
</image>
<image type="image/png" wdt="360" hgt="003" prmy="true" ctrgy="Logo">
<URI>h3/xxxxx/00000/s00000_h3_ba.png</URI>
</image>
<image type="image/png" wdt="180" hgt="002" prmy="true" ctrgy="Logo">
<URI>h5/xxxxx/00000/s00000_h5_ba.png</URI>
</image>
</images>
</prgSvc>
</prgSvcs>
</sources>
</on>
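
One way the sample above might be loaded with spark-xml is to read the same file once per row tag, since the library takes a single `rowTag` per read. The sketch below is only an illustration under assumptions: the object name `ReadOnSources` and the helper `readByRowTag` are hypothetical, the spark-xml library is assumed to be installed on the cluster, and the file path is whatever location the file actually lives at.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

// Hypothetical helper, not part of spark-xml itself.
object ReadOnSources {
  // Returns one row per occurrence of `rowTag` in the file at `path`.
  // With spark-xml's default options, XML attributes become columns
  // prefixed with "_" (for example sid="..." becomes `_sid`).
  def readByRowTag(spark: SparkSession, path: String, rowTag: String): DataFrame =
    spark.read
      .format("com.databricks.spark.xml")
      .option("rowTag", rowTag)
      .load(path)
}
```

In a notebook this could be called twice, for example `ReadOnSources.readByRowTag(spark, path, "header")` and `ReadOnSources.readByRowTag(spark, path, "prgSvc")`, followed by `printSchema()` on each result to see which nested fields were inferred as structs and which as arrays.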

SCHEMA:

schv    
cnt          
ctd          
cgr        
st            
pd           
sid         
pid         
nm             
ct             
state            
pcd       
cty          
pty      
rshp     
rshp_type
attrb           
tmzn         
clsgn         
edlag           
bcaslag        
num              
mjrnum         
mirnum         
affil            
afffil_pid  
url              
mktid         
mktid_type    
imgtyp       
wdt            
hgt           
prmy          
ctrgy         
uri              
ctdtline
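
A possible way to reach part of the target schema above is to flatten the `prgSvc` rows, explode the repeated `image` elements, and cross join the single header row onto every result row. The following is a minimal sketch, not a complete solution, and it rests on several assumptions: the input DataFrames were read by spark-xml with `rowTag` set to `prgSvc` and `header` and the default attribute prefix `_`; `images.image` was inferred as an array; and the mapping of `imgtyp` to the image `type` attribute is a guess. The names `FlattenPrgSvc` and `flatten` are hypothetical. Fields such as `rshp`, `attrb`, `edlag` and `bcaslag` are left out because whether they are inferred as a struct or an array depends on the full data.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, explode_outer}

// Hypothetical helper; column names follow spark-xml's default inference.
object FlattenPrgSvc {
  def flatten(prgSvc: DataFrame, header: DataFrame): DataFrame = {
    val perImage = prgSvc
      // One output row per <image>; explode_outer keeps services without images.
      // Assumption: images.image is an array of structs.
      .withColumn("image", explode_outer(col("images.image")))
      .select(
        col("_sid").as("sid"),
        col("_pid").as("pid"),
        col("nm"),
        col("address.ct").as("ct"),
        col("address.state").as("state"),
        col("address.pcd").as("pcd"),
        col("address.cty").as("cty"),
        col("tmzn"),
        col("clsgn"),
        col("URL").as("url"),
        col("image._type").as("imgtyp"), // assumed mapping
        col("image._wdt").as("wdt"),
        col("image._hgt").as("hgt"),
        col("image._prmy").as("prmy"),
        col("image._ctrgy").as("ctrgy"),
        col("image.URI").as("uri")
      )
    // The header is expected to hold a single row, so the cross join
    // simply repeats its columns on every flattened row.
    header.select("cnt", "ctd", "cgr", "st", "pd").crossJoin(perImage)
  }
}
```

The result could then be written to a table with something like `flatten(prgSvc, header).write.saveAsTable(...)`, with the table name chosen by the reader. If `images.image` turns out to be a single struct rather than an array, the explode step would need to be adapted or an explicit schema supplied at read time.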

scala apache-spark databricks xml azure-databricks

Source: https://stackoverflow.com/questions/67122585/nested-xml-file-parsing-in-databricks-using-spark-scala
