python-3.x: Best way to parse multiple child elements in XML

tpgth1q7  posted on 2023-02-06 in Python

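I'm trying to parse XML with xmltodict, with the goal of eventually converting it into a tabular format that is easier for other people to read. I've been able to get through most of the XML, but when I hit an element that has multiple child elements, I feel like I'm chasing my own tail. I'd like to use pandas with the values I extract from the XML...
Here is a sanitized version of the XML I'm trying to parse: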

<batchConfiguration>
    <batchJob name="BATCHJOB1">
      <className>batchJob1</className>
      <schedule>Y</schedule>
      <interval>300</interval>
      <systemControlled>N</systemControlled>
    </batchJob>
    <batchJob name="BATCHJOB2">
      <params>
        <param name="QueueName1">batchQueue1</param>
      </params>
      <className>batchJob2</className>
      <startTime>02:10:00</startTime>
      <schedule>N</schedule>
      <daysOfTheWeek>YYYYYYY</daysOfTheWeek>
      <systemControlled>N</systemControlled>
    </batchJob>
    <batchJob name="BATCHJOB3">
      <params>
        <param name="ignoreErrors">Y</param>
        <param name="batchSize">1000</param>
      </params>
      <className>classyBatchJob</className>
      <schedule>Y</schedule>
      <interval>90</interval>
      <systemControlled>N</systemControlled>
    </batchJob>
  </batchConfiguration>

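My thinking was that I could loop over the jobs that have multiple "params". I can return a single "params" entry, but I'm stumped when there is more than one. This is the code I have so far. It has several sections, because I was trying to figure things out as I went. The XML is read from a file...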

import xmltodict as xml
import pprint

#File to parse
fileptr=open(r"FileIRead.xml")

# Show raw XML text file data
raw_file= fileptr.read()
# print(raw_file)

# Create an XML dictionary
xml_dict=xml.parse(raw_file)
pprint.pprint(xml_dict)

xml_dict1=xml.parse(raw_file)['batchConfiguration']['batchJob']
pprint.pprint(xml_dict1)
# pprint.pprint(xml_dict['batchConfiguration']['batchJob'])

# https://docs.python.org/3/tutorial/errors.html

for bJ in xml_dict1:
    bJName=bJ['@name']
    print(f"Name: {bJ['@name']}")
    print(bJName)
    try:
        print(f"Interval: {bJ['interval']}")
    except:
        print("Interval: N/A")
    try:
        print(f"Scheduled: {bJ['schedule']}")
    except:
        print("N/A")
    try:
        print(f"Start Time: {bJ['startTime']}")
    except:
        print("Start Time: N/A")
    try:
        print(f"End Time: {bJ['endTime']}")
    except:
        print("End Time: N/A")
    try:
        # This works fine to return only a single element. With multiple it fails.
        print(f"Params: {bJ['params']['param']['@name']} - {bJ['params']['param']['#text']}")
    except:
        print("Params: N/A")
    try:
        print(f"Classname: {bJ['className']}")
    except:
        print("Classname: N/A")
    try:
        print(f"DaysOfWeek: {bJ['daysOfTheWeek']}")
    except:
        print("DaysOfWeek: N/A")
    try:
        # Attempt to get all parameters single or multiple
        xml_dict2=xml.parse(raw_file)['params']['param']
        pprint.pprint(xml_dict2)
        for bJ1 in xml_dict2['params']['param']:
            print(f"--- {bJ1['@name']}")
    except:
        print("It no worky")

Edit: as requested... the output I have been able to get is:

Name: BATCHJOB1
Classname: batchJob1
... (etc)

My ultimate goal is to take the output and convert it into a column format like this:

Name            Classname    ...
BATCHJOB1       batchJob1

"N/A" would be placed wherever the element does not exist or has no value.

eeq64g8w  1#

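xmltodict returns a dict when an element has only one child with a given tag, and a list when it has two or more. .parse has a force_list argument that lets you name the keys that should always come back as a list.
You can use: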

xml_dict1 = xml.parse(raw_file, force_list=('param',))['batchConfiguration']['batchJob']

Then:

try:
    for p in bJ['params']['param']:
        print(f"Params: {p['@name']} - {p['#text']}")
except KeyError:  # raised when the job has no <params>; avoid a bare 'except'
    print("Params: N/A")
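
If you would rather not rely on force_list, another option is to normalise the value yourself before looping. The sketch below is only an illustration: iter_params is a hypothetical helper name, and the jobs list is written by hand in the shape xmltodict is assumed to produce for the sample XML (a dict for one <param>, a list for several, no 'params' key at all when the element is missing), so it runs without xmltodict installed.

```python
def iter_params(job):
    """Yield (name, value) pairs whether the job has zero, one or many params."""
    # 'params' may be missing entirely; fall back to an empty dict in that case
    params = (job.get("params") or {}).get("param")
    if params is None:
        return
    if not isinstance(params, list):
        params = [params]  # a single <param> comes back as a dict, so wrap it
    for p in params:
        yield p["@name"], p.get("#text")


# Hand-written stand-in for xml.parse(raw_file)['batchConfiguration']['batchJob']
# (assumed shape, not produced by actually calling xmltodict here)
jobs = [
    {"@name": "BATCHJOB1", "className": "batchJob1"},
    {"@name": "BATCHJOB2",
     "params": {"param": {"@name": "QueueName1", "#text": "batchQueue1"}}},
    {"@name": "BATCHJOB3",
     "params": {"param": [{"@name": "ignoreErrors", "#text": "Y"},
                          {"@name": "batchSize", "#text": "1000"}]}},
]

for job in jobs:
    pairs = list(iter_params(job))
    text = ", ".join(f"{name}={value}" for name, value in pairs) if pairs else "N/A"
    print(f"{job['@name']}: {text}")
```

This keeps the "N/A" handling in one place instead of a try/except around every lookup.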

lf5gs5x2  2#

If I understand correctly, this can be done using pandas.read_xml():

import pandas as pd
pd.read_xml([your_xml]).iloc[:,0:2]

Output based on the sample XML:

        name       className
0  BATCHJOB1       batchJob1
1  BATCHJOB2       batchJob2
2  BATCHJOB3  classyBatchJob
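
The two columns above do not include the nested <param> values the question asks about. As a rough sketch of one way to get the full table with "N/A" filled in, the example below uses the standard library's xml.etree.ElementTree rather than pandas or xmltodict. The names job_rows, COLUMNS and SAMPLE are made up for illustration, and joining the params into a single "name=value; ..." cell is just one possible layout.

```python
import xml.etree.ElementTree as ET

# Sample XML copied from the question
SAMPLE = """<batchConfiguration>
    <batchJob name="BATCHJOB1">
      <className>batchJob1</className>
      <schedule>Y</schedule>
      <interval>300</interval>
      <systemControlled>N</systemControlled>
    </batchJob>
    <batchJob name="BATCHJOB2">
      <params>
        <param name="QueueName1">batchQueue1</param>
      </params>
      <className>batchJob2</className>
      <startTime>02:10:00</startTime>
      <schedule>N</schedule>
      <daysOfTheWeek>YYYYYYY</daysOfTheWeek>
      <systemControlled>N</systemControlled>
    </batchJob>
    <batchJob name="BATCHJOB3">
      <params>
        <param name="ignoreErrors">Y</param>
        <param name="batchSize">1000</param>
      </params>
      <className>classyBatchJob</className>
      <schedule>Y</schedule>
      <interval>90</interval>
      <systemControlled>N</systemControlled>
    </batchJob>
  </batchConfiguration>"""

# Child elements to turn into columns (chosen to match the question's prints)
COLUMNS = ["className", "schedule", "interval", "startTime", "endTime", "daysOfTheWeek"]


def job_rows(xml_text, missing="N/A"):
    """Return one dict per <batchJob>, with `missing` for absent or empty values."""
    rows = []
    for job in ET.fromstring(xml_text).iter("batchJob"):
        row = {"name": job.get("name", missing)}
        for col in COLUMNS:
            # findtext gives None for a missing child and "" for an empty one
            row[col] = job.findtext(col) or missing
        # findall returns an empty list when there is no <params> element
        params = [f"{p.get('name')}={p.text}" for p in job.findall("./params/param")]
        row["params"] = "; ".join(params) or missing
        rows.append(row)
    return rows


rows = job_rows(SAMPLE)
headers = list(rows[0])
widths = {h: max(len(h), *(len(r[h]) for r in rows)) for h in headers}
print("  ".join(h.ljust(widths[h]) for h in headers))
for r in rows:
    print("  ".join(r[h].ljust(widths[h]) for h in headers))
```

If pandas is available, the same list of dicts can be passed to pd.DataFrame(rows) to get a DataFrame instead of printed text.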
