python-3.x: How do I parse XML with multiple attribute values in one tag into a DataFrame?

xmakbtuz  posted on 2023-01-27  in  Python
<?xml version="2.0" encoding="UTF-8" ?><timestamp="20220113">
<defintions>
    <defintion id="1" old_id="0">Lang</defintion>
    <defintion id="7" old_id="1">Eng</defintion>

How can I parse an XML file like this? Here a single tag carries multiple values. I want to extract values such as "id" and "old_id" into a list or a DataFrame.

68de4m5k  #1

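You can use BeautifulSoup with its xml parser to achieve this: select the elements you need, then iterate over the ResultSet and extract the attribute values with .get().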

from bs4 import BeautifulSoup

# The 'xml' parser relies on lxml being installed
with open('filename.xml', 'r') as f:
    file = f.read()

soup = BeautifulSoup(file, 'xml')
Example
from bs4 import BeautifulSoup
import pandas as pd

xml = '''<?xml version="2.0" encoding="UTF-8" ?><timestamp="20220113">
<defintions>
    <defintion id="1" old_id="0">Lang</defintion>
    <defintion id="7" old_id="1">Eng</defintion>
'''
soup = BeautifulSoup(xml,'xml')

pd.DataFrame(
    [
        (e.get('id'),e.get('old_id'))
        for e in soup.select('defintion')
    ],
    columns = ['id','old_id']
)
Output

| | id | old_id |
| --- | --- | --- |
| 0 | 1 | 0 |
| 1 | 7 | 1 |

vc9ivgsu  #2

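With Python's BeautifulSoup you can parse the .xml file into a BeautifulSoup object, call .find_all('defintion') on it, and then loop over the tags it returns to read the values you need. The attributes live on each defintion element, not on the enclosing defintions element.

# soup is the BeautifulSoup object parsed from the .xml file
defintions = soup.find_all('defintion')

for defintion in defintions:
    # index the individual tag, not the whole result list
    old_id = defintion['old_id']
    id = defintion['id']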

References: https://www.crummy.com/software/BeautifulSoup/bs4/doc/ and https://linuxhint.com/parse_xml_python_beautifulsoup/
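The same find-then-loop idea can be sketched with only the standard library. This swaps BeautifulSoup for xml.etree.ElementTree, which is a strict parser, so the sketch assumes the XML has first been made well-formed (closed tags, and a root wrapper element that is an assumption here, not part of the question's file). The helper name extract_rows is hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed well-formed variant of the question's XML; the <root> wrapper is hypothetical.
XML = """<?xml version='1.0' encoding='utf-8'?>
<root timestamp='20220113'>
<defintions>
    <defintion id="1" old_id="0">Lang</defintion>
    <defintion id="7" old_id="1">Eng</defintion>
</defintions>
</root>"""


def extract_rows(xml_text):
    """Return one dict per <defintion> element with its attributes and text."""
    root = ET.fromstring(xml_text)
    rows = []
    # iter() walks the whole tree, so nesting depth does not matter
    for elem in root.iter("defintion"):
        rows.append(
            {
                "id": elem.get("id"),          # attribute values are strings
                "old_id": elem.get("old_id"),
                "text": elem.text,
            }
        )
    return rows


rows = extract_rows(XML)
print(rows)
```

A list of dicts like this can be handed to pd.DataFrame(rows) if a DataFrame is wanted; the attribute values stay strings unless they are converted explicitly.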

yebdmbv4  #3

If you have valid XML like this (the timestamp tag cannot take a value as if it were an attribute):

<?xml version='1.0' encoding='utf-8'?>
<root timestamp='20220113'>
<defintions>
    <defintion id="1" old_id="0">Lang</defintion>
    <defintion id="7" old_id="1">Eng</defintion>
</defintions>
</root>

then you can use pandas:

import pandas as pd

df = pd.read_xml('x89.xml', xpath='.//defintion')
print(df.to_string(index=False))

Output:

id  old_id defintion
  1       0      Lang
  7       1       Eng
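Because this approach depends on the input being valid XML, it can help to check well-formedness before handing a file to a strict parser. The following is a minimal sketch using the standard library's xml.etree.ElementTree; the helper name is_well_formed is hypothetical, and the two strings reproduce the question's snippet and the repaired version from this answer.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if a strict XML parser accepts the text, False otherwise."""
    try:
        ET.fromstring(xml_text)
    except ET.ParseError:
        return False
    return True


# The snippet as posted in the question: <timestamp="20220113"> is not a legal tag
# and the elements are never closed.
ORIGINAL = (
    '<?xml version="2.0" encoding="UTF-8" ?><timestamp="20220113">\n'
    '<defintions>\n'
    '    <defintion id="1" old_id="0">Lang</defintion>\n'
    '    <defintion id="7" old_id="1">Eng</defintion>\n'
)

# The repaired version: the timestamp becomes an attribute of a root element.
FIXED = (
    "<?xml version='1.0' encoding='utf-8'?>\n"
    "<root timestamp='20220113'>\n"
    "<defintions>\n"
    '    <defintion id="1" old_id="0">Lang</defintion>\n'
    '    <defintion id="7" old_id="1">Eng</defintion>\n'
    "</defintions>\n"
    "</root>"
)

print(is_well_formed(ORIGINAL))
print(is_well_formed(FIXED))
```

Lenient parsers such as the one BeautifulSoup uses may still recover data from the original snippet, so this check only tells you whether strict tools will accept the file.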
