Get XML comment using Python [duplicate]

Posted by lkaoscv7 on 2023-01-19 in Python


This question already has an answer here:

How to read commented text from XML file in python (1 answer)
Closed 2 hours ago.
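I'm trying to get the comment from my XML file, but I don't know how. The reason is that the XML file currently contains no time data except inside the comment. I want to grab 18 JAN 2023 1:40:55 PM and convert it to an epoch timestamp. Can anyone help me?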

<!-- My Movies, Generated 18 JAN 2023  1:40:55 PM  --> 
<collection shelf="New Arrivals">
<movie title="Enemy Behind">
   <type>War, Thriller</type>
   <format>DVD</format>
   <year>2003</year>
   <rating>PG</rating>
   <stars>10</stars>
   <description>Talk about a US-Japan war</description>
</movie>
<movie title="Transformers">
   <type>Anime, Science Fiction</type>
   <format>DVD</format>
   <year>1989</year>
   <rating>R</rating>
   <stars>8</stars>
   <description>A scientific fiction</description>
</movie>
</collection>

Tags: python

Source: https://stackoverflow.com/questions/75164321/get-xml-comment-using-python

1 answer


Answer #1 by 0sgqnhkj

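Unfortunately, reading this XML with the standard xml.etree module doesn't work. ElementTree discards comments by default, and even when its TreeBuilder is created with insert_comments=True (Python 3.8+), it only keeps comments that appear inside the root element, so the comment before <collection> is skipped.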
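A small sketch to check that claim, assuming Python 3.8+ (needed for insert_comments). ElementTree returns no comments for a document whose only comment precedes the root, while a comment inside the root is kept. For contrast, the standard library's xml.dom.minidom stores such a prolog comment as a direct child of the Document node, which is an XML-aware alternative to reading the file as plain text. SAMPLE, etree_comments and minidom_prolog_comments are hypothetical names used only for illustration.

```python
import xml.etree.ElementTree as ET
from xml.dom import minidom

# Shortened version of the sample file from the question (the comment sits before the root)
SAMPLE = '''<!-- My Movies, Generated 18 JAN 2023  1:40:55 PM  -->
<collection shelf="New Arrivals">
<movie title="Enemy Behind"><year>2003</year></movie>
</collection>'''


def etree_comments(xml_text):
    """Comment texts that ElementTree keeps when asked to insert comments (Python 3.8+)."""
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    root = ET.fromstring(xml_text, parser=parser)
    # Comment nodes are Elements whose tag is the ET.Comment factory function
    return [el.text for el in root.iter() if el.tag is ET.Comment]


def minidom_prolog_comments(xml_text):
    """Comment texts that minidom stores as direct children of the Document node."""
    doc = minidom.parseString(xml_text)
    return [node.data for node in doc.childNodes
            if node.nodeType == node.COMMENT_NODE]


print(etree_comments(SAMPLE))           # []
print(minidom_prolog_comments(SAMPLE))  # a one-item list holding the comment text
```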
The solution I suggest is to read the file as plain text, using a regular open():

dates = []

# Read the file as plain text and keep each line that opens an XML comment
with open('app.xml', 'r', encoding='utf-8') as fp:
    for line in fp:
        if line.lstrip().startswith('<!--'):
            dates.append(line)
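The line-based check only finds comments that begin a line, and for a multi-line comment it keeps just the first line. A hedged alternative that still treats the file as plain text is to run a non-greedy regex over the whole content. extract_comments is a hypothetical helper name, and like any regex approach it is a heuristic: it would also match comment-like text inside CDATA sections.

```python
import re

# Non-greedy match between the delimiters; DOTALL lets a comment span several lines
COMMENT_RE = re.compile(r'<!--(.*?)-->', re.DOTALL)


def extract_comments(xml_text):
    """Return the text of every <!-- ... --> comment, without surrounding whitespace."""
    return [body.strip() for body in COMMENT_RE.findall(xml_text)]


sample = ('<!-- My Movies, Generated 18 JAN 2023  1:40:55 PM  -->\n'
          '<collection shelf="New Arrivals">\n</collection>')
print(extract_comments(sample))  # ['My Movies, Generated 18 JAN 2023  1:40:55 PM']
```

To use it on the file from the question, read the whole file first, e.g. open 'app.xml' with encoding='utf-8' and pass fp.read() to extract_comments.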

Now, to detect the date, I suggest using a regular expression:

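import re
from datetime import datetime

# %I (12-hour clock) is required: strptime only applies %p when the hour is parsed with %I
date_format = r'%d %b %Y  %I:%M:%S %p'
# \d{1,2} accepts one- or two-digit days and hours (e.g. 1:40:55 and 12:40:55)
date_regex = re.compile(r'\d{1,2} \w{3} \d{4}  \d{1,2}:\d{2}:\d{2} \w{2}')

for date in dates:
    match = date_regex.search(date)
    if match is None:
        continue  # this comment contains no timestamp
    parsed = datetime.strptime(match.group(), date_format)
    # Seconds since 1970-01-01, treating the naive timestamp as UTC
    date_formatted_to_epoch = (parsed - datetime(1970, 1, 1)).total_seconds()
    print(date_formatted_to_epoch)

With the sample file, this prints:

1674049255.0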
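A detail worth checking in the format string: Python's strptime applies %p only when the hour is parsed with %I (12-hour clock). With %H the AM/PM marker is matched but ignored, which shifts afternoon times by 12 hours. This minimal sketch assumes an English (or C) locale, since %b and %p are locale-dependent.

```python
from datetime import datetime

stamp = '18 JAN 2023  1:40:55 PM'

# With %H the %p directive is parsed but has no effect, so "PM" is lost
wrong = datetime.strptime(stamp, '%d %b %Y  %H:%M:%S %p')
# With %I (12-hour clock) %p selects AM or PM
right = datetime.strptime(stamp, '%d %b %Y  %I:%M:%S %p')

print(wrong.hour, right.hour)  # 1 13
```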

Notes:

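I use total_seconds() because of the first answer in this SO post: convert-python-datetime-to-epoch. Subtracting datetime(1970, 1, 1) from the parsed (naive) datetime gives the time elapsed since the Unix epoch, so the comment's time is effectively treated as UTC.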
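The regex simply describes the expected layout: digits for the day, a space, a three-letter month abbreviation, a space, and so on. If the format of the comment changes, the pattern may stop matching. This thread is a helpful guide: extracting-date-from-a-string.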
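Putting both notes together, here is a sketch with a commented re.VERBOSE version of the pattern (tolerant of single or double spacing and of one- or two-digit days and hours) and an epoch conversion that makes the UTC assumption explicit. find_timestamp and to_epoch are hypothetical names; English month and AM/PM names are assumed, and if the comment's time is local rather than UTC the tzinfo would have to change.

```python
import re
from datetime import datetime, timezone

# The same idea as the one-line pattern, spelled out with re.VERBOSE.
# \s+ between the parts tolerates single or double spaces.
DATE_RE = re.compile(r'''
    (?P<day>\d{1,2}) \s+                # day of month, one or two digits
    (?P<month>[A-Za-z]{3}) \s+          # three-letter month abbreviation, e.g. JAN
    (?P<year>\d{4}) \s+                 # four-digit year
    (?P<time>\d{1,2}:\d{2}:\d{2}) \s+   # h:mm:ss or hh:mm:ss
    (?P<ampm>[AaPp][Mm])                # AM/PM marker
''', re.VERBOSE)


def find_timestamp(text):
    """Return a naive datetime for the first date found in text, or None."""
    match = DATE_RE.search(text)
    if match is None:
        return None
    # Rebuild the string with single spaces so one strptime format always fits
    normalized = ' '.join(match.group('day', 'month', 'year', 'time', 'ampm'))
    return datetime.strptime(normalized, '%d %b %Y %I:%M:%S %p')


def to_epoch(naive_utc):
    """Seconds since 1970-01-01, assuming the naive datetime is in UTC.

    Equivalent to (naive_utc - datetime(1970, 1, 1)).total_seconds().
    """
    return naive_utc.replace(tzinfo=timezone.utc).timestamp()


stamp = find_timestamp(' My Movies, Generated 18 JAN 2023  1:40:55 PM  ')
print(stamp, to_epoch(stamp))  # 2023-01-18 13:40:55 1674049255.0
```

Attaching timezone.utc before calling timestamp() matters: on a naive datetime, timestamp() would interpret the value in the machine's local time zone instead.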

Answered 2023-01-19
