**Data in test.txt**
<ServiceRQ xmlns:xsi="http://"><SaleInfo><CityCode>DXB</CityCode><CountryCode>EG</CountryCode><Currency>USD</Currency><Channel>TA</Channel></SaleInfo><Pricing><CustomParams><Param Name="AG"><Value>95HAJSTI</Value></Param></CustomParams></Pricing></ServiceRQ>
<SearchRQ xmlns:xsi="http://"><SaleInfo><CityCode>CPT</CityCode><CountryCode>US</CountryCode><Currency>USD</Currency><Channel>AY</Channel></SaleInfo><Pricing><CustomParams><Param Name="AG"><Value>56ASJSTS</Value></Param></CustomParams></Pricing></SearchRQ>
<ServiceRQ xmlns:xsi="http://"><SaleInfo><CityCode>BOM</CityCode><CountryCode>AU</CountryCode><Currency>USD</Currency><Channel>QA</Channel></SaleInfo><Pricing><CustomParams><Param Name="AG"><Value>85ATAKSQ</Value></Param></CustomParams></Pricing></ServiceRQ>
<ServiceRQ ......
<SearchRQ ........
**My code:**
import pandas as pd
import re
columns = ['Request Type','Channel','AG']
# data = pd.DataFrame
exp = re.compile(r'<(.*)\s+xmlns'
                 r'<Channel>(.*)</Channel>'
                 r'<Param Name="AG">.*?<Value>(.*?)</Value>')
final = []
with open(r"test.txt") as f:
    for line in f:
        result = re.search(exp,line)
        final.append(result)
df = pd.DataFrame(final, columns)
print(df)
**My expected output:** I want to iterate over every line, run 3 regex operations, and extract data from each line of the text file.
1. r'<(.*)\s+xmlns'
2. r'<Channel>(.*)</Channel>'
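3. r'<Param Name="AG">.*?<Value>(.*?)</Value>'
Each regular expression extracts its own piece of data from a single line:
1. Extract the request type
2. Extract the channel name
3. Extract the current value of AG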
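For illustration, the three steps listed above can be sketched as three separate searches on one line. This is only a sketch: the patterns are the ones from the question, while the `extract_fields` helper and the constant names are hypothetical, and the sample is the first line of test.txt.

```python
import re

# The three patterns from the question, compiled separately.
REQUEST_TYPE = re.compile(r'<(.*)\s+xmlns')
CHANNEL = re.compile(r'<Channel>(.*)</Channel>')
AG = re.compile(r'<Param Name="AG">.*?<Value>(.*?)</Value>')


def extract_fields(line):
    """Return (request type, channel, AG) for one line, or None if any pattern misses."""
    matches = [pattern.search(line) for pattern in (REQUEST_TYPE, CHANNEL, AG)]
    if not all(matches):
        return None
    return tuple(m.group(1) for m in matches)


# First line of test.txt, split across string literals for readability.
sample = (
    '<ServiceRQ xmlns:xsi="http://"><SaleInfo><CityCode>DXB</CityCode>'
    '<CountryCode>EG</CountryCode><Currency>USD</Currency>'
    '<Channel>TA</Channel></SaleInfo><Pricing><CustomParams>'
    '<Param Name="AG"><Value>95HAJSTI</Value></Param>'
    '</CustomParams></Pricing></ServiceRQ>'
)

print(extract_fields(sample))  # ('ServiceRQ', 'TA', '95HAJSTI')
```

Each pattern finds its value when searched on its own, because every line contains only one `xmlns`, one `<Channel>` element, and one `AG` parameter.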
**My expected output Excel sheet**
Request Type Channel AG
ServiceRQ TA 95HAJSTI
SearchRQ AY 56ASJSTS
ServiceRQ QA 85ATAKSQ
... ... .....
... .... .....
and so on..
How can I achieve the expected output?
1 Answer
pxq42qpu1#
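Try this with `re`. I don't actually know what the rest of your text looks like, but it should work for the lines I can see. `result.groups()` extracts the matched text of every group and returns it as a tuple, so append that tuple rather than the match object itself.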
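A minimal sketch of the approach described above, assuming a single combined pattern in which `.*?` bridges the gaps between the three captured parts. The pattern and the `extract_rows` helper are illustrative assumptions, not the answer's original code, and the sample lines are the first two lines of test.txt plus an empty line.

```python
import re

# Assumed combined pattern: the question's three pieces joined with ".*?"
# so the regex can skip the XML between them.
exp = re.compile(
    r'<(.*?)\s+xmlns'
    r'.*?<Channel>(.*?)</Channel>'
    r'.*?<Param Name="AG">.*?<Value>(.*?)</Value>'
)


def extract_rows(lines):
    """Collect one (Request Type, Channel, AG) tuple per matching line."""
    rows = []
    for line in lines:
        result = exp.search(line)
        if result:  # re.search returns None when a line does not match
            rows.append(result.groups())
    return rows


sample_lines = [
    '<ServiceRQ xmlns:xsi="http://"><SaleInfo><CityCode>DXB</CityCode>'
    '<CountryCode>EG</CountryCode><Currency>USD</Currency>'
    '<Channel>TA</Channel></SaleInfo><Pricing><CustomParams>'
    '<Param Name="AG"><Value>95HAJSTI</Value></Param>'
    '</CustomParams></Pricing></ServiceRQ>',
    '<SearchRQ xmlns:xsi="http://"><SaleInfo><CityCode>CPT</CityCode>'
    '<CountryCode>US</CountryCode><Currency>USD</Currency>'
    '<Channel>AY</Channel></SaleInfo><Pricing><CustomParams>'
    '<Param Name="AG"><Value>56ASJSTS</Value></Param>'
    '</CustomParams></Pricing></SearchRQ>',
    '',  # an empty line is skipped instead of producing a None row
]

rows = extract_rows(sample_lines)
for row in rows:
    print(row)
```

With pandas installed, `pd.DataFrame(rows, columns=columns)` should turn these tuples into the three-column table. `columns` has to be passed by keyword, because the second positional argument of `DataFrame` is the index. Writing the result with `df.to_excel("output.xlsx", index=False)` would then produce the sheet; the file name here is a placeholder, and an Excel writer engine such as openpyxl must be available.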