pandas: Perform multiple regex operations on each line of a text file and store the extracted data in respective columns

Posted by yqkkidmi on 2023-02-11 in Other


**Data in test.txt:**

<ServiceRQ xmlns:xsi="http://"><SaleInfo><CityCode>DXB</CityCode><CountryCode>EG</CountryCode><Currency>USD</Currency><Channel>TA</Channel></SaleInfo><Pricing><CustomParams><Param Name="AG"><Value>95HAJSTI</Value></Param></CustomParams></Pricing></ServiceRQ>

<SearchRQ xmlns:xsi="http://"><SaleInfo><CityCode>CPT</CityCode><CountryCode>US</CountryCode><Currency>USD</Currency><Channel>AY</Channel></SaleInfo><Pricing><CustomParams><Param Name="AG"><Value>56ASJSTS</Value></Param></CustomParams></Pricing></SearchRQ>

<ServiceRQ xmlns:xsi="http://"><SaleInfo><CityCode>BOM</CityCode><CountryCode>AU</CountryCode><Currency>USD</Currency><Channel>QA</Channel></SaleInfo><Pricing><CustomParams><Param Name="AG"><Value>85ATAKSQ</Value></Param></CustomParams></Pricing></ServiceRQ>

<ServiceRQ ......

<SearchRQ ........

**My code:**

import pandas as pd
import re
columns = ['Request Type','Channel','AG']
# data = pd.DataFrame
exp = re.compile(r'<(.*)\s+xmlns'
                 r'<Channel>(.*)</Channel>'
                 r'<Param Name="AG">.*?<Value>(.*?)</Value>')
final = []
with open(r"test.txt") as f:
    for line in f:
        result = re.search(exp,line)
        final.append(result)

    df = pd.DataFrame(final, columns)
    print(df)

**My expected output:** I want to loop through each line, perform 3 regex operations, and extract data from each line of the text file.

1. r'<(.*)\s+xmlns'
2. r'<Channel>(.*)</Channel>'
3. r'<Param Name="AG">.*?<Value>(.*?)</Value>'

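Each regex extracts its own piece of data from a single line:
1. Extracts the request type
2. Extracts the channel name
3. Extracts the current value of AG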
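
The three extractions listed above can be sketched with the standard library alone. This is a minimal illustration, not the asker's code: the helper name `extract_fields` and the `PATTERNS` dict are hypothetical, and the greedy `(.*)` groups are narrowed to `(\w+)` and `(.*?)` on the assumption that each capture should stop at the first closing tag.

```python
import re

# The three patterns from the list above, compiled separately.
# Assumption: (.*) is narrowed to (\w+) / (.*?) so each capture stays minimal.
PATTERNS = {
    "Request Type": re.compile(r'<(\w+)\s+xmlns'),
    "Channel": re.compile(r'<Channel>(.*?)</Channel>'),
    "AG": re.compile(r'<Param Name="AG">.*?<Value>(.*?)</Value>'),
}

def extract_fields(line):
    """Apply every pattern to one line; store None where a pattern finds nothing."""
    row = {}
    for name, pattern in PATTERNS.items():
        match = pattern.search(line)
        row[name] = match.group(1) if match else None
    return row

# First line of test.txt, split across string literals for readability.
sample = ('<ServiceRQ xmlns:xsi="http://"><SaleInfo><CityCode>DXB</CityCode>'
          '<CountryCode>EG</CountryCode><Currency>USD</Currency>'
          '<Channel>TA</Channel></SaleInfo><Pricing><CustomParams>'
          '<Param Name="AG"><Value>95HAJSTI</Value></Param>'
          '</CustomParams></Pricing></ServiceRQ>')

row = extract_fields(sample)
print(row)
```

Keeping the patterns separate means a line that is missing one field still yields the other two, at the cost of scanning each line three times.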

**My expected output Excel sheet:**

Request Type    Channel       AG
ServiceRQ         TA        95HAJSTI  
SearchRQ          AY        56ASJSTS
ServiceRQ         QA        85ATAKSQ
 ...              ...         .....
 ...              ....        .....
and so on..

How can I achieve the expected output?

pandas

Source: https://stackoverflow.com/questions/75385051/perform-multiple-regex-operations-on-each-line-of-text-file-and-store-extracted

1 Answer


pxq42qpu1#

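Try this regex. I don't actually know what your full text looks like, but this works with what I can see.
result.groups() returns a tuple containing the matched text of every capture group, and that tuple is what gets appended to the list.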

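import re
import pandas as pd

columns = ['Request Type', 'Channel', 'AG']

# One pattern with three capture groups: request type, channel, AG value.
# The non-greedy .*? skips the markup between the captured fields.
exp = re.compile(r'<(\w+)\s+xmlns.*?>.*?'
                 r'<Channel>(.*?)</Channel>.*?'
                 r'<Param Name="AG"><Value>(.*?)</Value>')
final = []
with open(r"test.txt") as f:
    for line in f:
        result = exp.search(line)
        if result:  # skip blank or non-matching lines
            final.append(result.groups())  # tuple of the three captured strings

df = pd.DataFrame(final, columns=columns)
print(df)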
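
As an alternative to the explicit loop, the same extraction can be sketched with pandas' own `Series.str.extract`, which returns one column per capture group. This is a swapped-in technique rather than the answerer's method. The in-memory `sample_text` stands in for test.txt, and the group names `request_type`, `channel`, and `ag` are hypothetical labels chosen for this illustration.

```python
import pandas as pd

# Stand-in for the contents of test.txt (three full lines plus one truncated line).
sample_text = """<ServiceRQ xmlns:xsi="http://"><SaleInfo><CityCode>DXB</CityCode><CountryCode>EG</CountryCode><Currency>USD</Currency><Channel>TA</Channel></SaleInfo><Pricing><CustomParams><Param Name="AG"><Value>95HAJSTI</Value></Param></CustomParams></Pricing></ServiceRQ>

<SearchRQ xmlns:xsi="http://"><SaleInfo><CityCode>CPT</CityCode><CountryCode>US</CountryCode><Currency>USD</Currency><Channel>AY</Channel></SaleInfo><Pricing><CustomParams><Param Name="AG"><Value>56ASJSTS</Value></Param></CustomParams></Pricing></SearchRQ>

<ServiceRQ xmlns:xsi="http://"><SaleInfo><CityCode>BOM</CityCode><CountryCode>AU</CountryCode><Currency>USD</Currency><Channel>QA</Channel></SaleInfo><Pricing><CustomParams><Param Name="AG"><Value>85ATAKSQ</Value></Param></CustomParams></Pricing></ServiceRQ>

<ServiceRQ ......
"""

# One Series element per line of the file.
lines = pd.Series(sample_text.splitlines())

# Same structure as the pattern above, with named groups (hypothetical names).
pattern = (r'<(?P<request_type>\w+)\s+xmlns.*?'
           r'<Channel>(?P<channel>.*?)</Channel>.*?'
           r'<Param Name="AG"><Value>(?P<ag>.*?)</Value>')

# str.extract yields NaN in every column for lines that do not match;
# dropna(how="all") discards those rows (blank and truncated lines here).
df = lines.str.extract(pattern).dropna(how="all").reset_index(drop=True)
df.columns = ['Request Type', 'Channel', 'AG']
print(df)
```

To produce the spreadsheet, `df.to_excel("output.xlsx", index=False)` should work, provided an Excel writer engine such as openpyxl is installed; the file name is only an example.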

