pandas: check whether a string contains a 10-character word and, if it does, extract it

Posted by soat7uwm on 2022-12-09 in Other

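I am trying to extract a 10-character word from a string, if one exists.
The first 5 characters must come from a given list, and the last 3 characters must be digits.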
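The two rules can be written as a plain Python check on a single candidate word. This is a minimal sketch; the helper name `is_reference` is hypothetical and does not come from the question.

```python
# Hypothetical helper: True if `word` is exactly 10 characters long,
# its first 5 characters are one of `prefixes`, and its last 3 are digits.
def is_reference(word: str, prefixes: list[str]) -> bool:
    return (
        len(word) == 10
        and word[:5] in prefixes
        and word[-3:].isdigit()
    )


list_Character = ['AQBCN', 'PUCNQ', 'DJBNK', 'ADJBC', 'NJRQC']

print(is_reference('AQBCN2Q546', list_Character))  # True
print(is_reference('AQCCN2Q546', list_Character))  # False: prefix not in the list
print(is_reference('AQBCN2Q5X6', list_Character))  # False: last 3 are not all digits
```

Checking whole words this way still requires splitting the description into words first; a regular expression can do the splitting and the checking in one pass.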

Input data (Data.xlsx):

Description                                                  Number

CHQ -AQBCN2Q546 from India Federation Pvt Ltd               
CHQN#DJBNK0Q329 from Indiana Basics Software Ltd -BC003
CASH-NJRQC5J987 from US Fertilizers LLP
CHQ - from India Bulls Pvt Ltd
CHQ -AQBCN2Q989 from India Bulls Pvt Ltd
CHQ -AQCCN2Q546 from India Federation Pvt Ltd
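To try the approaches below without the Excel file, the same rows can be built in memory. This sketch assumes a single `Description` column and uses the descriptions exactly as listed in the expected output, where the last row reads `AQCCN2Q546` (a prefix that is not in the list).

```python
import pandas as pd

# Sample rows from the question, built in memory instead of reading Data.xlsx
df = pd.DataFrame({
    "Description": [
        "CHQ -AQBCN2Q546 from India Federation Pvt Ltd",
        "CHQN#DJBNK0Q329 from Indiana Basics Software Ltd -BC003",
        "CASH-NJRQC5J987 from US Fertilizers LLP",
        "CHQ - from India Bulls Pvt Ltd",
        "CHQ -AQBCN2Q989 from India Bulls Pvt Ltd",
        "CHQ -AQCCN2Q546 from India Federation Pvt Ltd",
    ]
})
print(df)
```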

list_Character = ['AQBCN','PUCNQ','DJBNK','ADJBC','NJRQC']

Expected output:

Description                                                          Number

CHQ -AQBCN2Q546 from India Federation Pvt Ltd                    AQBCN2Q546
CHQN#DJBNK0Q329 from Indiana Basics Software Ltd -BC003          DJBNK0Q329
CASH-NJRQC5J987 from US Fertilizers LLP                          NJRQC5J987
CHQ - from India Bulls Pvt Ltd
CHQ -AQBCN2Q989 from India Bulls Pvt Ltd                         AQBCN2Q989
CHQ -AQCCN2Q546 from India Federation Pvt Ltd


Code:
import pandas as pd
import re

df = pd.read_excel(r'D:/Users/Data.xlsx')
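list_Character = ['AQBCN','PUCNQ','DJBNK','ADJBC','NJRQC']
for i in df['Description']:
    # attempt so far: matches runs starting with lowercase 'a' or 'e'; the prefix/digit rules are not applied yet
    matches = re.findall(r"[ae]\w+", i)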

I haven't found a solution; any suggestions?
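If you prefer to keep the loop from the question, `re.search` with a pattern that encodes both rules returns the first matching word per row. A sketch assuming the in-memory sample rows (last row as in the expected output); the names `pattern` and `numbers` are illustrative. Here `\b` word boundaries are used on both sides instead of requiring a `#` or `-` before the word.

```python
import re

import pandas as pd

list_Character = ['AQBCN', 'PUCNQ', 'DJBNK', 'ADJBC', 'NJRQC']

df = pd.DataFrame({
    "Description": [
        "CHQ -AQBCN2Q546 from India Federation Pvt Ltd",
        "CHQN#DJBNK0Q329 from Indiana Basics Software Ltd -BC003",
        "CASH-NJRQC5J987 from US Fertilizers LLP",
        "CHQ - from India Bulls Pvt Ltd",
        "CHQ -AQBCN2Q989 from India Bulls Pvt Ltd",
        "CHQ -AQCCN2Q546 from India Federation Pvt Ltd",
    ]
})

# One of the 5-character prefixes, 2 more word characters, then 3 digits.
# \b on both sides keeps the match to a whole 10-character word.
pattern = re.compile(
    r'\b(?:' + '|'.join(map(re.escape, list_Character)) + r')\w{2}\d{3}\b'
)

numbers = []
for text in df['Description']:
    match = pattern.search(text)  # first match in the row, or None
    numbers.append(match.group(0) if match else None)

df['Number'] = numbers
print(df)
```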

Answer by 6l7fqoea:

I think you want:

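# Allowed values for the first 5 characters
list_Character = ['AQBCN', 'PUCNQ', 'DJBNK', 'ADJBC', 'NJRQC']
# '#' or '-', then one of the prefixes followed by any 5 word characters, up to a word boundary
regex = r'[#-]((?:' + r'|'.join(list_Character) + r')\w{5})\b'
# expand=False returns a Series (NaN where nothing matched)
df["Number"] = df["Description"].str.extract(regex, expand=False)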
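A variation on the answer's regex (an adjustment, not part of the original answer): `\w{5}` accepts any five word characters, so a value such as `AQBCN2Q5X6` would also be extracted. Splitting the tail into `\w{2}\d{3}` enforces the question's "last 3 characters are digits" rule. The sketch assumes the sample rows are built in memory; `re.escape` is only a precaution in case a prefix ever contains regex metacharacters.

```python
import re

import pandas as pd

list_Character = ['AQBCN', 'PUCNQ', 'DJBNK', 'ADJBC', 'NJRQC']

df = pd.DataFrame({
    "Description": [
        "CHQ -AQBCN2Q546 from India Federation Pvt Ltd",
        "CHQN#DJBNK0Q329 from Indiana Basics Software Ltd -BC003",
        "CASH-NJRQC5J987 from US Fertilizers LLP",
        "CHQ - from India Bulls Pvt Ltd",
        "CHQ -AQBCN2Q989 from India Bulls Pvt Ltd",
        "CHQ -AQCCN2Q546 from India Federation Pvt Ltd",
    ]
})

# '#' or '-', then a listed prefix, 2 word characters and 3 digits,
# captured as one group that must end at a word boundary.
prefixes = '|'.join(map(re.escape, list_Character))
regex = r'[#-]((?:' + prefixes + r')\w{2}\d{3})\b'

# Vectorised extraction: one value per row, NaN where nothing matched
df["Number"] = df["Description"].str.extract(regex, expand=False)
print(df)
```

Compared with looping over rows, `Series.str.extract` applies the same pattern to the whole column in one call and aligns the result with the original index.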
