pandas: replace text inside brackets with a partial value identified in each respective row

Posted by niknxzdl on 2023-08-01 in Other


I have a DataFrame with a bunch of columns, one of which looks like this:

data = {'Product': ['Product A', 'Product B', 'Product C (discontinued in March 2021)', 'Product D', 'Product E (discontinued on 30 April 2004)']}

df = pd.DataFrame(data)

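I tried to write code that goes through each row of the column, identifies the year inside the brackets (if applicable), and replaces the text inside the brackets with 'discont. ' + the year identified. So for 'Product C', it should change the value to Product C (discont. 2021).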

def amend_vals(value):
    pattern = r'\((\d{4})\)'  # Regex pattern to capture the year inside brackets
    match = re.search(pattern, value)
    if match:
        year = match.group(1)
        return re.sub(pattern, '(discont. ' + year + ')', value)
    else:
        return value

df['Product'] = df['Product'].apply(amend_vals)

But it doesn't seem to work. Does anyone know how to fix it?

pandas

Source: https://stackoverflow.com/questions/76771712/replace-text-inside-brackets-with-partial-value-identified-in-each-respective-ro

4 Answers

mpgws1up #1

Use the following regex replacement:

df['Product'].str.replace(r'\([^(]+(\d{4})\)', r'(discont. \1)', regex=True)
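As a self-contained sketch of this approach, the snippet below rebuilds the question's sample data and assigns the result back to the column. It assumes pandas is installed, and it uses `[^()]+` rather than `[^(]+` so the match cannot run past a closing bracket; that tweak is an assumption of this sketch, not part of the answer.

```python
import pandas as pd

df = pd.DataFrame({
    "Product": [
        "Product A",
        "Product B",
        "Product C (discontinued in March 2021)",
        "Product D",
        "Product E (discontinued on 30 April 2004)",
    ]
})

# Match "(", a run of non-bracket characters, a four-digit year, then ")".
# Group 1 holds the year; rows without a bracketed year are left unchanged.
pattern = r"\([^()]+(\d{4})\)"
df["Product"] = df["Product"].str.replace(pattern, r"(discont. \1)", regex=True)

print(df["Product"].tolist())
```

The `\1` in the replacement string is a backreference to the captured year, which is why no separate search step is needed.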


Posted 2023-08-01

bqujaahr #2

Using a regex with lookarounds:

df['Product'] = df['Product'].str.replace(r'(?<=\(discont)[^)]+?(?=\d{4}\))',
                                          '. ', regex=True)

Output:

Product
0                  Product A
1                  Product B
2  Product C (discont. 2021)
3                  Product D
4  Product E (discont. 2004)

Regex explanation:

(?<=\(discont)  # lookbehind: must be preceded by "(discont" (not consumed)
[^)]+?          # match anything but ")" (non-greedy)
(?=\d{4}\))     # lookahead: must be followed by four digits and ")" (not consumed)
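To see what the lookaround pattern actually consumes, here is a small sketch using only the standard `re` module. The second sample value is hypothetical (it is not in the question's data) and is included to show that this pattern only fires when the bracketed text starts with "discont".

```python
import re

# Same pattern as in this answer: only the text between "(discont" and the
# year is matched; the lookbehind and lookahead themselves consume nothing.
pattern = re.compile(r"(?<=\(discont)[^)]+?(?=\d{4}\))")

samples = [
    "Product C (discontinued in March 2021)",
    "Product F (retired in 2010)",  # hypothetical value for illustration
]

for s in samples:
    m = pattern.search(s)
    matched = repr(m.group(0)) if m else None
    print(matched, "->", pattern.sub(". ", s))
```

Because the replacement only swaps the middle of the bracketed text, values whose brackets do not begin with "discont" pass through unchanged.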


Posted 2023-08-01

hyrbngr7 #3

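Change the regexp so that it captures both the whole bracketed fragment and the year inside the brackets. Use the year in the replacement text, and replace the whole fragment with your new text.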

import re

def amend_vals(value):
    # Group 1 is the whole bracketed fragment; group 2 is the four-digit year inside it
    pattern = r'(\([^\(\)]+(\d{4})\))'
    match = re.search(pattern, value)
    if match:
        year = match.group(2)
        return re.sub(pattern, f'(discont. {year})', value)
    else:
        return value

df['Product'].apply(amend_vals)

Output:

0                    Product A
1                    Product B
2    Product C (discont. 2021)
3                    Product D
4    Product E (discont. 2004)
Name: Product, dtype: object


Posted 2023-08-01

bzzcjhmw #4

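The two problems are:
1. your regex doesn't match any of the strings, and
2. you haven't imported your libraries (you may have just forgotten to include the imports here, but maybe not!)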

Regex

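The regex is r'\((\d{4})\)'. This matches four digits enclosed directly by brackets. That means it matches (2023), but not (discontinued 2023) or any other bracketed text that contains something besides the four digits.
The obvious fix is to allow any content inside the brackets before (or after, if you like) the number. The pattern r'\(.*(\d{4}).*\)' does this.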
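The difference between the two patterns can be checked directly with `re.search`. This is a minimal sketch; "Product X (2023)" is a hypothetical value added only to show a case the question's pattern does match.

```python
import re

original = r"\((\d{4})\)"      # the question's pattern
revised = r"\(.*(\d{4}).*\)"   # the pattern suggested in this answer

texts = [
    "Product X (2023)",  # hypothetical: brackets contain only the year
    "Product C (discontinued in March 2021)",
]

results = {}
for text in texts:
    for name, pat in [("original", original), ("revised", revised)]:
        m = re.search(pat, text)
        results[(name, text)] = m.group(1) if m else None
        print(f"{name:8} | {text!r} -> {results[(name, text)]}")
```

The question's pattern returns no match for the second value, so `amend_vals` falls through to `return value` and the column looks untouched.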

Libraries

You need to import re and pandas.

Working code

Here is the code with the above modifications:

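import re
import pandas as pd

def amend_vals(value):
    # Allow any text before and after the four-digit year inside the brackets
    pattern = r'\(.*(\d{4}).*\)'
    match = re.search(pattern, value)
    if match:
        year = match.group(1)
        return re.sub(pattern, '(discont. ' + year + ')', value)
    else:
        return value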
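One caveat worth knowing: `.*` is greedy, so if a value contains more than one pair of brackets, the match runs from the first `(` to the last `)`. The sketch below compares the greedy pattern with a bounded variant built from `[^()]*`. The helper name `amend_with`, the bounded pattern, and the sample value are all assumptions made for illustration, not part of the answer.

```python
import re

def amend_with(value, pattern):
    # Hypothetical helper: a single re.sub pass, where \1 in the replacement
    # refers to the captured year, so no separate re.search call is needed.
    return re.sub(pattern, r"(discont. \1)", value)

greedy = r"\(.*(\d{4}).*\)"           # pattern from this answer
bounded = r"\([^()]*(\d{4})[^()]*\)"  # assumed variant that stays inside one bracket pair

value = "Product G (2 pack) (discontinued in May 2019)"  # hypothetical value

print(amend_with(value, greedy))
print(amend_with(value, bounded))
```

For the question's sample data both patterns give the same result; the bounded form only matters when a product name carries additional bracketed text.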


Posted 2023-08-01
