Is there a way to use pandas str.replace to replace a word only when it occurs by itself, not as part of a longer string?

Posted by fcy6dtqo on 2023-09-29 in Other


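I have a dataframe in which I want to replace "Blah" only when it is the entire item/cell/entry, not when it is part of a longer string such as "Blah gah". See the example below:

import pandas as pd

data={"Col":["Blah","Blah gah","Blah bluh"],'Subs':["one","two","three"]}
df=pd.DataFrame(data)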

Desired output:
| Col | Subs |
| --- | --- |
| Blah ALL | one |
| Blah gah | two |
| Blah bluh | three |
I tried using word boundaries, but that just replaced Blah in all three entries:

df["Col"] = df["Col"].str.replace(r'\bBlah\b', "Blah ALL", regex=True)

| Col | Subs |
| --- | --- |
| Blah ALL | one |
| Blah ALL gah | two |
| Blah ALL bluh | three |
I must be missing something obvious.

pandas

Source: https://stackoverflow.com/questions/77189035/is-there-a-way-using-pandas-str-replace-to-replace-a-word-only-when-it-occurs-by

3 Answers

oprakyz7 #1

Instead of word boundaries (\b), use the start/end-of-string anchors (^/$):

df["Col"] = df["Col"].str.replace(r'^Blah$', "Blah ALL", regex=True)
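To see why the anchors help, here is a minimal sketch that runs both patterns side by side on the question's sample data. The variable names `with_boundaries` and `anchored` are only illustrative. One caveat worth knowing: in Python's `re`, `$` also matches just before a trailing newline, so a cell containing "Blah\n" would match as well.

```python
import pandas as pd

df = pd.DataFrame({"Col": ["Blah", "Blah gah", "Blah bluh"],
                   "Subs": ["one", "two", "three"]})

# \b only requires a word boundary on each side of "Blah", so the word
# still matches when it is followed by a space and more text.
with_boundaries = df["Col"].str.replace(r"\bBlah\b", "Blah ALL", regex=True)

# ^ and $ pin the pattern to the start and end of the cell, so only a
# cell that is exactly "Blah" matches.
anchored = df["Col"].str.replace(r"^Blah$", "Blah ALL", regex=True)

print(with_boundaries.tolist())  # ['Blah ALL', 'Blah ALL gah', 'Blah ALL bluh']
print(anchored.tolist())         # ['Blah ALL', 'Blah gah', 'Blah bluh']
```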

2023-09-29

6vl6ewon #2

Is it as simple as making sure the cell both starts and ends with "Blah"? If so:

df["Col"] = df["Col"].str.replace(r'^Blah$', "Blah ALL", regex=True)

2023-09-29

bt1cpqcv #3

When you need to replace the full string, don't use str.replace; use replace instead (regex=False by default):

df['Col'] = df['Col'].replace('Blah', 'Blah ALL')

Output:

         Col   Subs
0   Blah ALL    one
1   Blah gah    two
2  Blah bluh  three
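Because `Series.replace` compares whole cell values, it also accepts a dict when several exact values need replacing at once. The sketch below assumes a second, hypothetical mapping ("Guh" to "Guh ALL") that is not in the original question; it is there only to show that unmatched keys are simply ignored.

```python
import pandas as pd

df = pd.DataFrame({"Col": ["Blah", "Blah gah", "Blah bluh"],
                   "Subs": ["one", "two", "three"]})

# Keys are matched against entire cell values, not substrings.
# "Guh" is a hypothetical extra entry; it matches nothing here.
mapping = {"Blah": "Blah ALL", "Guh": "Guh ALL"}

df["Col"] = df["Col"].replace(mapping)

print(df["Col"].tolist())  # ['Blah ALL', 'Blah gah', 'Blah bluh']
```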

Timing

replace is also much faster than str.replace with a regex, by roughly a factor of ten.
On 30k rows:

# replace
3.9 ms ± 450 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

# str.replace with regex=True
37.2 ms ± 3.57 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
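The answer does not say how the 30k-row input was built, so the following is only a rough way to rerun the comparison: it assumes the three sample values repeated 10,000 times and uses `timeit` rather than IPython's `%timeit`. Absolute numbers will differ by machine and pandas version.

```python
import timeit

import pandas as pd

# Assumption: 30,000 rows made by repeating the question's three values.
s = pd.Series(["Blah", "Blah gah", "Blah bluh"] * 10_000)

def with_replace():
    # Exact, whole-value replacement (no regex involved).
    return s.replace("Blah", "Blah ALL")

def with_str_replace():
    # Anchored regex replacement applied to every string.
    return s.str.replace(r"^Blah$", "Blah ALL", regex=True)

# Both approaches should produce the same result on this data.
same_result = with_replace().equals(with_str_replace())

runs = 20
t_replace = timeit.timeit(with_replace, number=runs) / runs
t_str_replace = timeit.timeit(with_str_replace, number=runs) / runs

print(f"replace:     {t_replace * 1e3:.2f} ms per call")
print(f"str.replace: {t_str_replace * 1e3:.2f} ms per call")
```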

2023-09-29
