Loop through a set of words and then use regex to remove the words from text [duplicate]

Posted by eblbsuwk on 2023-03-13 in Other


This question already has answers here:

(9 answers)
Closed 20 days ago.
I have a set of words (the set is dynamic, so I have to use a for loop):

a = {'i', 'the', 'at', 'it'}

I have a text:

text = 'i want to jump the rope. i will do it tomorrow at 5pm. i love to jump the rope.'

Now I am trying to remove these words from the text, but somehow it doesn't work. Here is what I am using:

import re

for word in a:
    text = re.sub(r'\bword\b', '', text).strip()


Source: https://stackoverflow.com/questions/75515943/loop-through-a-set-of-words-and-then-use-regex-to-remove-the-words-from-text

3 answers


8aqjt8rx1#

Your regex is looking for the literal string "word". You should use an f-string so that the value stored in the variable word is used instead:

text = re.sub(rf'\b{word}\b', '', text).strip()
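As a sketch, the fix above can be wrapped in a small self-contained script. The function name remove_words is hypothetical, and the re.escape() call is an addition beyond the answer: it only matters if a word could contain regex metacharacters such as "." or "+". The sketch assumes the words are plain alphanumeric tokens, so that \b behaves as a word boundary on both sides.

```python
import re


def remove_words(words, text):
    """Remove every whole-word occurrence of each word (hypothetical helper)."""
    for word in words:
        # rf-string: \b stays a regex word boundary, {word} is interpolated.
        # re.escape() protects against words containing regex metacharacters.
        text = re.sub(rf'\b{re.escape(word)}\b', '', text).strip()
    return text


a = {'i', 'the', 'at', 'it'}
text = 'i want to jump the rope. i will do it tomorrow at 5pm. i love to jump the rope.'
print(repr(remove_words(a, text)))
```

With this input the result still contains doubled spaces where words were removed (for example "jump  rope."), which is what the whitespace cleanup in the second answer addresses.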

2023-03-13

o2gm4chl2#

The reason this doesn't work is that you are searching for the literal string "word".

text = re.sub(rf'\b{word}\b', '', text).strip()

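This inserts the actual value of word into the pattern string.
When debugging a regex, logging the matches helps you check whether it works as expected: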

import re

a = {'i', 'the', 'at', 'it'}
text = 'i want to jump the rope. i will do it tomorrow at 5pm. i love to jump the rope.'

for word in a:
    # Log the current word and text before each (intended) substitution
    print(f'Updating text, removing "{word}" from: "{text}"')
    # text = re.sub(r'\bword\b', '', text).strip()
    # Check whether the original pattern matches anything at all
    print(re.search(r'\bword\b', text))

Updating text, removing "at" from: "i want to jump the rope. i will do it tomorrow at 5pm. i love to jump the rope."
None
You can see that this is not finding a match. But if we simplify your expression:

print(re.search(word, text))

Updating text, removing "it" from: "i want to jump the rope. i will do it tomorrow at 5pm. i love to jump the rope."
<re.Match object; span=(35, 37), match='it'>
This does find a match, which shows that something goes wrong when the word is turned into a regex.
regex101 is very useful for diagnosing this kind of problem: just print out the actual regex and test it against your input:

print(r'\bword\b')
print(rf'\b{word}\b')

\bword\b
\bthe\b
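For the side-by-side check described above, a small debugging helper can print both patterns and count their matches with re.findall(). The helper name compare_patterns is hypothetical; it is only a sketch of the logging idea, not code from the original answer.

```python
import re


def compare_patterns(words, text):
    """Return {word: (broken_count, fixed_count)} for quick debugging (hypothetical helper)."""
    results = {}
    for word in sorted(words):
        broken = r'\bword\b'               # matches the literal text "word"
        fixed = rf'\b{re.escape(word)}\b'  # matches the value of the variable
        print(f'{word!r}: {broken} vs {fixed}')
        results[word] = (len(re.findall(broken, text)), len(re.findall(fixed, text)))
    return results


a = {'i', 'the', 'at', 'it'}
text = 'i want to jump the rope. i will do it tomorrow at 5pm. i love to jump the rope.'
print(compare_patterns(a, text))
```

Every word should report zero matches for the literal \bword\b pattern and a non-zero count for the interpolated one (3 for "i" and 2 for "the" in this text).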
You may also want to tidy up the whitespace, which you can do like this:

text = re.sub(rf'\b{word}\s?\b', '', text).strip()

want to jump rope. will do tomorrow 5pm. love to jump rope.
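A possible variation on the \s? cleanup, assuming a single pass over the text is acceptable, is to join all the words into one alternation pattern and remove each word together with any whitespace that follows it. This is a different technique from the per-word loop in the answers, and the name remove_words_once is hypothetical.

```python
import re


def remove_words_once(words, text):
    """Remove all words in one regex pass (hypothetical helper)."""
    # Longest first, so e.g. 'it' is tried before 'i' (harmless here thanks to \b).
    alternation = '|'.join(map(re.escape, sorted(words, key=len, reverse=True)))
    # \b(?:...)\b matches a whole word; \s* also eats the whitespace after it.
    pattern = re.compile(rf'\b(?:{alternation})\b\s*')
    return pattern.sub('', text).strip()


a = {'i', 'the', 'at', 'it'}
text = 'i want to jump the rope. i will do it tomorrow at 5pm. i love to jump the rope.'
print(remove_words_once(a, text))
```

Compiling one pattern avoids re-scanning the text once per word; for this input it should produce "want to jump rope. will do tomorrow 5pm. love to jump rope.".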

2023-03-13

7gcisfzg3#

Why import a library instead of just using replace()?

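list_words = {'i', 'the', 'at', 'it'}
text = 'i want to jump the rope. i will do it tomorrow at 5pm. i love to jump the rope.'

for word in list_words:
    # Note: str.replace() removes every occurrence of the substring,
    # including inside other words (e.g. the "i" in "will")
    text = text.replace(word, "")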

Edit:
As Seluck pointed out in the comments below, this has a flaw.
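To make the flaw concrete: str.replace() works on raw substrings, so replacing "i" also deletes that letter from inside "will" and "it". If regex is to be avoided entirely, one non-regex alternative (offered here as an assumption, not as the fix from the comments) is to split on whitespace and drop tokens that exactly match a word. The helper name remove_words_by_token is hypothetical, and the sketch assumes the words never have punctuation attached.

```python
def remove_words_by_token(words, text):
    """Drop whitespace-separated tokens that exactly equal one of `words` (hypothetical helper)."""
    # Caveat: a token with punctuation attached (e.g. 'it.') is NOT removed,
    # and any run of whitespace collapses to a single space.
    return ' '.join(token for token in text.split() if token not in words)


a = {'i', 'the', 'at', 'it'}
text = 'i want to jump the rope. i will do it tomorrow at 5pm. i love to jump the rope.'

print(repr('i will do it'.replace('i', '')))  # ' wll do t' (substring replace mangles words)
print(repr(remove_words_by_token(a, text)))
```

Splitting on whitespace keeps the check at whole-word level without regex, at the cost of losing the original spacing and missing words that touch punctuation.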

2023-03-13
