Remove adjacent duplicate words in a string with Python?

Posted by mhd8tkvw on 2023-03-11 in Python


How can I remove adjacent duplicate words in a string? For example, 'Hey there There' -> 'Hey there'

python

Source: https://stackoverflow.com/questions/68481155/remove-adjacent-duplicate-words-in-a-string-with-python

4 Answers


qyzbxkaa1#

Using re.sub with a backreference, we can try:

import re

inp = 'Hey there There'
output = re.sub(r'(\w+) \1', r'\1', inp, flags=re.IGNORECASE)
print(output)  # Hey there

The regex pattern used here says to:

(\w+)  match and capture a word
[ ]    followed by a space
\1     then followed by the same word (ignoring case)

Then we replace the whole match with just the first of the two adjacent words.
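To see what the pattern captures, here is a small sketch that inspects the match object instead of substituting. The variable name `match` is illustrative and not part of the original answer.

```python
import re

inp = 'Hey there There'
# Same pattern as in the answer; re.IGNORECASE also applies to the
# backreference, so \1 ("there") is allowed to match "There".
match = re.search(r'(\w+) \1', inp, flags=re.IGNORECASE)

print(match.group(0))  # there There  (the full matched text)
print(match.group(1))  # there        (the captured word used as the replacement)
```

Because the replacement string is r'\1', the whole of group 0 is swapped for group 1, which is why the first spelling of the word survives.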


xhv8bpkk2#

import re

inp = 'Hey there There'
output = re.sub(r'\b(\w+) \1\b', r'\1', inp, flags=re.IGNORECASE)
print(output)  # Hey there

inp = 'Hey there eating?'
output = re.sub(r'\b(\w+) \1\b', r'\1', inp, flags=re.IGNORECASE)
print(output)  # Hey there eating?

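\b enforces word boundaries, so the pattern matches whole words rather than individual characters. The second test case ('Hey there eating?') does not work with the answer given by Tim Biegleisen (https://stackoverflow.com/a/68481181/8439676).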
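The difference between the two patterns can be checked side by side. This is a minimal sketch; the helper names dedupe_plain and dedupe_bounded are made up for illustration.

```python
import re

def dedupe_plain(text):
    # Pattern from the first answer: no word boundaries, so a single
    # trailing letter can pair with the first letter of the next word.
    return re.sub(r'(\w+) \1', r'\1', text, flags=re.IGNORECASE)

def dedupe_bounded(text):
    # Pattern from this answer: \b on both sides forces whole-word matches.
    return re.sub(r'\b(\w+) \1\b', r'\1', text, flags=re.IGNORECASE)

sample = 'Hey there eating?'
print(dedupe_plain(sample))    # Hey thereating?
print(dedupe_bounded(sample))  # Hey there eating?
```

Without \b, the final "e" of "there" and the leading "e" of "eating" are treated as a repeated "word", which merges the two words.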


c0vxltue3#

Remove adjacent duplicate words recursively:

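def removeConsecutiveDuplicateWors(s):
    # Split into words; with fewer than two words there is nothing to compare.
    st = s.split()
    if len(st) < 2:
        return " ".join(st)
    # Keep the first word only if it differs from the next one
    # (exact, case-sensitive comparison).
    if st[0] != st[1]:
        return st[0] + " " + removeConsecutiveDuplicateWors(" ".join(st[1:]))
    # Otherwise drop the first word and continue with the rest.
    return removeConsecutiveDuplicateWors(" ".join(st[1:]))


string = 'I am a duplicate duplicate word in a sentence. How I can be be be removed?'
print(removeConsecutiveDuplicateWors(string))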

Output: I am a duplicate word in a sentence. How I can be removed?
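The recursive version makes one call per word and re-splits the string each time, so very long inputs could run into Python's recursion limit. As an alternative (not from the original answer), here is an iterative sketch using itertools.groupby. The function name remove_adjacent_duplicates is hypothetical, and the case-insensitive comparison via str.lower is an assumption; the recursive code compares words exactly.

```python
from itertools import groupby

def remove_adjacent_duplicates(text):
    # groupby collects each run of neighbouring words with the same key;
    # taking the first word of every run drops the adjacent repeats.
    # key=str.lower makes the comparison case-insensitive (an assumption).
    words = text.split()
    return " ".join(next(run) for _, run in groupby(words, key=str.lower))

string = 'I am a duplicate duplicate word in a sentence. How I can be be be removed?'
print(remove_adjacent_duplicates(string))
# I am a duplicate word in a sentence. How I can be removed?
```

Like the recursive version, this compares whitespace-separated tokens, so "word" and "word." count as different words.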


hsgswve44#

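Rohit Sharma's answer should be the accepted one, because it actually takes word boundaries into account. The original answer would incorrectly change 'Hey there eating' to 'Hey thereating'.

Alternatively, the following regex can be used (in some cases it produces slightly different output; see the examples below):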

my_output = re.sub(r'\b(\w+)(?:\W+\1\b)+', r'\1', my_input, flags=re.IGNORECASE)

Example 1:

Input: Buying food food in the supermarket
Output of Rohit's version: Buying food in the supermarket
Output of the version above: Buying food in the supermarket

Example 2:

Input: Food: Food and Beverages
Output of Rohit's version: Food: Food and Beverages (unchanged)
Output of the version above: Food and Beverages

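Explanation:

"\b": a word boundary. Boundaries are needed for special cases. For example, in "My thesis is great", "is" won't be matched twice.
"\w+": one or more word characters: [a-zA-Z_0-9] (for str patterns in Python 3, other Unicode letters and digits match as well)
"\W+": one or more non-word characters: [^\w]
"\1": matches whatever was matched by the first group of parentheses, which in this case is (\w+)
"+": matches one or more occurrences of whatever it is placed after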
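As a quick check of the pieces explained above, here is a minimal sketch that wraps the pattern in a function. The names ADJACENT_DUPLICATES and collapse_repeats are illustrative and not from the original answer.

```python
import re

# The pattern from this answer, compiled once so it can be reused.
ADJACENT_DUPLICATES = re.compile(r'\b(\w+)(?:\W+\1\b)+', flags=re.IGNORECASE)

def collapse_repeats(my_input):
    # Replace a word followed by one or more repeats of itself
    # (separated by any non-word characters) with the first occurrence.
    return ADJACENT_DUPLICATES.sub(r'\1', my_input)

print(collapse_repeats('How I can be be be removed?'))  # How I can be removed?
print(collapse_repeats('Food: Food and Beverages'))     # Food and Beverages
print(collapse_repeats('My thesis is great'))           # My thesis is great
```

The trailing "+" after the non-capturing group is what lets a run such as "be be be" collapse in a single substitution, and \W+ is why punctuation between the repeats (as in "Food: Food") is removed along with the duplicate.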

Credits:

I adapted this code to Python, but it originates from this geeksforgeeks.org post
