How do I remove adjacent duplicate words from a string in Python?

mhd8tkvw  posted on 2023-03-11  in Python

How can I remove adjacent duplicate words from a string? For example, 'Hey there There' -> 'Hey there'.

qyzbxkaa  #1

Using re.sub with a backreference, we can try:

import re

inp = 'Hey there There'
output = re.sub(r'(\w+) \1', r'\1', inp, flags=re.IGNORECASE)
print(output)  # Hey there

The regex pattern used here says:

(\w+)  match and capture a word
[ ]    followed by a space
\1     then followed by the same word (ignoring case)

We then replace the whole match with just the first of the two adjacent words.

xhv8bpkk  #2

import re

inp = 'Hey there There'
output = re.sub(r'\b(\w+) \1\b', r'\1', inp, flags=re.IGNORECASE)
print(output)  # Hey there

inp = 'Hey there eating?'
output = re.sub(r'\b(\w+) \1\b', r'\1', inp, flags=re.IGNORECASE)
print(output)  # Hey there eating?

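\b asserts a word boundary, so the pattern captures whole words rather than arbitrary runs of characters. The second test case ('Hey there eating?') fails with the answer given by Tim Biegleisen at https://stackoverflow.com/a/68481181/8439676, which turns it into 'Hey thereating?'.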
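To see the difference side by side, here is a minimal sketch that runs both patterns on the two test inputs. The names NO_BOUNDARY, WITH_BOUNDARY, and dedupe are illustrative only and do not come from either answer.

```python
import re

# Pattern from answer #1 (no word boundaries) and from answer #2 (with \b).
NO_BOUNDARY = re.compile(r'(\w+) \1', flags=re.IGNORECASE)
WITH_BOUNDARY = re.compile(r'\b(\w+) \1\b', flags=re.IGNORECASE)


def dedupe(pattern, text):
    """Replace each matched pair with the text captured by group 1."""
    return pattern.sub(r'\1', text)


for text in ('Hey there There', 'Hey there eating?'):
    print(repr(text))
    print('  without \\b:', dedupe(NO_BOUNDARY, text))
    print('  with \\b:   ', dedupe(WITH_BOUNDARY, text))
```

Without \b, the trailing "e" of "there" and the leading "e" of "eating" are treated as a repeated "word", which is why the second input gets mangled.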

c0vxltue  #3

Recursively remove adjacent duplicate words:

def removeConsecutiveDuplicateWors(s):
    # Split on whitespace; with fewer than two words there is nothing to compare.
    st = s.split()
    if len(st) < 2:
        return " ".join(st)
    # Keep the first word only when it differs from the next one.
    if st[0] != st[1]:
        return st[0] + " " + removeConsecutiveDuplicateWors(" ".join(st[1:]))
    # Otherwise drop the first word and recurse on the rest.
    return removeConsecutiveDuplicateWors(" ".join(st[1:]))


string = 'I am a duplicate duplicate word in a sentence. How I can be be be removed?'
print(removeConsecutiveDuplicateWors(string))

Output: I am a duplicate word in a sentence. How I can be removed?
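The recursive version makes one call per word, so a very long input can hit CPython's recursion limit. As a possible alternative (a different technique, not the answer's own method), the same case-sensitive, whitespace-split behaviour can be sketched iteratively with itertools.groupby. The function name remove_adjacent_duplicates is made up for this example.

```python
from itertools import groupby


def remove_adjacent_duplicates(text):
    """Collapse runs of identical adjacent words (case-sensitive, split on whitespace)."""
    # groupby yields one key per run of equal consecutive items,
    # so keeping only the keys drops the repeats.
    return ' '.join(word for word, _run in groupby(text.split()))


sentence = 'I am a duplicate duplicate word in a sentence. How I can be be be removed?'
print(remove_adjacent_duplicates(sentence))
```

Like the recursive version, this compares words with their punctuation attached, so "sentence." and "sentence" count as different words.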

hsgswve4  #4

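Rohit Sharma's answer should be the accepted one, because it actually takes word boundaries into account. The original answer would incorrectly change 'Hey there eating' to 'Hey thereating'.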

Alternatively, the following regex can be used (in some cases it produces slightly different output; see the examples below):

my_output = re.sub(r'\b(\w+)(?:\W+\1\b)+', r'\1', my_input, flags=re.IGNORECASE)

Example 1:

Input: Buying food food in the supermarket
Output of Rohit's version: Buying food in the supermarket
Output of the version above: Buying food in the supermarket

Example 2:

Input: Food: Food and Beverages
Output of Rohit's version: Food: Food and Beverages (unchanged)
Output of the version above: Food and Beverages

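Explanation:

"\b": a word boundary. Boundaries are needed for special cases. For example, in "My thesis is great", the "is" at the end of "thesis" is not treated as a duplicate of the following "is".
"\w+": one or more word characters, roughly [a-zA-Z_0-9] for ASCII text
"\W+": one or more non-word characters, i.e. [^\w]
"\1": matches whatever was matched by the first group of parentheses, in this case (\w+)
"+": matches whatever it is placed after, one or more times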
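The pieces listed above can be laid out in a commented pattern using re.VERBOSE. This is only a sketch of the same regex; the names ADJACENT_DUPLICATES and collapse_repeats are invented for illustration.

```python
import re

# Same pattern as above, spread over several lines with re.VERBOSE so that
# each piece can carry a comment (whitespace inside the pattern is ignored).
ADJACENT_DUPLICATES = re.compile(
    r"""
    \b(\w+)      # a whole word, captured as group 1
    (?:          # non-capturing group for each repeat:
        \W+      #   one or more non-word characters
        \1\b     #   the same word again, ending at a word boundary
    )+           # one or more repeats, collapsed in a single pass
    """,
    flags=re.IGNORECASE | re.VERBOSE,
)


def collapse_repeats(text):
    """Replace each run of repeated words with its first occurrence."""
    return ADJACENT_DUPLICATES.sub(r'\1', text)


for sample in (
    'Buying food food in the supermarket',
    'Food: Food and Beverages',
    'How I can be be be removed?',
    'My thesis is great',
):
    print(collapse_repeats(sample))
```

Because the repeat group ends in "+", a triple such as "be be be" collapses in one substitution, whereas a pattern that matches only a single pair would leave "be be" behind.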

Credits:

I adapted this code to Python, but it originates from this geeksforgeeks.org post
