RegExp match repeated characters

Posted by qlfbtfca on 2023-05-01 in Other


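For example, I have a string:

aacbbbqq

As a result, I want the following matches:

(aa, c, bbb, qq)

I know that I can write something like this:

([a]+)|([b]+)|([c]+)|...

But I think it is ugly, and I am looking for a better solution. I am looking for a regular expression solution, not a self-written finite state machine.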

regex

Source: https://stackoverflow.com/questions/6306098/regexp-match-repeated-characters

7 Answers


3bygqnnd1#

You can match it with: (\w)\1*

Answered 2023-05-01

xoefb8l82#

itertools.groupby is not a RegExp, but it is not self-written either. :-) Quoting the Python docs:

# [list(g) for k, g in groupby('AAAABBBCCD')] --> AAAA BBB CC D
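Applied to the string from the question, the groupby idea could look like the minimal sketch below. The helper name `group_runs` is illustrative and does not come from the Python docs.

```python
from itertools import groupby

def group_runs(text):
    """Split text into runs of identical consecutive characters."""
    # groupby yields (key, group_iterator) for each run of equal items;
    # joining the group rebuilds the run as a string.
    return [''.join(group) for _key, group in groupby(text)]

print(group_runs('aacbbbqq'))  # ['aa', 'c', 'bbb', 'qq']
```

Unlike a regex character class, this groups every character, including whitespace and punctuation.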


hiz5n14c3#

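General

The trick is to match a single character from the desired range, and then make sure you match all repetitions of that same character: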

>>> import re
>>> matcher = re.compile(r'(.)\1*')

This matches any single character (.) and then its repetitions (\1*), if there are any.
For your input string, you get the desired output:

>>> [match.group() for match in matcher.finditer('aacbbbqq')]
['aa', 'c', 'bbb', 'qq']

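Note: because the pattern contains a capturing group, re.findall will not work as expected here. It returns the contents of the group (the first character of each run) instead of the whole match.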
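The difference between the two calls can be checked with a short sketch like the following, which runs the same pattern through both findall and finditer. The variable names are illustrative.

```python
import re

pattern = re.compile(r'(.)\1*')
text = 'aacbbbqq'

# With one capturing group, findall returns the contents of that group,
# i.e. only the first character of each run.
first_chars = pattern.findall(text)

# finditer yields match objects, so group() returns the whole run.
whole_runs = [m.group() for m in pattern.finditer(text)]

print(first_chars)  # ['a', 'c', 'b', 'q']
print(whole_runs)   # ['aa', 'c', 'bbb', 'qq']
```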

Other ranges

If you don't want to match *any* character, change the . in the regular expression accordingly:

>>> matcher = re.compile(r'([a-z])\1*')      # only lowercase ASCII letters
>>> matcher = re.compile(r'(?i)([a-z])\1*')  # only ASCII letters, case-insensitive
>>> matcher = re.compile(r'(\w)\1*')         # ASCII letters, digits, or underscores (Python 2; in Python 3, str patterns are Unicode-aware by default)
>>> matcher = re.compile(r'(?u)(\w)\1*')     # Unicode matching: any letter or digit known to Unicode, or underscore

Against u'hello=²²' (Python 2.x) or 'hello=²²' (Python 3.x), using the last matcher above (the = is not a word character, so it is skipped):

>>> text = u'hello=\xb2\xb2'
>>> print('\n'.join(match.group() for match in matcher.finditer(text)))
h
e
ll
o
²²

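For non-Unicode strings / byte arrays, what \w matches can depend on the locale if you have first called locale.setlocale and compile the pattern with the re.LOCALE flag.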
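To compare the character ranges side by side, here is a minimal sketch assuming Python 3, where str patterns are Unicode-aware unless re.ASCII is passed. The helper `runs` and the sample string are illustrative and not part of the original answer.

```python
import re

def runs(pattern, text, flags=0):
    """Return every run of a repeated character matched by pattern."""
    return [m.group() for m in re.finditer(pattern, text, flags)]

sample = 'aaBB11__\xb2\xb2'  # ends with two superscript-two characters

lower_only = runs(r'([a-z])\1*', sample)          # lowercase ASCII letters
any_case = runs(r'(?i)([a-z])\1*', sample)        # ASCII letters, any case
ascii_word = runs(r'(\w)\1*', sample, re.ASCII)   # \w restricted to ASCII
unicode_word = runs(r'(\w)\1*', sample)           # \w with Unicode matching

print(lower_only)    # ['aa']
print(any_case)      # ['aa', 'BB']
print(ascii_word)    # ['aa', 'BB', '11', '__']
print(unicode_word)  # ['aa', 'BB', '11', '__', '²²']
```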


ecfsfe2w4#

This will work; see a working example here: http://www.rubular.com/r/ptdPuz0qDV

(\w)\1*


goucqfw65#

The findall method will work if you also capture the backreference, like this:

result = [match[0] + match[1] for match in re.findall(r"(.)(\1*)", string)]
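As a runnable sketch of this approach, the snippet below shows the tuples that findall returns when the pattern has two groups, and how they are joined back into runs. The variable names are illustrative.

```python
import re

text = 'aacbbbqq'

# Two groups: group 1 is the first character of a run,
# group 2 holds the remaining repetitions (possibly empty).
pairs = re.findall(r'(.)(\1*)', text)
print(pairs)  # [('a', 'a'), ('c', ''), ('b', 'bb'), ('q', 'q')]

# Concatenating both groups rebuilds each complete run.
runs = [first + rest for first, rest in pairs]
print(runs)   # ['aa', 'c', 'bbb', 'qq']
```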


uz75evzq6#

You can use:

re.sub(r"(\w)\1*", r'\1', 'tessst')

The output will be:

'test'
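Note that this substitution collapses each run to a single character rather than listing the runs. A small sketch, with the hypothetical helper name `collapse_runs`, makes the difference visible on the string from the question:

```python
import re

def collapse_runs(text):
    """Replace each run of a repeated word character with one copy of it."""
    return re.sub(r'(\w)\1*', r'\1', text)

print(collapse_runs('tessst'))    # test
print(collapse_runs('aacbbbqq'))  # acbq
```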


0vvn1miw7#

You can try something like this:

import re

string = 'aacbbbqq'
result = re.findall(r'((\w)\2*)', string)  # greedy \2* so the outer group captures the whole run
output = [x[0] for x in result]

print(output)

The output will be:

['aa', 'c', 'bbb', 'qq']
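The quantifier on the backreference matters for this pattern. As a quick check, the sketch below compares a greedy `\2*` with a lazy `\2*?`. Because nothing follows the lazy quantifier in the pattern, it is satisfied with zero repetitions and every character becomes its own match.

```python
import re

text = 'aacbbbqq'

# Greedy: \2* consumes as many repetitions as possible.
greedy = [whole for whole, _char in re.findall(r'((\w)\2*)', text)]

# Lazy: \2*? matches as few repetitions as possible, which here is zero.
lazy = [whole for whole, _char in re.findall(r'((\w)\2*?)', text)]

print(greedy)  # ['aa', 'c', 'bbb', 'qq']
print(lazy)    # ['a', 'a', 'c', 'b', 'b', 'b', 'q', 'q']
```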
