How can you group a very specific pattern with regex?

Posted by erhoui1w on 2023-03-09 in Other

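Problem:
https://coderbyte.com/editor/Simple%20Symbols
The str parameter will be composed of + and = symbols with several letters between them (e.g. ++d+==+c++==a), and for the string to be true each letter must be surrounded by a + symbol. So the example string above would be false. The string will not be empty and will have at least one letter.
Input: "+d+=3=+s+"
Output: "true"
Input: "f++d+"
Output: "false"
I'm trying to create a regular expression for this problem, but I keep running into all sorts of issues. How can I produce something that returns matches for the specified rule ('+\D+')?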

import re
plusReg = re.compile(r'[(+A-Za-z+)]')
plusReg.findall()
>>> []

Here I thought I could create my own character class to search for the pattern.

import re
plusReg = re.compile(r'([\\+,\D,\\+])')
plusReg.findall('adf+a+=4=+S+')
>>> ['a', 'd', 'f', '+', 'a', '+', '=', '=', '+', 'S', '+']

Here I thought my '\\+' would single out the plus sign and read it as a literal character.

mo = plusReg.search('adf+a+=4=+S+')
mo.group()
>>>'a'

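Here, in the same shell, I tried search instead of findall, but I only got the first letter, which isn't even surrounded by plus signs.
My end goal is to group a string such as 'adf+a+=4=+S+' into ['+a+', '+S+'], and so on.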

regex

Source: https://stackoverflow.com/questions/41369550/how-can-you-group-a-very-specfic-pattern-with-regex

5 Answers

0sgqnhkj1#

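One way is to search the string for any letter that is either (1) not preceded by +, or (2) not followed by +. This can be done with lookahead and lookbehind assertions: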

>>> rgx = re.compile(r'(?<!\+)[a-zA-Z]|[a-zA-Z](?!\+)')

So if rgx.search(string) returns None, the string is valid:

>>> rgx.search('+a+') is None
True
>>> rgx.search('+a+b+') is None
True

But if it returns a match, the string is invalid:

>>> rgx.search('+ab+') is None
False
>>> rgx.search('+a=b+') is None
False
>>> rgx.search('a') is None
False
>>> rgx.search('+a') is None
False
>>> rgx.search('a+') is None
False

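The important thing about lookahead and lookbehind assertions is that they do not consume characters, so they can handle overlapping matches.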
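
To illustrate the point about overlapping matches, here is a minimal sketch that wraps the pattern above in a helper. The function name `all_letters_surrounded` is made up for this example and is not part of the original answer.

```python
import re

# Matches any letter that is missing a '+' on its left or on its right.
BAD_LETTER = re.compile(r'(?<!\+)[a-zA-Z]|[a-zA-Z](?!\+)')

def all_letters_surrounded(text):
    """Hypothetical helper: True when no letter lacks a '+' on either side."""
    return BAD_LETTER.search(text) is None

# The middle '+' in '+a+b+' serves both letters, because the lookarounds
# only inspect it and never consume it.
print(all_letters_surrounded('+a+b+'))      # True
print(all_letters_surrounded('+d+=3=+s+'))  # True
print(all_letters_surrounded('f++d+'))      # False
```

A pattern that consumed the `+` signs, such as `\+[a-zA-Z]\+`, could not reuse the shared `+` in the first example.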

new9mtju2#

Something like the following should do it:

import re

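def is_valid_str(s):
    # Compare all letters with the letters that sit between two '+' signs.
    # The trailing '+' is checked with a lookahead so that it is not consumed
    # and can also serve the next letter (e.g. '+a+b+').
    return re.findall(r'[a-zA-Z]', s) == re.findall(r'\+([a-zA-Z])(?=\+)', s)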

Usage:

In [10]: is_valid_str("f++d+")
Out[10]: False

In [11]: is_valid_str("+d+=3=+s+")
Out[11]: True

pobjuy323#

I think you are on the right track. The regular expression you have is close, but it can be simplified to match only letters:

import re

# [a-zA-Z] rather than [a-zA-z]: the range A-z also includes [ \ ] ^ _ and `
search_pattern = re.compile(r'\+[a-zA-Z]\+')

Now we can use this regular expression with the findall function:

results = re.findall(search_pattern, 'adf+a+=4=+S+')  # returns ['+a+', '+S+']
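
One caveat with `findall` here: its matches cannot overlap, so when two letters share a `+`, as in '+a+b+', only the first group is reported. A possible workaround, sketched below, is to put the whole group inside a capturing lookahead so the `+` signs are never consumed. The name `plus_groups` is hypothetical and not from the original answer.

```python
import re

# Zero-width lookahead with a capture group: each match consumes nothing,
# so neighbouring groups may share a '+'.
OVERLAPPING = re.compile(r'(?=(\+[a-zA-Z]\+))')

def plus_groups(text):
    """Hypothetical helper: list every '+letter+' group, overlaps included."""
    return OVERLAPPING.findall(text)

print(re.findall(r'\+[a-zA-Z]\+', '+a+b+'))  # ['+a+']
print(plus_groups('+a+b+'))                  # ['+a+', '+b+']
print(plus_groups('adf+a+=4=+S+'))           # ['+a+', '+S+']
```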

The problem asks you to return a boolean that depends on whether the string is valid for the specified pattern, so we can wrap all of this in a function:

def is_valid_pattern(pattern_string):
    # A letter preceded by '+' and followed by '+'. The trailing '+' is a
    # lookahead so that adjacent groups such as '+a+b+' can share it.
    search_pattern = re.compile(r'\+[a-zA-Z](?=\+)')
    letter_pattern = re.compile(r'[a-zA-Z]')  # to search for all letters
    results = re.findall(search_pattern, pattern_string)
    letters = re.findall(letter_pattern, pattern_string)
    # if the length of the list of all the letters equals the length of all
    # the values found with the pattern, we can say that it is a valid string
    return len(results) == len(letters)

agxfikkp4#

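You should be looking for what isn't there, as opposed to what is. Search for something like ([^\+][A-Za-z]|[A-Za-z][^\+]). The | in the middle is a logical or operator. Each side checks whether there is any place where a letter has no "+" on its left or on its right, respectively. If the pattern finds something, the string fails. If it finds nothing, there is no letter that is not surrounded by "+".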
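
A rough sketch of this idea follows. It adds one assumption that the answer above does not mention: the input is padded with '=' on both sides, because `[^\+]` has to consume a real character and would otherwise miss a letter at the very start or end of the string. The function name `has_unsurrounded_letter` is hypothetical.

```python
import re

# A letter with a non-'+' character on its left, or on its right.
UNSURROUNDED = re.compile(r'[^\+][A-Za-z]|[A-Za-z][^\+]')

def has_unsurrounded_letter(text):
    """Hypothetical helper: True when some letter lacks a '+' neighbour."""
    # Assumption: pad with '=' so letters at either end have a neighbour to test.
    padded = '=' + text + '='
    return UNSURROUNDED.search(padded) is not None

print(has_unsurrounded_letter('+a+b+'))  # False
print(has_unsurrounded_letter('+ab+'))   # True
print(has_unsurrounded_letter('a+'))     # True

# Without the padding, the leading 'a' in 'a+' goes unnoticed.
print(UNSURROUNDED.search('a+') is None)  # True
```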

zbdgwd5y5#

import re

def SimpleSymbols(text):
    # Pad both ends with '=' so a letter at the very start or end of the
    # input (e.g. 'y+4==+r+') is also checked; without padding the program
    # would return "true" when it should return "false".
    padded = '=' + text + '='
    # Matches when a letter does *not* have a + in front of it or behind it.
    plusReg = re.compile(r'[^\+][A-Za-z].|.[A-Za-z][^\+]')
    # No match means every letter is surrounded by + signs.
    if plusReg.search(padded) is None:
        return "true"
    return "false"

# Python 3 equivalents of the original print statement and raw_input()
print(SimpleSymbols(input()))

I borrowed some code from ekhumoro and Sanjay.
This answer was posted by the OP, Allen Birmingham, under CC BY-SA 3.0 as an edit to the question.
