How do I match a character before or after a capture group in a regex?

ubof19bj  posted on 2023-03-04 in Other

I have a Python script with a regex pattern that searches for the word employee_id, but only if it is immediately preceded or followed by an equals sign.

import re

pattern = r"(=employee_id|employee_id=)"

print(re.search(pattern, "=employee_id").group(1))  # =employee_id
print(re.search(pattern, "employee_id=").group(1))  # employee_id=
print(re.search(pattern, "=employee_id=").group(1))  # =employee_id
print(re.search(pattern, "employee_id"))  # None
print(re.search(pattern, "employee_identity="))  # None

How can I change the pattern so that it captures only the employee_id part of the string, without the equals sign?

# Desired results
print(re.search(pattern, "=employee_id").group(1))  # employee_id
print(re.search(pattern, "employee_id=").group(1))  # employee_id
print(re.search(pattern, "=employee_id=").group(1))  # employee_id
print(re.search(pattern, "employee_id"))  # None
print(re.search(pattern, "employee_identity="))  # None

I tried using capture groups, but putting parentheses around employee_id means my result is split between two capture groups:

pattern = r"=(employee_id)|(employee_id)="
print(re.search(pattern, "employee_id=").group(1))  # None
print(re.search(pattern, "employee_id=").group(2))  # employee_id
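
One possible workaround with this two-group pattern is to return whichever group actually took part in the match. The sketch below assumes a helper named first_group, which is a hypothetical name and not part of re:

```python
import re

pattern = re.compile(r"=(employee_id)|(employee_id)=")

def first_group(match):
    """Return the first group that participated in the match, or None."""
    if match is None:
        return None
    # Groups that did not participate are reported as None by match.groups().
    return next((g for g in match.groups() if g is not None), None)

inputs = ["=employee_id", "employee_id=", "=employee_id=",
          "employee_id", "employee_identity="]
for text in inputs:
    print(repr(text), "->", first_group(pattern.search(text)))
```

This keeps the pattern simple, at the cost of an extra step after every search.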

Using optional groups would also match employee_id with no equals sign at all:

(?:=)?(employee_id)(?:=)?
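
A short check of why the optional-group pattern is too permissive (the variable name optional is only illustrative):

```python
import re

optional = re.compile(r"(?:=)?(employee_id)(?:=)?")

# Both equals signs are optional, so the bare word still matches,
# and so does the prefix of a longer word.
for text in ["employee_id", "employee_identity="]:
    match = optional.search(text)
    print(repr(text), "->", match.group(1) if match else None)
```

Both inputs should yield no match according to the desired results, yet both produce a capture here.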

I also don't want to exclude matches where the equals sign appears both before and after the word.

lawou6xi1#

Try:

(?<==)employee_id|employee_id(?==)

Or, if you want the match inside a capture group:

((?<==)employee_id|employee_id(?==))

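This matches employee_id if there is an = immediately before or immediately after it. Because the = is checked with a lookbehind or lookahead, it is not part of the match.
Edit: Python example: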

import re

pattern = r"(?<==)employee_id|employee_id(?==)"

print(re.search(pattern, "=employee_id").group(0))  # employee_id
print(re.search(pattern, "employee_id=").group(0))  # employee_id
print(re.search(pattern, "=employee_id=").group(0))  # employee_id

Prints:

employee_id
employee_id
employee_id

Or you can put a capture group around the pattern:

import re

pattern = r"((?<==)employee_id|employee_id(?==))"

print(re.search(pattern, "=employee_id").group(1))  # employee_id
print(re.search(pattern, "employee_id=").group(1))  # employee_id
print(re.search(pattern, "=employee_id=").group(1))  # employee_id

Prints:

employee_id
employee_id
employee_id
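
As a quick sanity check, the lookaround pattern can be run against all five inputs from the question, including the two that should not match. This is a sketch; the cases dictionary is just a convenient way to pair each input with the result the question asks for:

```python
import re

pattern = re.compile(r"(?<==)employee_id|employee_id(?==)")

# Input -> result requested in the question (None means "no match").
cases = {
    "=employee_id": "employee_id",
    "employee_id=": "employee_id",
    "=employee_id=": "employee_id",
    "employee_id": None,
    "employee_identity=": None,
}

for text, expected in cases.items():
    match = pattern.search(text)
    found = match.group(0) if match else None
    print(f"{text!r}: {found!r}")
    assert found == expected
```

An input such as =employee_identity is not among the question's cases; the first alternative would still match there, and appending \b after employee_id is one way to rule that out.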

yws3nbqq2#

If you want a single capture group while still ensuring that = appears before or after it, use:

(?:(?<==)|(?=\w+=))(employee_id)\b

RegEx details:

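  • (?:: Start of the non-capturing group
  • (?<==): Assert that = is immediately before the current position
  • |: OR
  • (?=\w+=): Assert that one or more word characters followed by = come after the current position
  • ): End of the non-capturing group
  • (employee_id): Match and capture employee_id
  • \b: Word boundary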
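
A minimal Python sketch of this pattern applied to the inputs from the question; capture is a hypothetical helper name used only to keep the loop short:

```python
import re

pattern = re.compile(r"(?:(?<==)|(?=\w+=))(employee_id)\b")

def capture(text):
    """Return group 1 if the pattern matches, otherwise None."""
    match = pattern.search(text)
    return match.group(1) if match else None

inputs = ["=employee_id", "employee_id=", "=employee_id=",
          "employee_id", "employee_identity="]
for text in inputs:
    print(repr(text), "->", capture(text))
```

The trailing \b is what rejects employee_identity=: the lookahead (?=\w+=) succeeds there, but there is no word boundary between employee_id and the following e.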

dly7yett3#

This is probably more complicated than it needs to be, but it is an option.
Python's re supports named groups, so we would hope that this works:

=(?P<employee_id>\d+)(?!=)|(?<!=)(?P<employee_id>\d+)=

Unfortunately it does not, even though the two groups can never conflict:

error: redefinition of group name 'employee_id' as group 2; was group 1 at position 37
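
The error can be reproduced with a short sketch like the following, which only compiles the pattern and reports what re raises (the variable names are illustrative):

```python
import re

duplicate = r"=(?P<employee_id>\d+)(?!=)|(?<!=)(?P<employee_id>\d+)="

try:
    re.compile(duplicate)
except re.error as exc:
    # re refuses a group name that is defined twice, even across alternatives.
    message = str(exc)
    print("re rejected the pattern:", message)
else:
    message = None
    print("pattern compiled")
```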

However, this does work with the third-party regex package:

>>> import regex
>>> pattern = regex.compile(r"=(?P<employee_id>\d+)(?!=)|(?<!=)(?P<employee_id>\d+)=")
>>> match = pattern.search("And he's like: id=42. So hilarious!")
>>> match
<regex.Match object; span=(17, 20), match='=42'>
>>> match.groupdict()
{'employee_id': '42'}

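If you want to stay with re, you can use a helper function and make a small change to the pattern (the second group gets a trailing underscore):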

import re

def unify_groupdict(groupdict):
    # Merge groups whose names differ only by trailing underscores,
    # keeping the first value that is not None.
    result = {}
    for name, match in groupdict.items():
        name = name.rstrip("_")
        if result.get(name) is None:
            result[name] = match
    return result

###

pattern = re.compile(r"=(?P<employee_id>\d+)(?!=)|(?<!=)(?P<employee_id_>\d+)=")
match = pattern.search("And he's like: id=42. So hilarious!")

print(unify_groupdict(match.groupdict()))
# {'employee_id': '42'}
