regex: Regular expression with curly braces in some of the search strings

Posted by y3bcpkx1 on 2023-03-24 in Other

I have a spreadsheet with a "Find" column and a "Replace" column. Note that some strings are substrings of other strings.

Find                    Replace
{Example1}              {50M00_Dewirer_South\Example1}
{Example1\Alarm}        {50M00_Dewirer_South\Example1\Alarm}
{Example1\AlarmHigh}    {50M00_Dewirer_South\Example1\AlarmHigh}
{Example1\AlarmLow}     {50M00_Dewirer_South\Example1\AlarmLow}
Example2                50M00_Dewirer_South\Example2
Example2foo             50M00_Dewirer_South\Example2foo
Example2foobar          50M00_Dewirer_South\Example2foobar
ATag                    Device_Shortcut\DirectReference
Another_Tag             Winder\Local:50:I.Data.0
Another\Tag             Winder\Local:12:O.Data.1

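I need to search a directory of files for each search term and replace every occurrence with its associated replacement. The files I'm searching may also contain capitalization errors: {Example 1} might appear as {example 1}, {ExAmPle 1}, {exAMplE 1}, or any other combination of upper- and lower-case characters. I'm trying to use regular expressions because my brute-force attempt at searching all the files was too slow.
I've managed to put together a regex that works for strings that don't contain curly braces {}. However, if a string does contain curly braces, my search function finds nothing in the files being searched.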
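One likely reason a brute-force approach is slow is that each file gets read and scanned once per search term. A minimal sketch of the alternative, one read and one scan per file, is below. It assumes UTF-8 text files matched by a hypothetical `*.txt` glob; `replace_in_directory` is a hypothetical helper that takes a single compiled pattern covering all terms plus a `lookup` dict keyed by the lower-cased term.

```python
import re
from pathlib import Path


def replace_in_directory(root, pattern, lookup, glob="*.txt"):
    """Rewrite every file under `root` matching `glob` in place.

    Hypothetical helper: `pattern` is one compiled, case-insensitive regex
    covering all search terms; `lookup` maps each lower-cased term to its
    replacement. Returns the list of files that were changed.
    """
    changed = []
    for path in Path(root).rglob(glob):
        if not path.is_file():
            continue
        # A bytes round-trip keeps the file's original line endings.
        text = path.read_bytes().decode("utf-8")
        # One scan per file for all terms; the function replacement inserts
        # the mapped text literally, so its backslashes are kept as-is.
        new_text = pattern.sub(lambda m: lookup[m.group(0).lower()], text)
        if new_text != text:
            path.write_bytes(new_text.encode("utf-8"))
            changed.append(path)
    return changed


# Example with a single term; a real run would include every spreadsheet row.
pattern = re.compile(re.escape("{Example1}"), re.IGNORECASE)
lookup = {"{example1}": r"{50M00_Dewirer_South\Example1}"}
# changed = replace_in_directory("path/to/files", pattern, lookup)  # hypothetical path
```

Only files whose content actually changes are rewritten, which avoids touching timestamps on the rest.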

import re

# replace_dict maps each "Find" string to its "Replace" string
pattern = re.compile(
    r'\b(?:%s)\b' % '|'.join([re.escape(term) for term in replace_dict]),
    re.IGNORECASE
)
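A quick isolated check, on a hypothetical one-line input, suggests the braces themselves are not the problem once re.escape() has escaped them; the \b anchors are. \b only matches at a transition between a word character (letter, digit, underscore) and a non-word character, so a \b placed directly before \{ requires a word character immediately to the left of the brace:

```python
import re

text = "value = {Example1} + Example2"
term = re.escape("{Example1}")  # \{Example1\}: the braces are already literal

# \b needs a word character (letter, digit, underscore) on exactly one side.
# "{" is not a word character, so \b right before it fails after a space.
with_b = re.compile(r"\b" + term + r"\b", re.IGNORECASE)
print(with_b.search(text))  # None

# Lookarounds only forbid an adjacent word character, so they work for
# terms that start or end with punctuation as well as for plain words.
with_lookaround = re.compile(r"(?<!\w)" + term + r"(?!\w)", re.IGNORECASE)
print(with_lookaround.search(text).group(0))  # {Example1}
```

For terms that begin and end with a word character, such as Example2, (?<!\w) and (?!\w) behave exactly like \b, so a single pattern can serve both kinds of term.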

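How should I construct the regex so that curly braces are treated as part of a search term? Also, if a search term doesn't contain curly braces, will the new regex still work, or do I have to fall back to my current pattern?
Edit: I should probably broaden the question, since I haven't yet found all the special characters I may need to search for. Is it possible to create a regex that handles search terms containing any combination of special characters?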

Answer by f2uvfpb9:

Here is an answer to the current version of the question.

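Your original attempt was on the right track in using re.escape(). For clarity, I've turned the list comprehension into a for loop below. The find patterns are in the list find_patterns, the replacements in replacements, and sample input strings in sample_inputs: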

import re

# Raw strings (r"...") keep backslashes such as the one in "\Alarm" literal
# instead of relying on invalid escape sequences, which Python warns about.
find_patterns = [
    r"{Example1}",
    r"{Example1\Alarm}",
    r"{Example1\AlarmHigh}",
    r"{Example1\AlarmLow}",
    r"Example2",
    r"Example2foo",
    r"Example2foobar",
    r"ATag",
    r"Another_Tag",
    r"Another\Tag"
]
replacements = [
    r"{50M00_Dewirer_South\Example1}",
    r"{50M00_Dewirer_South\Example1\Alarm}",
    r"{50M00_Dewirer_South\Example1\AlarmHigh}",
    r"{50M00_Dewirer_South\Example1\AlarmLow}",
    r"50M00_Dewirer_South\Example2",
    r"50M00_Dewirer_South\Example2foo",
    r"50M00_Dewirer_South\Example2foobar",
    r"Device_Shortcut\DirectReference",
    r"Winder\Local:50:I.Data.0",
    r"Winder\Local:12:O.Data.1"
]
sample_inputs = [
    r"blAh(blah}{&^%$#@!blah{Example1}more_stuff&$^%{",
    r"blAh(blah}{&^%$#@!blah{Example1\Alarm}more_stuff&$^%{",
    r"blAh(blah}{&^%$#@!blah{Example1\AlarmHigh}more_stuff&$^%{",
    r"blAh(blah}{&^%$#@!blah{Example1\AlarmLow}more_stuff&$^%{",
    r"blAh(blah}{&^%$#@!blahExample2more_stuff&$^%{",
    r"blAh(blah}{&^%$#@!blahExample2foomore_stuff&$^%{",
    r"blAh(blah}{&^%$#@!blahExample2foobarmore_stuff&$^%{",
    r"blAh(blah}{&^%$#@!blahATagmore_stuff&$^%{",
    r"blAh(blah}{&^%$#@!blahAnother_Tagmore_stuff&$^%{",
    r"blAh(blah}{&^%$#@!blahAnother\Tagmore_stuff&$^%{"
]
new_strings = []
for find_pattern, replacement, sample_input in zip(find_patterns, replacements, sample_inputs):
    # re.escape() makes {, }, \ and . in the find pattern match literally.
    # The replacement is returned from a function so it is inserted verbatim;
    # passing re.escape(replacement) would leave stray "\{" and "\." in the result.
    new_string = re.sub(re.escape(find_pattern), lambda _m: replacement, sample_input, flags=re.IGNORECASE)
    new_strings.append(new_string)
    print(f"{sample_input}\n{new_string}\n")

Output:

blAh(blah}{&^%$#@!blah{Example1}more_stuff&$^%{
blAh(blah}{&^%$#@!blah{50M00_Dewirer_South\Example1}more_stuff&$^%{

blAh(blah}{&^%$#@!blah{Example1\Alarm}more_stuff&$^%{
blAh(blah}{&^%$#@!blah{50M00_Dewirer_South\Example1\Alarm}more_stuff&$^%{

blAh(blah}{&^%$#@!blah{Example1\AlarmHigh}more_stuff&$^%{
blAh(blah}{&^%$#@!blah{50M00_Dewirer_South\Example1\AlarmHigh}more_stuff&$^%{

blAh(blah}{&^%$#@!blah{Example1\AlarmLow}more_stuff&$^%{
blAh(blah}{&^%$#@!blah{50M00_Dewirer_South\Example1\AlarmLow}more_stuff&$^%{

blAh(blah}{&^%$#@!blahExample2more_stuff&$^%{
blAh(blah}{&^%$#@!blah50M00_Dewirer_South\Example2more_stuff&$^%{

blAh(blah}{&^%$#@!blahExample2foomore_stuff&$^%{
blAh(blah}{&^%$#@!blah50M00_Dewirer_South\Example2foomore_stuff&$^%{

blAh(blah}{&^%$#@!blahExample2foobarmore_stuff&$^%{
blAh(blah}{&^%$#@!blah50M00_Dewirer_South\Example2foobarmore_stuff&$^%{

blAh(blah}{&^%$#@!blahATagmore_stuff&$^%{
blAh(blah}{&^%$#@!blahDevice_Shortcut\DirectReferencemore_stuff&$^%{

blAh(blah}{&^%$#@!blahAnother_Tagmore_stuff&$^%{
blAh(blah}{&^%$#@!blahWinder\Local:50:I.Data.0more_stuff&$^%{

blAh(blah}{&^%$#@!blahAnother\Tagmore_stuff&$^%{
blAh(blah}{&^%$#@!blahWinder\Local:12:O.Data.1more_stuff&$^%{

Only the find patterns need to be escaped with re.escape(); the input strings do not. The replacement should not go through re.escape(): in a replacement string only the backslash is special, so re.escape() would leave stray \{, \} and \. sequences in the output. Supplying the replacement as a function (or escaping just its backslashes) inserts it unchanged.
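To apply the whole spreadsheet in one pass, and to keep the boundary check the question's \b was meant to provide, all terms can be combined into a single alternation. The sketch below assumes a hypothetical `replace_dict` holding a subset of the rows and plain ASCII terms (the case-insensitive lookup relies on `str.lower()`); `replace_terms` is likewise a hypothetical name. Because every term goes through re.escape(), it works for any combination of special characters, with or without braces.

```python
import re

# Hypothetical subset of the spreadsheet's Find -> Replace rows.
replace_dict = {
    r"{Example1}": r"{50M00_Dewirer_South\Example1}",
    r"{Example1\Alarm}": r"{50M00_Dewirer_South\Example1\Alarm}",
    r"Example2": r"50M00_Dewirer_South\Example2",
    r"Example2foo": r"50M00_Dewirer_South\Example2foo",
    r"Another\Tag": r"Winder\Local:12:O.Data.1",
}

# Case-insensitive lookup keyed by the lower-cased term (assumes ASCII terms).
lookup = {find.lower(): repl for find, repl in replace_dict.items()}

# Longest terms first, so a term that is a prefix of another term cannot
# win when both would match at the same position.
# re.escape() makes {, }, \, . and any other special character literal.
alternation = "|".join(
    re.escape(term) for term in sorted(replace_dict, key=len, reverse=True)
)

# (?<!\w) / (?!\w) stand in for \b: the match must not be glued to a letter,
# digit or underscore, whether or not the term itself starts or ends with one.
pattern = re.compile(rf"(?<!\w)(?:{alternation})(?!\w)", re.IGNORECASE)


def replace_terms(text):
    """Replace every term in one pass; the function result is inserted literally."""
    return pattern.sub(lambda m: lookup[m.group(0).lower()], text)


print(replace_terms(r"a {EXAMPLE1\alarm} b example2FOO c Another\tag"))
```

The lookarounds are a design choice: they reject a term embedded in a longer word (for example "xExample2y" is left alone). If such embedded matches should be replaced too, as in the sample inputs above, drop the two lookarounds and keep the longest-first ordering.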
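Edit: below is my original answer to the original question.
You're overthinking this: you don't need to worry about brackets, braces or other special characters. All you need to do is find r"Example\d" (where \d is a single digit) while ignoring case, and replace it with r"50M00_Dewirer_South\\\g<0>", where \\ is an escaped \ character and \g<0> is the entire matched substring. In Python, where strings is the list of "Find" strings: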

import re

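strings = [
    "{Example1}",
    "{Example1\\Alarm}",
    "{Example1\\AlarmHigh}",
    "{Example1\\AlarmLow}",
    "Example2",
    "Example2foo",
    "Example2foobar",
]

# \d matches a single digit; IGNORECASE also matches "example1", "EXAMPLE1", ...
pattern = re.compile(r"Example\d", flags=re.IGNORECASE)
# In the replacement, \\ produces one literal backslash and \g<0> is the whole match
replaced_strings = [pattern.sub(r"50M00_Dewirer_South\\\g<0>", item) for item in strings]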

Output (the value of replaced_strings):

['{50M00_Dewirer_South\\Example1}', 
 '{50M00_Dewirer_South\\Example1\\Alarm}',
 '{50M00_Dewirer_South\\Example1\\AlarmHigh}',
 '{50M00_Dewirer_South\\Example1\\AlarmLow}',
 '50M00_Dewirer_South\\Example2',
 '50M00_Dewirer_South\\Example2foo',
 '50M00_Dewirer_South\\Example2foobar']

These are just the Python representations of the strings; when they are written to a file, they contain only a single \.
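A quick way to confirm this with one of the values above: repr(), which Python uses when displaying a list, doubles each backslash, while the string itself stores a single backslash character.

```python
s = r"{50M00_Dewirer_South\Example1}"  # one element of replaced_strings

print(repr(s))        # '{50M00_Dewirer_South\\Example1}'  (list display uses repr)
print(s)              # {50M00_Dewirer_South\Example1}     (what a file receives)
print(s.count("\\"))  # 1
```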
