regex: How do I replace matches of a Python regular expression with a modified version of the match?

n6lpvg4x  posted on 2023-06-07  in  Python

I wrote this code to search the text files in a given folder for word matches and highlight them:

import re, os, sys
from pathlib import Path

#Usage: regs directory
try:
    if len(sys.argv) == 2:
        folder = sys.argv[1]
        fList = os.listdir(folder)
        uInput = input('input a regex: ')
        regObj = re.compile(f'''{uInput}''')
        wordReg = re.compile(r'''([A-Za-z0-9]+|\s+|[^\w\s]+)''')
        matches = []
        print(fList)

        for file in fList:
            if not os.path.isdir(Path(folder)/Path(file)):
                currentFileObj = open(f'{folder}/{file}')
                content = currentFileObj.readlines()
                currentFileObj.seek(0)
                text = currentFileObj.read()
                words = wordReg.findall(text)
                matches = list(filter(regObj.match, words))
                instances = 0
                print(f"matches in ({file}):\n'", end='')
                for word in words:
                    if word in matches:
                        print("\u0333".join(f"{word} "), end='')
                    else:
                        print(word, end='')
                print("'")
                for line in content:
                    matches = regObj.findall(line)
                    for match in matches:
                        print("\u0333".join(f"{match} "), end=' ')
                        print(f"in line number {content.index(line)+1}")
                        if match != '':
                            instances = instances + 1
                print(f'number of instances found: {instances}\n')
            else:
                continue
    else:
        print('Usage: regs directory')
except FileNotFoundError:
    print("that file doesn't exist.")
except PermissionError:
    print("you don't have permission to search that folder.")

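It works in most cases, except for a few regexes: if the regex contains punctuation or whitespace next to other characters, the match is not underlined. It would probably work if I found a way to replace each match with a modified version of itself (that is, replace the match with an underlined version).
Here's what it looks like for any other regex.
You can see that in the first text file it doesn't underline the match (out.)
I tried looking for a function that replaces a match with a modified version of that same match, but there doesn't seem to be one?
There is also a smaller problem: whitespace and punctuation are not underlined correctly, and the underline character doesn't show up in the Windows 7 command prompt. Maybe some character other than the underline would work?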

ryevplcw1#

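If your goal is to underline the matches, you can modify the printing logic so that each match is replaced with an underlined version built with \u0332, like so: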

# put one combining low line (\u0332) after every character of the match
underlined_match = "".join(ch + "\u0332" for ch in match)
print(underlined_match, end=' ')
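To make the idea above a bit more concrete, here is a minimal sketch of an underline helper with a plain-text fallback. The names `underline`, `highlight` and `combining_ok` are made up for illustration and are not part of the original code; the square-bracket fallback is only an assumption about what might read well on a console that does not draw combining characters.

```python
UNDERLINE = "\u0332"  # COMBINING LOW LINE


def underline(text: str) -> str:
    """Follow every character of text with one combining underline."""
    return "".join(ch + UNDERLINE for ch in text)


def highlight(text: str, combining_ok: bool = True) -> str:
    """Underline text, or wrap it in square brackets when the console
    is assumed not to render combining characters."""
    return underline(text) if combining_ok else f"[{text}]"
```

Calling `highlight(match)` in place of the `join` expression should give the underlined form, while `highlight(match, combining_ok=False)` gives something like `[out.]`.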

Otherwise, if your goal is to change the regex so that it captures the whitespace between punctuation and normal characters (a-z, 0-9), this regex might help you:

(?:[A-Za-z0-9]+(?:[^\w\s]*[A-Za-z0-9]+[^\w\s]*)*)|(?:[^\w\s]+)
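A quick way to see what this pattern does is to run it over a short sample. The sketch below copies the pattern unchanged; the helper name `tokens` and the sample sentence are made up for illustration. On this sample, punctuation inside a word stays attached, trailing punctuation becomes its own token, and whitespace does not end up in any token, so it is worth testing against your own input before relying on it.

```python
import re

# The pattern from the answer, copied unchanged.
TOKEN_RE = re.compile(
    r"(?:[A-Za-z0-9]+(?:[^\w\s]*[A-Za-z0-9]+[^\w\s]*)*)|(?:[^\w\s]+)"
)


def tokens(text: str) -> list:
    """Return every non-overlapping match of TOKEN_RE in text."""
    return TOKEN_RE.findall(text)


if __name__ == "__main__":
    print(tokens("time ran out. don't stop!"))
    # ['time', 'ran', 'out', '.', "don't", 'stop', '!']
```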

50few1ms2#

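I figured out the answer: by passing a lambda function as the repl argument of re.sub, I was able to modify the matches and then use the modified text as the replacement.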

import re, os, sys
from pathlib import Path

# Usage: regs directory
try:
    if len(sys.argv) == 2:
        folder = sys.argv[1]
        fList = os.listdir(folder)
        print("folder contents: ", end=' ')
        for f in fList:
            if not f == fList[-1]:
                print(f, end=', ')
            else:
                print(f, end='.\n\n')
        uInput = input('input a regex: ')
        print()
        regObj = re.compile(uInput)

        for file in fList:
            if os.path.isfile(Path(folder)/Path(file)):
                # read the file once; the with block closes it afterwards
                with open(Path(folder)/Path(file)) as currentFileObj:
                    text = currentFileObj.read()
                lines = text.splitlines(keepends=True)
                instances = 0
                print(f"matches in ({file}):\n'", end='')
                # re.sub calls the lambda once per match and inserts what it returns
                print(regObj.sub(lambda match: "(" + match.group() + ")", text)+"'")
                # enumerate keeps line numbers right even when two lines are identical
                for lineNumber, line in enumerate(lines, start=1):
                    # finditer yields match objects, so group() is always the whole match
                    for match in regObj.finditer(line):
                        print(f"({match.group()})", end=' ')
                        print(f"in line number {lineNumber}")
                        if match.group() != '':
                            instances = instances + 1
                print(f'number of instances found: {instances}\n')
    else:
        print('Usage: regs directory')
except FileNotFoundError:
    print("that file doesn't exist.")
except PermissionError:
    print("you don't have permission to search that folder.")

Instead of looping over the list of words in the string, it prints each match wrapped in parentheses, like this:

print(regObj.sub(lambda match: "(" + match.group() + ")", text)+"'")
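For anyone who wants to try the technique in isolation, here is a small self-contained sketch of `re.sub` with a function as the replacement. `re.sub` accepts a callable for `repl` and calls it with each match object; everything else here (the name `mark_matches`, the `left`/`right` parameters, the choice to skip empty matches) is an assumption made for illustration.

```python
import re


def mark_matches(pattern: str, text: str, left: str = "(", right: str = ")") -> str:
    """Return text with every non-empty match of pattern wrapped in markers."""

    def repl(match):
        found = match.group()  # the whole matched substring
        # Leave empty matches (e.g. from a pattern like "x*") unmarked.
        return f"{left}{found}{right}" if found else found

    return re.sub(pattern, repl, text)


if __name__ == "__main__":
    print(mark_matches(r"out\.", "time ran out. get out."))
    # time ran (out.) get (out.)
```

Because the replacement is built from the match itself, the same function could just as well return an underlined or otherwise decorated copy of `found`.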

The output now looks like this.
It now also prints the folder contents.
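As a side note, the folder listing could probably be built with `str.join` instead of comparing each name against the last element. This is only a sketch: the helper name `format_listing` and the "(empty)" text for an empty folder are made up for illustration.

```python
def format_listing(names) -> str:
    """Return a one-line summary such as 'folder contents: a.txt, b.txt.'"""
    if not names:
        # assumption: say so explicitly when the folder has no entries
        return "folder contents: (empty)"
    return "folder contents: " + ", ".join(names) + "."


if __name__ == "__main__":
    print(format_listing(["a.txt", "b.txt"]))
    # folder contents: a.txt, b.txt.
```

In the script above this would be called as `print(format_listing(fList))`, which also covers the empty-folder case that the loop version prints without a trailing newline.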
