Can a regex return the number of the line where the match is found?

Posted by js4nwp54 on 2023-10-22 in Other


In a text editor, I want to replace a given word with the number of the line on which this word is found. Is this possible with regex?

regex

Source: https://stackoverflow.com/questions/23727476/can-a-regex-return-the-number-of-the-line-where-the-match-is-found

4 Answers


rbpvctlc1#

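Recursion, Self-Referencing Groups (Qtax Trick), Reverse Qtax, or Balancing Groups

Introduction

The idea of appending a list of integers to the bottom of the input is similar to a famous database hack (nothing to do with regex) in which one joins to a table of integers. My original answer used the @Qtax trick. The current answer uses recursion, the Qtax trick (straight or in a reversed variation), or balancing groups.

Yes, it is possible... with some caveats and regex trickery.

1. The solutions in this answer are meant as a vehicle to demonstrate some regex syntax, not as practical answers to be implemented.
2. At the end of the file, we will paste a list of numbers, each preceded by a unique delimiter. For this experiment, the appended string is :1:2:3:4:5:6:7 This is a technique similar to the famous database hack that uses a table of integers.
3. For the first two solutions, we need an editor that uses a regex flavor allowing either recursion (solution 1) or self-referencing capture groups (solution 2 and its reversed-number optimization). Two come to mind: Notepad++ and EditPad Pro. For the third solution, we need an editor that supports balancing groups. That probably limits us to EditPad Pro or Visual Studio 2013+.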
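
As a rough illustration of point 2, the number list could be generated instead of pasted by hand. This is only a sketch: the function name `append_number_list` and its `prefix` and `spare` parameters are made up for this example and are not part of the answer.

```python
def append_number_list(text: str, prefix: str = ":", spare: int = 2) -> str:
    """Return text with a prefixed number list (:1:2:3...) appended at the bottom.

    One number per input line is enough for the counting trick; `spare`
    (a hypothetical knob) adds a few extra, as the 5-line sample carries 7.
    """
    line_count = len(text.splitlines())
    numbers = "".join(f"{prefix}{n}" for n in range(1, line_count + spare + 1))
    # Separate the list from the real content with a blank line.
    return text.rstrip("\n") + "\n\n" + numbers + "\n"


sample = "my cat\ndog\nmy pig\nmy cow\nmy mouse\n"
print(append_number_list(sample))
```

For the five-line sample this appends :1:2:3:4:5:6:7, the same string used in the experiment.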

Input file:

Let's say we are searching for pig and want to replace it with the line number.
We'll use this as input:

my cat
dog
my pig
my cow
my mouse

:1:2:3:4:5:6:7

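First Solution: Recursion

Supported languages: apart from the text editors mentioned above (Notepad++ and EditPad Pro), this solution should work in languages that use PCRE (PHP, R, Delphi), in Perl, and in Python using Matthew Barnett's regex module (untested).
The recursive structure lives in a lookahead and is optional. Its job is to balance the lines that do not contain pig, on the left, against the numbers, on the right: think of it as balancing a nested construct such as {{{ }}}, except that on the left we have the non-matching lines and on the right we have the numbers. The point is that when we exit the lookahead, we know how many lines were skipped.

Search: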

(?sm)(?=.*?pig)(?=((?:^(?:(?!pig)[^\r\n])*(?:\r?\n))(?:(?1)|[^:]+)(:\d+))?).*?\Kpig(?=.*?(?(2)\2):(\d+))

Free-spacing version with comments:

(?xsm)             # free-spacing mode, multi-line
(?=.*?pig)        # fail right away if pig isn't there

(?=               # The Recursive Structure Lives In This Lookahead
(                 # Group 1
   (?:               # skip one line 
      ^              
      (?:(?!pig)[^\r\n])*  # zero or more chars not followed by pig
      (?:\r?\n)      # newline chars
    ) 
    (?:(?1)|[^:]+)   # recurse Group 1 OR match all chars that are not a :
    (:\d+)           # match digits
)?                 # End Group 
)                 # End lookahead. 
.*?\Kpig                # get to pig
(?=.*?(?(2)\2):(\d+))   # Lookahead: capture the next digits

Replace: \3
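
To make the balancing idea concrete, here is a plain-Python analogy (not a regex, and not the author's code): each recursion level consumes one pig-free line on the way in and one number on the way out, so the count of consumed numbers equals the count of skipped lines. The names `balance` and `line_number_by_recursion` are hypothetical, and the sketch assumes the number list is at least as long as the input.

```python
def balance(lines, numbers, word):
    """Consume one line without `word` going in, one number coming back out."""
    if not lines or word in lines[0]:
        return []                                   # nothing (more) to skip
    consumed = balance(lines[1:], numbers, word)    # the "(?1)" step
    return consumed + [numbers[len(consumed)]]      # the "(:\d+)" step


def line_number_by_recursion(text, word, numbers):
    lines = text.splitlines()
    if not any(word in line for line in lines):
        return None                                 # like (?=.*?pig) failing
    consumed = balance(lines, numbers, word)
    # The final lookahead captures the number right after the consumed ones.
    return numbers[len(consumed)]


sample = "my cat\ndog\nmy pig\nmy cow\nmy mouse\n"
print(line_number_by_recursion(sample, "pig", [1, 2, 3, 4, 5, 6, 7]))  # prints 3
```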

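In the demo, see the substitutions at the bottom. You can play with the letters on the first two lines (delete a space to make pig) to move the first occurrence of pig to a different line, and see how that affects the results.

Second Solution: Self-Referencing Groups ("Qtax Trick")

Supported languages: apart from the text editors mentioned above (Notepad++ and EditPad Pro), this solution should work in languages that use PCRE (PHP, R, Delphi), in Perl, and in Python using Matthew Barnett's regex module (untested). The solution is easy to adapt to .NET by converting the \K to a lookbehind and the possessive quantifier to an atomic group (see the .NET version a few lines below).

Search: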

(?sm)(?=.*?pig)(?:(?:^(?:(?!pig)[^\r\n])*(?:\r?\n))(?=[^:]+((?(1)\1):\d+)))*+.*?\Kpig(?=[^:]+(?(1)\1):(\d+))

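.NET version: Back to the Future

.NET does not have \K. In its place, we use a "back to the future" lookbehind (a lookbehind that contains a lookahead that skips ahead of the match). Also, we need to use an atomic group instead of a possessive quantifier.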

(?sm)(?<=(?=.*?pig)(?=(?>(?:^(?:(?!pig)[^\r\n])*(?:\r?\n))(?=[^:]+((?(1)\1):\d+)))*).*)pig(?=[^:]+(?(1)\1):(\d+))

Free-spacing version with comments (Perl / PCRE version):

(?xsm)             # free-spacing mode, multi-line
(?=.*?pig)        # lookahead: if pig is not there, fail right away to save the effort
(?:               # start counter-line-skipper (lines that don't include pig)
   (?:               # skip one line 
      ^              # 
      (?:(?!pig)[^\r\n])*  # zero or more chars not followed by pig
      (?:\r?\n)      # newline chars
    )   
   # for each line skipped, let Group 1 match an ever increasing portion of the numbers string at the bottom
   (?=             # lookahead
      [^:]+           # skip all chars that are not colons
      (               # start Group 1
        (?(1)\1)      # match Group 1 if set
        :\d+          # match a colon and some digits
      )               # end Group 1
   )               # end lookahead
)*+               # end counter-line-skipper: zero or more times
.*?               # match
\K                # drop everything we've matched so far
pig               # match pig (this is the match!)
(?=[^:]+(?(1)\1):(\d+))   # capture the next number to Group 2

Replace:

\2

Output:

my cat
dog
my 3
my cow
my mouse

:1:2:3:4:5:6:7
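
The counting mechanism behind the Qtax trick can be imitated step by step in plain Python. This is a simulation of what Group 1 does, not a port of the regex (the standard `re` module does not accept this kind of self-referencing group); the function name `qtax_line_number` and the variable `group1` are made up for the illustration.

```python
import re


def qtax_line_number(text, word, number_list=":1:2:3:4:5:6:7"):
    """Imitate Group 1 growing by one ':digits' chunk per skipped line."""
    group1 = ""  # stands in for capture Group 1, initially unset
    for line in text.splitlines():
        rest = number_list[len(group1):]  # what follows "(?(1)\1)"
        if word in line:
            # Final lookahead: skip what Group 1 holds, capture the next number.
            found = re.match(r":(\d+)", rest)
            return found.group(1) if found else None
        # One line skipped: Group 1 becomes old Group 1 plus the next ':digits'.
        chunk = re.match(r":\d+", rest)
        if chunk is None:
            return None  # the number list is too short for this input
        group1 += chunk.group(0)
    return None  # word not present: the regex would fail at (?=.*?pig)


sample = "my cat\ndog\nmy pig\nmy cow\nmy mouse\n"
print(qtax_line_number(sample, "pig"))  # prints 3
```

After two skipped lines, `group1` holds :1:2, so the next number in the list is 3, which matches the output above.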

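In the demo, see the substitutions at the bottom. You can play with the letters on the first two lines (delete a space to make pig) to move the first occurrence of pig to a different line, and see how that affects the results.

Choice of Delimiter for the Digits

In our example, the delimiter : for the string of digits is rather common and could occur elsewhere. We could invent a UNIQUE_DELIMITER and tweak the expression slightly. But the following optimization is even more efficient and lets us keep the :

Optimization on the Second Solution: Reversed String of Digits

Instead of pasting the digits in order, it may be to our benefit to use them in reverse order: :7:6:5:4:3:2:1
In our lookaheads, this allows us to get down to the bottom of the input with a simple .* and to start backtracking from there. Since we know we are at the end of the string, we do not have to worry about the :digits being part of another section of the string. Here is how to do it.

Input: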

my cat pi g
dog p ig
my pig
my cow
my mouse

:7:6:5:4:3:2:1

Search:

(?xsm)             # free-spacing mode, multi-line
(?=.*?pig)        # lookahead: if pig is not there, fail right away to save the effort
(?:               # start counter-line-skipper (lines that don't include pig)
   (?:               # skip one line that doesn't have pig
      ^              # 
      (?:(?!pig)[^\r\n])*  # zero or more chars not followed by pig
      (?:\r?\n)      # newline chars
    )   
   # Group 1 matches increasing portion of the numbers string at the bottom
   (?=             # lookahead
      .*           # get to the end of the input
      (               # start Group 1
        :\d+          # match a colon and some digits
        (?(1)\1)      # match Group 1 if set
      )               # end Group 1
   )               # end lookahead
)*+               # end counter-line-skipper: zero or more times
.*?               # match
\K                # drop match so far
pig               # match pig (this is the match!)
(?=.*:(\d+)(?(1)\1))  # capture the next number to Group 2 (the : keeps multi-digit numbers whole)

Replace: \2

See the substitutions in the demo.

Third Solution: Balancing Groups

This solution is specific to .NET.

Search:

(?m)(?<=\A(?<c>^(?:(?!pig)[^\r\n])*(?:\r?\n))*.*?)pig(?=[^:]+(?(c)(?<-c>:\d+)*):(\d+))

Free-spacing version with comments:

(?xm)                # free-spacing, multi-line
(?<=                 # lookbehind
   \A                # 
   (?<c>               # skip one line that doesn't have pig
                       # The length of Group c Captures will serve as a counter
     ^                    # beginning of line
     (?:(?!pig)[^\r\n])*  # zero or more chars not followed by pig
     (?:\r?\n)            # newline chars
   )                   # end skipper
   *                   # repeat skipper
   .*?                 # we're on the pig line: lazily match chars before pig
   )                # end lookbehind
pig                 # match pig: this is the match
(?=                 # lookahead
   [^:]+               # get to the digits
   (?(c)               # if Group c has been set
     (?<-c>:\d+)         # decrement c while we match a group of digits
     *                   # repeat: this will only repeat as long as the length of Group c captures > 0 
   )                   # end if Group c has been set
   :(\d+)              # Match the next digit group, capture the digits
)                    # end lookahead

Replace: $1
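
The balancing-group version uses the capture stack of Group c as a counter: one push per skipped line, one pop per number passed over. A plain-Python sketch of that bookkeeping follows. It is an analogy rather than .NET code, and the function name `balancing_line_number` is hypothetical.

```python
def balancing_line_number(text, word, numbers):
    """Push once per skipped line, pop once per number, then read the next number."""
    stack = []
    for line in text.splitlines():
        if word in line:
            break
        stack.append(line)      # (?<c>...) pushes one capture per skipped line
    else:
        return None             # no line contains the word: no match at all

    remaining = iter(numbers)
    while stack:
        stack.pop()             # (?<-c>:\d+) pops one capture...
        next(remaining, None)   # ...while stepping over one number
    return next(remaining, None)  # :(\d+) captures the number that follows


sample = "my cat\ndog\nmy pig\nmy cow\nmy mouse\n"
print(balancing_line_number(sample, "pig", [1, 2, 3, 4, 5, 6, 7]))  # prints 3
```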

Reference

Qtax trick
On which line number was the regex match found?


3qpi33ja2#

Since you did not specify which text editor, in Vim it would be:
:%s/searched_word/\=printf('%-4d', line('.'))/g
But as somebody mentioned, this is not a question for SO but rather for Super User ;)


mwngjboj3#

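I don't know of an editor that does this, short of extending an editor that allows arbitrary extensions.
You could easily use perl to do the task, though.

perl -i.bak -pe"s/word/$./eg" file

Or, if you want to use wildcards:

perl -MFile::DosGlob=glob -i.bak -pe"BEGIN { @ARGV = map glob($_), @ARGV } s/word/$./eg" *.txt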
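
For comparison, a similar result can be sketched in Python. Note that this uses a different technique from Perl's $. line counter: the whole text is held as one string and the line number is derived from the match offset by counting the newlines before it. The function name `replace_with_line_numbers` is made up, and the word is treated as a literal string (an assumption; the Perl one-liner treats it as a pattern).

```python
import re


def replace_with_line_numbers(text: str, word: str) -> str:
    """Replace every occurrence of `word` with its 1-based line number."""
    def line_of(match: re.Match) -> str:
        # Newlines before the match start, plus one, gives the line number.
        return str(text.count("\n", 0, match.start()) + 1)

    return re.sub(re.escape(word), line_of, text)


sample = "my cat\ndog\nmy pig\nmy cow\nmy mouse\nmy pig also\n"
print(replace_with_line_numbers(sample, "pig"))
```

Counting newlines from the start on every match is quadratic in the worst case; for large files, processing line by line as the Perl command does is the simpler choice.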


yh2wf1be4#

Using Raku (formerly known as Perl_6)

To replace the entire line containing the target word with its line number:

~$ raku -ne 'state $i; ++$i; put m/word/ ?? $i !! $_;'  file

To replace each instance of the target word with the line number (global replacement):

~$ raku -pe 'state $i; ++$i; s:g/word/{$i}/;' file

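This answer is meant to complement the excellent Perl answer posted by @ikegami. Raku, like Perl, is cross-platform. The answers above are for Unix/Linux systems. On Windows, use double quotes instead of single quotes (although according to @ikegami, WSL uses single quotes. Thanks!).
The first code example reads as follows: using the -ne non-autoprinting linewise flags, state a counter variable $i. Increment the variable with ++$i. Using Raku's ternary operator Test ?? True !! False, output (i.e. put) the incremented $i variable if a match to word is found, otherwise output $_, the original line.
The second code example reads as follows: using the -pe autoprinting linewise flags, state a counter variable $i. Increment the variable with ++$i. Using Raku's s:g/// global substitution operator, replace every match of word with the $i counter.
Sample input: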

my cat
dog
my pig
my cow
my mouse
my pig also

Sample output for the global replacement of pig (second code example above):

my cat
dog
my 3
my cow
my mouse
my 6 also
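
For readers who do not use Raku, the two one-liners above could be mirrored in Python roughly as follows. This is only an illustrative sketch: the function names `replace_whole_line` and `replace_each_match` are invented here, and `enumerate(..., start=1)` plays the role of the $i counter.

```python
import re


def replace_whole_line(lines, pattern):
    """First one-liner: a matching line is replaced entirely by its number."""
    return [str(i) if re.search(pattern, line) else line
            for i, line in enumerate(lines, start=1)]


def replace_each_match(lines, pattern):
    """Second one-liner: every match within a line becomes the line number."""
    return [re.sub(pattern, lambda _m, n=i: str(n), line)
            for i, line in enumerate(lines, start=1)]


sample_lines = ["my cat", "dog", "my pig", "my cow", "my mouse", "my pig also"]
print("\n".join(replace_each_match(sample_lines, "pig")))
```

With the sample input, the second function reproduces the output shown above, and the first one yields 3 and 6 in place of the two pig lines.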

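Note: post-incrementing can be accomplished with $++, which will 0-index the lines instead of 1-indexing them. Also, the regex matcher can actually be written / … /, i.e. without the m if forward slashes are used, or even m{ … } if you are trying to match forward slashes within the regex.
Furthermore, you can add many regex "adverbs" besides :g (global) to the m/ … / or s/// matchers. Probably the most useful is :i, for case-insensitive matching, as in:
m:i/ … /; or s:i:g/…/…/;
More on regex adverbs at the links below.
Finally, if you are reasonably comfortable with Vim, you can open the file from the shell command line with ~$ vim file, then type a : colon to enter command-line mode. At the Vim command line, enter %! raku -pe 'state $i; ++$i; s:g/pig/{$i}/;' to run the Raku command on the file. Save to a new file or overwrite the original, at your discretion.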
https://docs.raku.org/language/regexes
https://raku.org
