Perl: unable to get a multi-line regex to match a string

Posted by z3yyvxxp on 2022-11-15 in Perl

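I am reading an HTML file and trying to extract some information from it. I have tried HTML parsers, but I could not work out how to use them to get at the text I need. The original version reads the HTML file; the version below is a minimal working example for Stack Overflow purposes.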

#!/usr/bin/env perl

use 5.036;
use warnings FATAL => 'all';
use autodie ':default';
use Devel::Confess 'color';

sub regex_test ( $string, $regex ) {
    if ($string =~ m/$regex/s) {
        say "$string matches $regex";
    } else {
        say "$string doesn't match $regex";
    }
}
# the HTML text is $s
my $s = '      rs577952184 was merged into
      
        <a target="_blank"
           href="rs59222162">rs59222162</a>
      
';

regex_test ( $s, 'rs\d+ was merged into.*\<a target="_blank".+href="rs(\d+)/');

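However, this does not match.
I think the problem is that the newline after "merged into" is not being matched.
How can I modify the regex above so that it matches $s?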

evrscar2 #1

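The problem is the trailing / character in $regex: in $s the digits after href="rs are followed by a double quote, not a slash. Omit the / or change it to ". The newlines are not the cause, because the /s modifier used in regex_test already lets . match a newline.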
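As a minimal sketch of that fix, the snippet below reuses the question's sample text and pattern and changes only the final character of the pattern from / to ". The helper name merged_into and the variable $fixed are made up for illustration and do not appear in the original post.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';

# Same sample text as in the question.
my $s = '      rs577952184 was merged into
      
        <a target="_blank"
           href="rs59222162">rs59222162</a>
      
';

# The question's pattern with the trailing "/" replaced by a double quote.
my $fixed = 'rs\d+ was merged into.*\<a target="_blank".+href="rs(\d+)"';

# Hypothetical helper: returns the captured ID, or undef when nothing matches.
sub merged_into {
    my ($text) = @_;
    # /s lets "." match newlines, so ".*" and ".+" can span the line breaks.
    return $text =~ m/$fixed/s ? $1 : undef;
}

my $id = merged_into($s);
say defined $id ? "merged into rs$id" : "no match";
```

For the sample text this should print "merged into rs59222162". Without the /s modifier the same pattern would fail, since . would then stop at the first newline.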

sr4lhrrt #2

use strict;
use warnings;
use feature 'say';

my $s = '      rs577952184 was merged into
      
        <a target="_blank"
           href="rs59222162">rs59222162</a>
      
';

my $re = qr/rs\d+ was merged into\s+<a target="_blank"\s+href="rs(\d+)">rs\d+<\/a>/;

regex_test($s,$re);

exit 0;

sub regex_test {
    my $string = shift;
    my $regex  = shift;
    
    say $string =~ m/$regex/s 
        ? "$string matches $regex"
        : "$string doesn't match $regex";
}

Output

rs577952184 was merged into

        <a target="_blank"
           href="rs59222162">rs59222162</a>

 matches (?^:rs\d+ was merged into\s+<a target="_blank"\s+href="rs(\d+)">rs\d+</a>)
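The pattern in this answer uses \s+ rather than .* to bridge the line breaks, and \s matches newlines regardless of the /s modifier. The sketch below is a variation on it under a few assumptions: it adds a second capture group around the first ID, stops the pattern at the closing "> of the href attribute, and wraps the match in a helper named merged_ids. None of those three changes is in the original answer.

```perl
#!/usr/bin/env perl
use strict;
use warnings;
use feature 'say';

# Same sample text as in the question.
my $s = '      rs577952184 was merged into
      
        <a target="_blank"
           href="rs59222162">rs59222162</a>
      
';

# Variation on the answer's pattern: both IDs are captured (an addition here).
my $re = qr/rs(\d+) was merged into\s+<a target="_blank"\s+href="rs(\d+)">/;

# Hypothetical helper: returns (old_id, new_id), or an empty list on no match.
sub merged_ids {
    my ($text) = @_;
    # In list context a successful match returns its capture groups.
    # No /s is needed: \s already matches newlines.
    return $text =~ $re;
}

my ($old, $new) = merged_ids($s);
say defined $new ? "rs$old was merged into rs$new" : "no match";
```

For the sample text this should print "rs577952184 was merged into rs59222162". Matching only whitespace between the fixed parts is stricter than .*, so unexpected markup between "merged into" and the link causes a failed match rather than a wrong capture.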
