regex: How do I check whether any string or regular expression in an array matches an input string?

Posted by exdqitrt on 2022-12-01 in Other

I have a list of strings and regular expressions, and I want to check whether any of them match an input string.
Suppose I have this list:

$list = [ // an array list of string/regex that i want to check
  "lorem ipsum", // a words
  "example", // another word
  "/(nulla)/", // a regex
];

and this string:

$input_string = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer quam ex, vestibulum sed laoreet auctor, iaculis eget velit. Donec mattis, nulla ac suscipit maximus, leo  metus vestibulum eros, nec finibus nisl dui ut est. Nam tristique varius mauris, a faucibus augue.";

So I would like to check it like this:

if( $matched_string >= 1 ){ // check if at least 1 string matched or something...
 // do something...
 // output matched string: "lorem ipsum", "nulla"
}else{
 // nothing matched
}

How can I do this?

Answer #1 (p4rjhz4m)

I'm not sure whether this approach fits your case, but you could treat all of the entries as regular expressions.

$list = [ // an array list of string/regex that i want to check
  "lorem ipsum", // a words
  "Donec mattis",
  "example", // another word
  "/(nulla)/", // a regex
  "/lorem/i"
];
$input_string = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer quam ex, vestibulum sed laoreet auctor, iaculis eget velit. Donec mattis, nulla ac suscipit maximus, leo  metus vestibulum eros, nec finibus nisl dui ut est. Nam tristique varius mauris, a faucibus augue.";

$is_regex = '/^\/.*\/[imsxuADSUXJ]*$/'; // PHP (PCRE) modifiers; "g" is not one of them
$list_matches = [];
foreach($list as $str){
    // create a regex from the string if it isn't already
    $patt = (preg_match($is_regex, $str))? $str: "/$str/";
    $item_matches = [];
    preg_match($patt, $input_string, $item_matches);
    if(!empty($item_matches)){
        // only add to the list if matches
        $list_matches[$str] = $item_matches;
    }
}
if(empty($list_matches)){
    echo 'No matches from the list found';
}else{
    var_export($list_matches);
}

The code above outputs the following:

array (
  'Donec mattis' => 
  array (
    0 => 'Donec mattis',
  ),
  '/(nulla)/' => 
  array (
    0 => 'nulla',
    1 => 'nulla',
  ),
  '/lorem/i' => 
  array (
    0 => 'Lorem',
  ),
)
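
One caveat with wrapping plain strings as "/$str/": an entry containing regex metacharacters (such as "." or "(") or the "/" delimiter itself yields a different or invalid pattern. Below is a minimal sketch, assuming plain entries should be matched literally. The helper name to_pattern() and the sample input are made up for illustration, and the detection heuristic is a simplified variant of the one above; preg_quote() is the standard PHP escaping function.

```php
<?php
// Hypothetical helper: turn a list entry into a usable pattern.
function to_pattern(string $entry): string
{
    // Heuristic (an assumption): "/.../" plus optional modifiers is a regex.
    if (preg_match('/^\/.*\/[imsxu]*$/', $entry)) {
        return $entry;
    }
    // Escape metacharacters and the "/" delimiter so the text matches literally.
    return '/' . preg_quote($entry, '/') . '/';
}

$input = 'Price: 3.50 USD (approx.) at a/b';

var_dump(preg_match(to_pattern('3.50'), $input));         // int(1)
var_dump(preg_match(to_pattern('3.50'), 'Price: 3x50'));  // int(0): "." is literal
var_dump(preg_match(to_pattern('(approx.)'), $input));    // int(1)
var_dump(preg_match(to_pattern('a/b'), $input));          // int(1)
```

Without the preg_quote() call, "3.50" would also match "3x50", and "a/b" would produce a malformed pattern.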


Answer #2 (m1m5dgzv)

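Normally I would scream bloody murder if anyone dared to stain their code with the error suppression operator. But if your input data is so far out of your control that you accept a mix of regex and non-regex strings, then I guess you will probably tolerate @ in your code as well.
Validate whether each search string is a regex. If it is not a valid regex, wrap it in delimiters and call preg_quote() to form a valid pattern, then run that pattern against the actual haystack string.
Code: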

$list = [ // an array list of string/regex that i want to check
  "lorem ipsum", // a words
  "example", // another word
  "/(nulla)/", // a valid regex
  "/[,.]/", // a valid regex
  "^dolor^", // a valid regex
  "/path/to/dir/", // not a valid regex
  "[integer]i", // valid regex not implementing a character class
];

$input_string = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer quam ex, vestibulum sed laoreet auctor, iaculis eget velit. Donec mattis, /path/to/dir/ nulla ac suscipit maximus, leo  metus vestibulum eros, nec finibus nisl dui ut est. Nam tristique varius mauris, a faucibus augue.";

$result = [];
foreach($list as $v) {
    if (@preg_match($v, '') === false) {
        // not a regex, make into one
        $v = '/' . preg_quote($v, '/') . '/';
    }
    preg_match($v, $input_string, $m);
    $result[$v] = $m[0] ?? null;
}
var_export($result);

Or you could write the same thing like this, although I don't know whether testing the pattern against a non-empty string affects performance:

$result = [];
foreach($list as $v) {
    if (@preg_match($v, $input_string, $m) === false) {
        preg_match('/' . preg_quote($v, '/') . '/', $input_string, $m);
    }
    $result[$v] = $m[0] ?? null;
}
var_export($result);
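
If you would rather avoid @ altogether, one option is to install a temporary error handler around the validity check. The sketch below is only an illustration under stated assumptions: is_valid_regex() and find_matches() are hypothetical names, the input string is shortened, and plain strings are matched case-insensitively (my assumption, so that "lorem ipsum" finds "Lorem ipsum" as the question seems to expect).

```php
<?php
// Hypothetical helper: true if $candidate compiles as a PCRE pattern.
function is_valid_regex(string $candidate): bool
{
    // Swallow the warning that preg_match() raises for a malformed pattern.
    set_error_handler(static fn (): bool => true);
    try {
        return preg_match($candidate, '') !== false;
    } finally {
        restore_error_handler();
    }
}

// Hypothetical helper: map each matching list entry to the text it matched.
function find_matches(array $list, string $haystack): array
{
    $matched = [];
    foreach ($list as $entry) {
        if (is_valid_regex($entry)) {
            $pattern = $entry;
        } else {
            // Assumption: plain strings match literally and case-insensitively.
            $pattern = '/' . preg_quote($entry, '/') . '/i';
        }
        if (preg_match($pattern, $haystack, $m) === 1) {
            $matched[$entry] = $m[0];
        }
    }
    return $matched;
}

$list = ["lorem ipsum", "example", "/(nulla)/"];
$input_string = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Donec mattis, nulla ac suscipit maximus.";

$matched = find_matches($list, $input_string);
if (count($matched) >= 1) {
    echo 'Matched entries: ', implode(', ', array_keys($matched)), "\n";
} else {
    echo "Nothing matched\n";
}
```

This gives the if/else shape from the question: the keys of $matched are the list entries that hit, and the values are the matched text.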

Answer #3 (a5g8bdjr)

Try the following:

<?php
$input_string = "Lorem ipsum dolor sit amet, consectetur adipiscing elit. Integer quam ex, vestibulum sed laoreet auctor, iaculis eget velit. Donec mattis, nulla ac suscipit maximus, leo  metus vestibulum eros, nec finibus nisl dui ut est. Nam tristique varius mauris, a faucibus augue.";

$list = [ // an array list of string/regex that i want to check
"Lorem ipsum", // a words
"consectetur", // another word
"/(nu[a-z]{2}a)/", // a regex
];
$regex_list = [];
foreach($list as $line) {
    if ($line[0] == '/' and $line[-1] == '/')
        $regex_list[] = substr($line, 1, -1);
    else
        $regex_list[] = preg_quote($line, '/'); // second argument is the delimiter to escape
}
$regex = '/' . implode('|', $regex_list) . '/';
echo $regex, "\n";
preg_match_all($regex, $input_string, $matches, PREG_SET_ORDER);
print_r($matches);

$s = [];
foreach ($matches as $match) {
    $s[] = $match[0];
}
$s = json_encode($s);
echo "Matched strings: ", substr($s, 1, -1), "\n";

Prints:

/Lorem ipsum|consectetur|(nu[a-z]{2}a)/
Array
(
    [0] => Array
        (
            [0] => Lorem ipsum
        )

    [1] => Array
        (
            [0] => consectetur
        )

    [2] => Array
        (
            [0] => nulla
            [1] => nulla
        )

)
Matched strings: "Lorem ipsum","consectetur","nulla"

Discussion and limitations

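When each element of $list is processed, a string that begins and ends with '/' is assumed to be a regular expression, and the '/' characters are removed from its start and end. Any string that does not begin and end with these characters is therefore treated as a plain string. This means that if the OP wants to match a plain string that happens to begin and end with '/', for example '/./', it has to be written as a regular expression instead: '/\/\.\//'. A plain string is replaced by the result of calling preg_quote on it, which escapes the characters that have special meaning in a regular expression and so turns it into a regex without the opening and closing '/' delimiters. Finally, all of the strings are joined with the regex alternation character, '|', and a '/' is prepended and appended to create a single regular expression from the input.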
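
As a small illustration of the edge case just described, here is a sketch of how a list entry meant to match the plain text "/./" could be written and how it passes through the delimiter-stripping step. The variable names and test strings are made up for the example.

```php
<?php
// Regex form of the plain string "/./" (slashes and dot escaped).
$entry = '/\/\.\//';

// Strip the outer delimiters, as the answer's loop does: leaves \/\.\/
$body = substr($entry, 1, -1);

// Re-wrap the single piece the same way the combined pattern is built.
$regex = '/' . $body . '/';

var_dump(preg_match($regex, 'a/./b')); // int(1): the literal "/./" is found
var_dump(preg_match($regex, 'a/x/b')); // int(0): "." no longer matches any character
```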
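The main limitation is that if several regular expressions in the input list contain capture groups, backreference numbers are not adjusted automatically, even though group numbering changes when the expressions are combined. Such patterns must therefore take into account any earlier patterns that have capture groups and adjust their backreferences accordingly (see the demo further down).
Regex flags (i.e. pattern modifiers) must be embedded in the regex itself. Because such a flag in one regex string of $list also affects the regex strings that follow it, a flag that should not apply to a subsequent regex has to be turned off explicitly: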

<?php
$input_string = "This is an example by Booboo.";

$list = [ // an array list of string/regex that i want to check
"/(?i)booboo/", // case insensitive
"/(?-i)EXAMPLE/" // explicitly case sensitive again
];
$regex_list = [];
foreach($list as $line) {
    if ($line[0] == '/' and $line[-1] == '/')
        $regex_list[] = substr($line, 1, -1);
    else
        $regex_list[] = preg_quote($line, '/'); // second argument is the delimiter to escape
}
$regex = '/' . implode('|', $regex_list) . '/';
echo $regex, "\n";
preg_match_all($regex, $input_string, $matches, PREG_SET_ORDER);
print_r($matches);

$s = [];
foreach ($matches as $match) {
    $s[] = $match[0];
}
$s = json_encode($s);
echo "Matched strings: ", substr($s, 1, -1), "\n";

Prints:

/(?i)booboo|(?-i)EXAMPLE/
Array
(
    [0] => Array
        (
            [0] => Booboo
        )

)
Matched strings: "Booboo"
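
A possible alternative to switching flags off by hand, offered only as a sketch: wrap every piece in a non-capturing group before joining. In PCRE an inline option set inside a group applies only up to the end of that group, so (?i) in one entry should not leak into the next. The function name combine_scoped() is hypothetical and not part of the answer above.

```php
<?php
// Hypothetical helper: join list entries into one pattern, scoping inline flags.
function combine_scoped(array $list): string
{
    $parts = [];
    foreach ($list as $line) {
        if (strlen($line) >= 2 && $line[0] === '/' && $line[-1] === '/') {
            $body = substr($line, 1, -1);    // already a regex: strip delimiters
        } else {
            $body = preg_quote($line, '/');  // plain string: escape it
        }
        // (?:...) does not capture, but it confines flags such as (?i) to this piece.
        $parts[] = '(?:' . $body . ')';
    }
    return '/' . implode('|', $parts) . '/';
}

$input_string = "This is an example by Booboo.";
$list = [
    "/(?i)booboo/", // case insensitive
    "/EXAMPLE/",    // no (?-i) needed here
];

$regex = combine_scoped($list);
echo $regex, "\n"; // /(?:(?i)booboo)|(?:EXAMPLE)/
preg_match_all($regex, $input_string, $matches);
print_r($matches[0]); // only "Booboo" is matched
```

Because the groups are non-capturing, the numbering of any capture groups inside the entries is the same as in the unwrapped version.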

The following shows how to handle backreferences *correctly* by adjusting the group numbers manually:

<?php
$input_string = "This is the 22nd example by Booboo.";

$list = [ // an array list of string/regex that i want to check
"/([0-9])\\1/", // two consecutive identical digits
"/(?i)([a-z])\\2/" // two consecutive identical alphas
];
$regex_list = [];
foreach($list as $line) {
    if ($line[0] == '/' and $line[-1] == '/')
        $regex_list[] = substr($line, 1, -1);
    else
        $regex_list[] = preg_quote($line, '/'); // second argument is the delimiter to escape
}
$regex = '/' . implode('|', $regex_list) . '/';
echo $regex, "\n";
preg_match_all($regex, $input_string, $matches, PREG_SET_ORDER);
print_r($matches);

$s = [];
foreach ($matches as $match) {
    $s[] = $match[0];
}
$s = json_encode($s);
echo "Matched strings: ", substr($s, 1, -1), "\n";

Prints:

/([0-9])\1|(?i)([a-z])\2/
Array
(
    [0] => Array
        (
            [0] => 22
            [1] => 2
        )

    [1] => Array
        (
            [0] => oo
            [1] =>
            [2] => o
        )

    [2] => Array
        (
            [0] => oo
            [1] =>
            [2] => o
        )

)
Matched strings: "22","oo","oo"
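
If renumbering backreferences by hand is impractical, PCRE's branch reset group (?|...) may help: inside it, every alternative starts numbering its capture groups from the same value, so each entry can keep using \1 for its own first group. This is a sketch under assumptions rather than a drop-in replacement. It assumes every entry is a "/.../" regex with only numbered groups, and an entry that contains a top-level "|" would need more care than shown here.

```php
<?php
$input_string = "This is the 22nd example by Booboo.";

$list = [
    "/([0-9])\\1/",     // two consecutive identical digits
    "/(?i)([a-z])\\1/", // two consecutive identical letters, still using \1
];

$parts = [];
foreach ($list as $line) {
    // Strip the delimiters and scope any inline flags to this piece.
    $parts[] = '(?:' . substr($line, 1, -1) . ')';
}

// (?|...) resets capture group numbering at the start of each alternative.
$regex = '/(?|' . implode('|', $parts) . ')/';
echo $regex, "\n";

preg_match_all($regex, $input_string, $matches);
echo "Matched strings: ", implode(', ', $matches[0]), "\n";
```

With this layout, every match reports its captured character in group 1, whichever entry produced it.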
