PHP: implement word boundaries with regex alternations and words that might not begin/end with word characters

Posted by syqv5f0l on 2023-09-29 in PHP


I have the following code:

//Array filled with data from external file
$patterns = array('!test!', 'stuff1', 'all!!', '');

//Delete empty values in array
$patterns = array_filter($patterns);

foreach($patterns as &$item){
       $item = preg_quote($item);
}

$pattern = '/(\b|^|- |--|-)(?:'.implode('|', $patterns).')(-|--| -|\b|$)/i';

$clid = "I am the !test! stuff1 all!! string";

echo $clid;
$clid = trim(preg_replace($pattern, ' ', $clid));
echo $clid;

Output:

//I am the !test! stuff1 all!! string
//I am the !test! all!! string

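I escaped the ! with preg_quote(), so why are !test! and all!! not removed?
I also had a second problem, which is now solved, but I don't understand why it happened. Suppose $clid = "I am Jörg Müller with special chars". If I remove the line $patterns = array_filter($patterns);, the output after preg_replace() is I am J. I don't know why, but array_filter() fixed it.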


Source: https://stackoverflow.com/questions/33641827/implement-word-boundaries-with-regex-alternations-and-words-that-might-not-begin

2 answers


Answer #1 (cidc1ykv)

I would do it like this:

$clid = "I am the !test! stuff1 all!! string";

$items = ['!test!', 'stuff1', 'all!!', ''];

$pattern = array_reduce($items, function ($c, $i) {
    return empty($i) ? $c : $c . preg_quote($i, '~') . '|';
}, '~[- ]+(?:');

$pattern .= '(*F))(?=[- ])~u';

$result = preg_replace($pattern, '', ' ' . $clid . ' ');
$result = trim($result, "- \t\n\r\0\x0b");

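The idea is to check that each "word" is preceded by a space or a hyphen and followed by one, using a lookahead for the part after it. This way the trailing "separator" is not consumed, so the pattern can handle consecutive matches.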
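A small comparison may make the point about consecutive matches concrete. The subject ' foo bar ' and the items foo/bar are illustrative, not taken from the answer: when the trailing separator is consumed, the next item loses its leading separator and is skipped.

```php
<?php
// Illustrative subject: two removable items directly after each other.
$subject = ' foo bar ';

// Trailing separator consumed: the match " foo " eats the space that
// "bar" would need as its leading separator, so "bar" survives.
$consumed = preg_replace('~[- ](?:foo|bar)[- ]~', ' ', $subject);

// Trailing separator only asserted with a lookahead: it stays in the
// string and can serve as the leading separator of the next match.
$lookahead = preg_replace('~[- ](?:foo|bar)(?=[- ])~', '', $subject);

var_dump($consumed);  // string(5) " bar "
var_dump($lookahead); // string(1) " "
```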
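To avoid an alternation at the start of the pattern (such as (?:[- ]|^)[- ]*, which would be slow), I add a space at the beginning of the source string (and one at the end, so the lookahead also succeeds after the last word); trim removes them after the replacement.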
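Wrapped into a reusable function, the approach above might look like the sketch below. The name removeItems() is hypothetical, and the sketch assumes the items are plain strings and the subject is valid UTF-8. Unlike empty(), the `!== ''` check keeps an item such as "0".

```php
<?php
// Hypothetical helper wrapping the answer's approach: removes every item that
// is preceded by spaces/hyphens and followed by a space or hyphen.
function removeItems(string $subject, array $items): string
{
    $alternatives = [];
    foreach ($items as $item) {
        if ($item !== '') {                        // an empty alternative would always match
            $alternatives[] = preg_quote($item, '~');
        }
    }
    if ($alternatives === []) {
        return $subject;                           // nothing to remove
    }

    $pattern = '~[- ]+(?:' . implode('|', $alternatives) . ')(?=[- ])~u';

    // Pad both ends so items at the very start or end also have a
    // separator before them (for [- ]+) and after them (for the lookahead).
    $result = preg_replace($pattern, '', ' ' . $subject . ' ');
    if ($result === null) {
        throw new RuntimeException('preg_replace() failed (for example, invalid UTF-8)');
    }

    // Strip the padding plus any leftover hyphens/whitespace at the ends.
    return trim($result, "- \t\n\r\0\x0B");
}

echo removeItems('I am the !test! stuff1 all!! string', ['!test!', 'stuff1', 'all!!', '']), "\n";
// I am the string
```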
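(*F) (which forces the pattern to fail) is only there because the item alternation is built with array_reduce, which leaves a trailing |. Without (*F), that trailing | would create an empty alternative that always matches.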
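The difference between a bare trailing | and one followed by (*F) can be checked with preg_match(); the patterns and subjects here are illustrative:

```php
<?php
// A trailing "|" leaves an empty alternative, and an empty alternative can
// always match, so the group succeeds even when no item is present.
$withEmpty = preg_match('~x(?:a|b|)~', 'xyz');

// (*F) (short for (*FAIL)) always fails, so as the last alternative it
// makes the trailing "|" harmless.
$withFail = preg_match('~x(?:a|b|(*F))~', 'xyz');
$withFailHit = preg_match('~x(?:a|b|(*F))~', 'xbz');

var_dump($withEmpty);   // int(1)
var_dump($withFail);    // int(0)
var_dump($withFailHit); // int(1)
```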
The u modifier solves the problem with characters outside the ASCII range: with this modifier, the regex engine treats the subject as a UTF-8 encoded string.
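As a rough illustration of what the u modifier changes, the sketch below counts the positions where \b matches in "Jörg" with and without it. It assumes PHP's bundled PCRE in the default "C" locale, where /u also makes \w and \b Unicode-aware.

```php
<?php
// "J\u{f6}rg" is "Jörg" encoded as UTF-8 (ö is the two bytes 0xC3 0xB6).
$name = "J\u{f6}rg";

// Without /u the pattern works on bytes, and the two bytes of "ö" are not
// word characters, so \b also matches on either side of them.
$bytesMode = preg_match_all('/\b/', $name);

// With /u the subject is read as UTF-8 and "ö" counts as a letter,
// so "Jörg" is a single word with one boundary at each end.
$utf8Mode = preg_match_all('/\b/u', $name);

var_dump($bytesMode); // int(4)
var_dump($utf8Mode);  // int(2)
```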

Answered 2023-09-29

Answer #2 (flvlnr44)

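The problem is that you are using \b to assert word boundaries. However, ! is not a word character, so \b does not match between a space and ! (neither side is a word character).
Here are the word boundaries in $clid (each ^ marks a boundary immediately to the left of the character above it; the last one marks the end of the string):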

I   a m   t h e   ! t e s t !   s t u f f 1   a l l ! !   s t r i n g
^ ^ ^   ^ ^     ^   ^       ^   ^           ^ ^     ^     ^           ^
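The diagram can be reproduced with preg_match_all() and PREG_OFFSET_CAPTURE. The helper name wordBoundaryOffsets() is hypothetical, and the offsets are byte offsets (identical to character positions for this ASCII string).

```php
<?php
// Hypothetical helper: lists the byte offsets at which \b matches; each
// offset is the position just before the character at that index.
function wordBoundaryOffsets(string $subject): array
{
    preg_match_all('/\b/', $subject, $matches, PREG_OFFSET_CAPTURE);
    // $matches[0] holds [matched text, offset] pairs; keep only the offsets.
    return array_column($matches[0], 1);
}

echo implode(',', wordBoundaryOffsets('I am the !test! stuff1 all!! string')), "\n";
// 0,1,2,4,5,8,10,14,16,22,23,26,29,35
```

Offsets 9 and 15, on either side of !test!, are missing, so an alternative built on \b can never start or end a match there.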

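Instead, you can spell out the separators around each item explicitly, using a lookahead where the separator must not be consumed:

Before the item, (?:-[- ]?| +) matches -, --, - followed by a space, or one or more spaces.
After the item, (?:-[- ]?|(?= )|$) matches -, --, or - followed by a space, or asserts that the item is followed by a space or the end of the string.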
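To check those two groups in context, the following sketch runs the answer's pattern with a single item, !test!, against a few illustrative subjects (arrow functions need PHP 7.4+):

```php
<?php
// The answer's pattern with one item, escaped for the / delimiter.
$pattern = '/(?:-[- ]?| +)(?:' . preg_quote('!test!', '/') . ')(?:-[- ]?|(?= )|$)/i';

$subjects = [
    'a !test! b',    // followed by a space: asserted by (?= ), not consumed
    'a !test!',      // followed by the end of the string: $
    'a !test!-- b',  // followed by "--": consumed by -[- ]?
    'a !test!! b',   // followed by "!": no alternative fits, no match
];

$results = array_map(fn($s) => preg_replace($pattern, '', $s), $subjects);

foreach ($results as $r) {
    echo "[$r]\n";
}
// [a b]
// [a]
// [a b]
// [a !test!! b]
```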

Regex

$pattern = '/(?:-[- ]?| +)(?:'.implode('|', $patterns).')(?:-[- ]?|(?= )|$)/i';

Code

//Array filled with data from external file
$patterns = array('!test!', 'stuff1', 'all!!', '');

//Delete empty values in array
$patterns = array_filter($patterns);

foreach($patterns as &$item){
       $item = preg_quote($item, '/'); // also escape the / delimiter
}
unset($item); // drop the reference left over from foreach

$pattern = '/(?:-[- ]?| +)(?:'.implode('|', $patterns).')(?:-[- ]?|(?= )|$)/i';

$clid = "I am the !test! stuff1 all!! string and !test!! not matched";
$clid = trim(preg_replace($pattern, '', $clid));

echo $clid;

Output

I am the string and !test!! not matched

As for your second question, your array contains an empty item, so the regex becomes:

(?:option1|option2|option3|)
                           ^

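Note the fourth option here: an empty subpattern, and an empty subpattern always matches. So wherever none of the real items is present, your regex can still match as if it were: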

/(\b|^|- |--|-)(-|--| -|\b|$)/i

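That pattern matches at every word boundary (and around hyphens), so preg_replace() replaced all those positions too, which is why you got unexpected results.
array_filter() solved the problem by removing the empty item.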
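One caveat when the items come from an external file: called without a callback, array_filter() drops every value that is falsy in PHP, including the string "0". If a word list could legitimately contain "0", a sketch like this one (illustrative item values; arrow functions need PHP 7.4+) removes only the empty entries:

```php
<?php
// Items as they might come from an external file (illustrative values).
$items = ['!test!', 'stuff1', '0', ''];

// Without a callback, array_filter() removes every value that converts to
// false, which includes the string "0" as well as the empty string.
$default = array_values(array_filter($items));

// A callback comparing against '' removes only the empty items.
$strict = array_values(array_filter($items, fn($s) => $s !== ''));

echo implode(',', $default), "\n"; // !test!,stuff1
echo implode(',', $strict), "\n";  // !test!,stuff1,0
```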
