PHP: split a string on | unless the | is enclosed in single or double quotes

Posted by wgmfuz8q on 2023-10-15 in PHP

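I have a string like aa | bb | "cc | dd" | 'ee | ff' and I'm looking for a way to split it so that I get all the values separated by the | character, except where the | is inside a quoted string.
The idea is to get something like [aa, bb, "cc | dd", 'ee | ff'].
I already found an answer to a similar question: https://stackoverflow.com/a/11457952/11260467
However, I can't figure out how to adapt it to more than one kind of delimiter (both single and double quotes). Is anyone here better with regular expressions than I am?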
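
For reference, a plain explode() on | cannot tell quoted pipes from unquoted ones, so the quoted values get cut in half. A minimal sketch using the sample string from the question (the variable names are illustrative):

```php
<?php
// Sample input from the question. Variable names are illustrative only.
$string = "aa | bb | \"cc | dd\" | 'ee | ff'";

// Naive approach: split on every "|" and trim the surrounding spaces.
$naive = array_map('trim', explode('|', $string));

print_r($naive); // 6 elements: aa, bb, "cc, dd", 'ee, ff'
```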

Answer #1 (hujrc8aj)

This is easy to do with the (*SKIP)(*FAIL) verbs provided by PCRE:

(['"]).*?\1(*SKIP)(*FAIL)|\s*\|\s*

In PHP, this could look like:

<?php

$string = "aa | bb | \"cc | dd\" | 'ee | ff'";

$pattern = '~([\'"]).*?\1(*SKIP)(*FAIL)|\s*\|\s*~';

$splitted = preg_split($pattern, $string);
print_r($splitted);
?>

which yields

Array
(
    [0] => aa
    [1] => bb
    [2] => "cc | dd"
    [3] => 'ee | ff'
)

See a demo on regex101.com and on ideone.com.
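
To see what (*SKIP) contributes, compare the pattern with a variant that only has (*FAIL). Without (*SKIP), the engine backtracks after the failure and simply retries from the next character, so the \s*\|\s* branch can still match inside the quotes. With (*SKIP), the next attempt starts after the closing quote instead. A small comparison sketch (the variable names are illustrative):

```php
<?php
// Sample input from the question. Variable names are illustrative only.
$string = "aa | bb | \"cc | dd\" | 'ee | ff'";

// Pattern from the answer: a quoted substring is matched, then skipped over entirely.
$withSkip = preg_split('~([\'"]).*?\1(*SKIP)(*FAIL)|\s*\|\s*~', $string);

// Same pattern without (*SKIP): after (*FAIL) the engine moves on to the next
// start position, which may lie inside the quoted substring.
$withoutSkip = preg_split('~([\'"]).*?\1(*FAIL)|\s*\|\s*~', $string);

print_r($withSkip);    // aa, bb, "cc | dd", 'ee | ff'
print_r($withoutSkip); // aa, bb, "cc, dd", 'ee, ff'
```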

Answer #2 (qqrboqgw)

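It's easier if you match the parts instead of splitting. Patterns are greedy by default, so they consume as many characters as they can. This lets you define more complex patterns for quoted strings ahead of the pattern for unquoted tokens: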

$subject = '[ aa | bb | "cc | dd" | \'ee | ff\' ]';

$pattern = <<<'PATTERN'
(
    (?:[|[]|^) # after | or [ or string start
    \s*
    (?<token> # name the match
        "[^"]*" # string in double quotes
        |
        '[^']*'  # string in single quotes
        |
        [^\s|]+ # non-whitespace 
    )
    \s*
)x
PATTERN;

preg_match_all($pattern, $subject, $matches);
var_dump($matches['token']);

Output:

array(4) {
  [0]=>
  string(2) "aa"
  [1]=>
  string(2) "bb"
  [2]=>
  string(9) ""cc | dd""
  [3]=>
  string(9) "'ee | ff'"
}

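Hints:

  1. <<<'PATTERN' is nowdoc syntax (a heredoc with a quoted identifier), so the pattern needs no extra escaping
  2. I use () as the pattern delimiters, since they stand for group 0 (the whole match)
  3. Named groups such as (?<token>...) make the code more readable
  4. The x modifier allows the pattern to be laid out with whitespace and comments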
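
The subject above is wrapped in [ ], but thanks to the ^ branch the same idea also works on the plain string from the question. The quoted-string branches need to come before [^\s|]+: PCRE tries alternatives from left to right, and [^\s|]+ would happily match "cc on its own. A sketch comparing both orderings, using a condensed single-line version of the pattern without the x modifier (the variable names are illustrative):

```php
<?php
// Sample input from the question, without the [ ] wrapper. Names are illustrative.
$string = "aa | bb | \"cc | dd\" | 'ee | ff'";

// Quoted branches first: a quoted value is consumed as a single token.
$quotedFirst = '~(?:\||^)\s*(?<token>"[^"]*"|\'[^\']*\'|[^\s|]+)\s*~';

// Unquoted branch first: it grabs "cc and 'ee before the quoted branches are tried.
$unquotedFirst = '~(?:\||^)\s*(?<token>[^\s|]+|"[^"]*"|\'[^\']*\')\s*~';

preg_match_all($quotedFirst, $string, $good);
preg_match_all($unquotedFirst, $string, $bad);

print_r($good['token']); // aa, bb, "cc | dd", 'ee | ff'
print_r($bad['token']);  // aa, bb, "cc, dd", 'ee, ff'
```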

Answer #3 (c86crjj0)

Use:

$string = "aa | bb | \"cc | dd\" | 'ee | ff'";
preg_match_all("~(?|\"([^\"]*)\"|'([^']*)'|([^|'\"]+))(?:\s*\|\s*|\z)~", $string, $matches);
print_r(array_map(function($x) {return trim($x);}, $matches[1]));

See the PHP proof.

Results:

Array
(
    [0] => aa
    [1] => bb
    [2] => cc | dd
    [3] => ee | ff
)

Explanation:

--------------------------------------------------------------------------------
  (?|                      Branch reset group, does not capture:
--------------------------------------------------------------------------------
    \"                       '"'
--------------------------------------------------------------------------------
    (                        group and capture to \1:
--------------------------------------------------------------------------------
      [^\"]*                   any character except: '\"' (0 or more
                               times (matching the most amount
                               possible))
--------------------------------------------------------------------------------
    )                        end of \1
--------------------------------------------------------------------------------
    \"                       '"'
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    '                        '\''
--------------------------------------------------------------------------------
    (                        group and capture to \1:
--------------------------------------------------------------------------------
      [^']*                    any character except: ''' (0 or more
                               times (matching the most amount
                               possible))
--------------------------------------------------------------------------------
    )                        end of \1
--------------------------------------------------------------------------------
    '                        '\''
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    (                        group and capture to \1:
--------------------------------------------------------------------------------
      [^|'\"]+                 any character except: '|', ''', '\"'
                               (1 or more times (matching the most
                               amount possible))
--------------------------------------------------------------------------------
    )                        end of \1
--------------------------------------------------------------------------------
  )                        end of grouping
--------------------------------------------------------------------------------
  (?:                      group, but do not capture:
--------------------------------------------------------------------------------
    \s*                      whitespace (\n, \r, \t, \f, and " ") (0
                             or more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
    \|                       '|'
--------------------------------------------------------------------------------
    \s*                      whitespace (\n, \r, \t, \f, and " ") (0
                             or more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
   |                        OR
--------------------------------------------------------------------------------
    \z                       the end of the string
--------------------------------------------------------------------------------
  )                        end of grouping
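
The branch reset group (?| is what lets all three alternatives report their text in $matches[1]. A minimal sketch contrasting it with an ordinary non-capturing group on a single quoted value (the variable names are illustrative):

```php
<?php
// A single-quoted value from the question. Variable names are illustrative only.
$value = "'ee | ff'";

// Branch reset (?|...): every alternative numbers its groups from the same point,
// so the text between the quotes lands in group 1 whichever quote style was used.
preg_match('~(?|"([^"]*)"|\'([^\']*)\')~', $value, $withReset);

// Ordinary (?:...): the single-quote branch is group 2, and PHP reports the
// unmatched group 1 as an empty string.
preg_match('~(?:"([^"]*)"|\'([^\']*)\')~', $value, $withoutReset);

var_dump($withReset[1]);    // string(7) "ee | ff"
var_dump($withoutReset[1]); // string(0) ""
var_dump($withoutReset[2]); // string(7) "ee | ff"
```
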
Answer #4 (23c0lvtd)

Interestingly, there are many ways to build a regular expression for this problem. Here is another one, similar to @Jan's answer.

(['"]).*?\1\K| *\| *

PCRE Demo

(['"]) # match a single or double quote and save to capture group 1
.*?    # match zero or more characters lazily
\1     # match the content of capture group 1
\K     # reset the starting point of the reported match and discard
       # any previously-consumed characters from the reported match
|      # or
\ *    # match zero or more spaces
\|     # match a pipe character
\ *    # match zero or more spaces

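Note that the part before the pipe character ("or") serves only to move the engine's internal string pointer to just past the closing quote of a quoted substring.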
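
The answer only links a regex demo. One way to use this pattern from PHP is preg_split(). Because \K makes the quoted-string branch report an empty match right after the closing quote, the split can leave empty pieces behind, so this sketch passes PREG_SPLIT_NO_EMPTY to drop them (that flag is an addition of this sketch, not part of the answer; the variable names are illustrative):

```php
<?php
// Sample input from the question. Variable names are illustrative only.
$string = "aa | bb | \"cc | dd\" | 'ee | ff'";

// Branch 1 consumes a whole quoted substring, then \K moves the reported match
// start to just past the closing quote, so that match is empty.
// Branch 2 is the real delimiter: a pipe with optional spaces around it.
$pattern = '~([\'"]).*?\1\K| *\| *~';

// The empty matches from branch 1 can produce empty strings in the result;
// PREG_SPLIT_NO_EMPTY removes them.
$parts = preg_split($pattern, $string, -1, PREG_SPLIT_NO_EMPTY);

print_r($parts); // aa, bb, "cc | dd", 'ee | ff'
```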
