php 解析标签格式为全大写且标签值后跟的格式化字符串,以生成关联数组

xwmevbvl  于 2023-02-18  发布在  PHP
关注(0)|答案(3)|浏览(122)
$string = 'Audi MODEL 80 ENGINE 1.9 TDi';
list($make,$model,$engine) = preg_split('/( MODEL | ENGINE )/',$string);

“MODEL”之前的任何字符串都将被视为“MAKE字符串”。
“ENGINE”之前的任何字符串都将被视为“MODEL string”。
“ENGINE”之后的任何字符串都是“ENGINE字符串”。
但是我们通常在这个字符串中有更多的信息。

//  possible variations:
$string = 'Audi MODEL 80 ENGINE 1.9 TDi POWER 90Hk TORQUE 202Nm GEAR man DRIVE 2wd YEAR 1996';

$string = 'Audi MODEL 80 ENGINE 1.9 TDi YEAR 1996 NOTE this engine needs custom stage GEAR auto';    

$string = 'Audi MODEL 80 ENGINE 1.9 TDi GEAR man YEAR 1996';

$string = 'Audi MODEL 80 ENGINE 1.9 TDi YEAR 1996 DRIVE 2wd';

MODELENGINE始终存在,并且始终是字符串的开头。
其余的(POWERTORQUEGEARDRIVEYEARNOTE)可能会有所不同,包括排序顺序以及它们是否在那里。
由于我们不能确定ENGINE字符串是如何结束的,或者其他关键字中的哪一个将是紧随其后的第一个,所以我认为可以用这些关键字创建一个数组。
然后执行某种字符串搜索,查找与数组中某个关键字匹配的第一个单词。
我确实需要保留匹配的单词。
另一种说法可能是:“如何在数组中每次出现单词时/之前拆分字符串”

4uqofj5v

4uqofj5v1#

如果您希望使用动态关联数组:
1.将MAKE添加到字符串前面
1.使用preg_match_all()捕获格式化字符串中的标签和值对
1.使用array_column()将匹配列重新构造为关联数组。
代码:(Demo

$strings = [
    'Audi MODEL 80 ENGINE 1.9 TDi POWER 90Hk TORQUE 202Nm GEAR man DRIVE 2wd YEAR 1996',
    'Audi MODEL 80 ENGINE 1.9 TDi YEAR 1996 NOTE this engine needs custom stage GEAR auto',
    'Audi MODEL 80 ENGINE 1.9 TDi GEAR man YEAR 1996',
    'Audi MODEL 80 ENGINE 1.9 TDi YEAR 1996 DRIVE 2wd'
];

foreach ($strings as $string) {
    preg_match_all('/\b([A-Z]+)\s+(\S+(?:\s+\S+)*?)(?=$|\s+[A-Z]+\b)/', 'MAKE ' . $string, $m, PREG_SET_ORDER);
    var_export(array_column($m, 2, 1));
    echo "\n---\n";
}

输出:

array (
  'MAKE' => 'Audi',
  'MODEL' => '80',
  'ENGINE' => '1.9 TDi',
  'POWER' => '90Hk',
  'TORQUE' => '202Nm',
  'GEAR' => 'man',
  'DRIVE' => '2wd',
  'YEAR' => '1996',
)
---
array (
  'MAKE' => 'Audi',
  'MODEL' => '80',
  'ENGINE' => '1.9 TDi',
  'YEAR' => '1996',
  'NOTE' => 'this engine needs custom stage',
  'GEAR' => 'auto',
)
---
array (
  'MAKE' => 'Audi',
  'MODEL' => '80',
  'ENGINE' => '1.9 TDi',
  'GEAR' => 'man',
  'YEAR' => '1996',
)
---
array (
  'MAKE' => 'Audi',
  'MODEL' => '80',
  'ENGINE' => '1.9 TDi',
  'YEAR' => '1996',
  'DRIVE' => '2wd',
)
---

这不是一个新概念/技术。唯一需要做的调整是 * 如何 * 识别原始字符串中的键/标签。您可能希望使用explicitly name each label and separate them in the pattern with pipes来代替[A-Z]+。请参阅以下其他演示:

或者,不使用正则表达式来解析字符串,而是将字符串转换为本地PHP函数可以解析的标准格式。

foreach ($strings as $string) {
    var_export(
        parse_ini_string(
            preg_replace(
                '~\s*\b(MAKE|MODEL|ENGINE|POWER|TORQUE|GEAR|DRIVE|YEAR|NOTE)\s+~',
                "\n$1=",
                'MAKE ' . $string
            )
        )
    );
    echo "\n---\n";
}
pbwdgjma

pbwdgjma2#

要保持关键字包含的"位"不变,您可以使用带有预视的preg_split,预视将在空格后拆分任意一个关键字。例如:

$string = 'Audi MODEL 80 ENGINE 1.9 TDi POWER 90Hk TORQUE 202Nm GEAR man DRIVE 2wd YEAR 1996';

$bits = preg_split('~\s+(?=(MODEL|ENGINE|POWER|TORQUE|GEAR|DRIVE|YEAR|NOTE)\b)~', $string);

结果:

array(8) {
    [0] · string(4) "Audi"
    [1] · string(8) "MODEL 80"
    [2] · string(14) "ENGINE 1.9 TDi"
    [3] · string(10) "POWER 90Hk"
    [4] · string(12) "TORQUE 202Nm"
    [5] · string(8) "GEAR man"
    [6] · string(9) "DRIVE 2wd"
    [7] · string(9) "YEAR 1996"
}

如果你想把这些解析成键/值对,很简单:

// Initialize array; get the "unnamed" make:
$data = [
    'MAKE' => array_shift($bits),
];

// Iterate any other known keys found:
foreach($bits as $bit) {
    $pair = explode(' ', $bit, 2);
    $data[$pair[0]] = $pair[1];
}

结果:

array(8) {
    ["MAKE"] · string(4) "Audi"
    ["MODEL"] · string(2) "80"
    ["ENGINE"] · string(7) "1.9 TDi"
    ["POWER"] · string(4) "90Hk"
    ["TORQUE"] · string(5) "202Nm"
    ["GEAR"] · string(3) "man"
    ["DRIVE"] · string(3) "2wd"
    ["YEAR"] · string(4) "1996"
}
wooyq4lh

wooyq4lh3#

如果你更喜欢非RegEx方法,你也可以把它分解成单独的标记(单词)并构建一个数组。下面的代码对空格做了一些假设,如果它是一个问题,可以通过替换来解决。

// The first group to assign un-prefixed items to
$firstGroup = 'MAKE';

// Every possible word grouping
$wordList = ['ENGINE', 'MODEL', 'POWER', 'TORQUE', 'GEAR', 'DRIVE', 'YEAR'];

// Test string
$string = 'Audi MODEL 80 ENGINE 1.9 TDi POWER 90Hk TORQUE 202Nm GEAR man DRIVE 2wd YEAR 1996';

// Key/value of group name and values
$groups = [];

// Default to the first group
$currentWord = $firstGroup;
foreach (explode(' ', $string) as $word) {

    // Found a special word, reset and continue the hunt
    if (in_array($word, $wordList)) {
        $currentWord = $word;
        continue;
    }

    // Assign. The subsequent for loop could be removed by just doing string concatenation here instead
    $groups[$currentWord][] = $word;
}

// Optional, join each back into a string
foreach ($groups as $key => $values) {
    $groups[$key] = implode(' ', $values);
}

var_dump($groups);

输出:

array(8) {
  ["MAKE"]=>
  string(4) "Audi"
  ["MODEL"]=>
  string(2) "80"
  ["ENGINE"]=>
  string(7) "1.9 TDi"
  ["POWER"]=>
  string(4) "90Hk"
  ["TORQUE"]=>
  string(5) "202Nm"
  ["GEAR"]=>
  string(3) "man"
  ["DRIVE"]=>
  string(3) "2wd"
  ["YEAR"]=>
  string(4) "1996"
}

演示:https://3v4l.org/D4pvl

相关问题