Why does Java's regex Pattern/Matcher miscount group positions in strings with Unicode characters?

Posted by 8zzbczxx on 2023-02-07 in Java

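I'm trying to use regular expressions on strings that contain Unicode characters, such as the ℃ and Ω in the test string below. The Pattern/Matcher finds the expression I'm looking for, but the Matcher returns the wrong start position for the match, so in this case all of the Matcher's groups are wrong.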

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Main {
    public static void main(String[] args) {
        // Contains non-ASCII literals, so the result depends on the encoding javac uses to read this file.
        String test = "±1℃ ±5% 3kΩ";
        // Dump every UTF-16 char together with its code as a 4-digit hex escape.
        for (int index = 0; index < test.length(); index++)
            System.out.println(" char at " + index + ": " + test.charAt(index) + " \\u" +
                    Integer.toHexString(test.charAt(index) | 0x10000).substring(1));
        Pattern pattern = Pattern.compile("(?<number>[0-9]*(\\.[0-9]+)?)(?<multiplier>[KM])?Ω");
        Matcher matcher = pattern.matcher(test);
        if (matcher.find()) {
            System.out.println("info: " + matcher.start());
            System.out.println("found \"" + matcher.group("number") + "\" \"" +
                    matcher.group("multiplier") + "\" in \"" + test + "\"");
        }
    }
}
```

Since the Matcher does find the sequence, I expected Matcher.group("number") to return "3" and Matcher.group("multiplier") to return "k".
So the last print should produce:

```
found "3" "k"
```

Instead, I get:

```
info: 14
found "" "null"
```

The "info" line gives a hint: the Matcher thinks the match starts at position 14.
But the for loop prints each character with its position:

```
 char at 0: Â \u00c2
 char at 1: ± \u00b1
 char at 2: 1 \u0031
 char at 3: â \u00e2
 char at 4: „ \u201e
 char at 5: ƒ \u0192
 char at 6:   \u0020
 char at 7: Â \u00c2
 char at 8: ± \u00b1
 char at 9: 5 \u0035
 char at 10: % \u0025
 char at 11:   \u0020
 char at 12: 3 \u0033
 char at 13: k \u006b
 char at 14: Î \u00ce
 char at 15: © \u00a9
```

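From this we can see that the match really starts at character position 12 (the "3").
Why does the regex Pattern/Matcher find the match but compute the wrong positions for the group() methods?
Is there something I can do to the string to convert it into some magic encoding, or something I can do to the Pattern or Matcher so that they produce the expected result?
String encodings... ugh.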
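The hex codes in the dump point to a likely cause, although this is an inference the question does not confirm. Each non-ASCII character turned into two or three characters, which is exactly what happens when UTF-8 bytes are decoded as windows-1252: ℃ is E2 84 83 in UTF-8, and windows-1252 maps those three bytes to â, „ and ƒ. That would occur if the source file was saved as UTF-8 but javac read it with a different default encoding. The following is a minimal sketch, assuming windows-1252 and using the hypothetical class name MojibakeDemo, that rebuilds the question's 16-character string from the intended 11-character one:

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

class MojibakeDemo {
    // The question's test string, spelled with Unicode escapes so this
    // example does not itself depend on the source-file encoding.
    static final String INTENDED = "\u00b11\u2103 \u00b15% 3k\u03a9";

    // Assumption: the source was saved as UTF-8 but read by javac as windows-1252.
    static final Charset CP1252 = Charset.forName("windows-1252");

    // Simulates that mistake: encode the text as UTF-8, then decode those bytes as windows-1252.
    static String misdecode(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_8), CP1252);
    }

    public static void main(String[] args) {
        String garbled = misdecode(INTENDED);
        System.out.println(INTENDED.length() + " chars intended, "
                + garbled.length() + " chars after mis-decoding");
        // Same per-char dump format as the question's loop.
        for (int i = 0; i < garbled.length(); i++) {
            System.out.printf(" char at %d: %c \\u%04x%n", i, garbled.charAt(i), (int) garbled.charAt(i));
        }
    }
}
```

If that assumption holds, the Ω in the pattern literal is mis-decoded the same way (to Î©), so the regex still finds it, and start() = 14 is the index of the Î where that two-character sequence begins; the Matcher is not miscounting. Compiling with javac -encoding UTF-8, or writing the literals as Unicode escapes, keeps the string at 11 characters.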

Source: https://stackoverflow.com/questions/75357000/why-does-javas-regex-pattern-matcher-miscount-group-positions-in-strings-with-u

1 answer

monwx1rj1#

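It turns out I'm an idiot.
The test string has a lowercase 'k', and the pattern only allows an uppercase '[KM]'. I believe my question still has some merit for some people, because in this case find() should have failed and returned false, since 3kΩ should not match "[0-9]*[KM]?Ω".
Anyway, if I change it to [KkMm] (these are resistances, so 'm' usually means something different from 'M', whereas 'K' and 'k' differ less), it seems to pull out the groups correctly. It's as if the groups know they didn't match properly, but find() says YES!! Anyway.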

Answered 2023-02-07
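On the point that find() should have returned false: because [0-9]* can match zero digits and the multiplier group is optional, the pattern also matches a bare Ω, and that is the match find() reported. The empty "number" and null "multiplier" are how Matcher.group distinguishes the two cases: "" for a group that took part in the match but consumed nothing, null for a group that did not take part at all. A minimal sketch comparing the question's [KM] with the answer's [KkMm] follows; the class and method names (OptionalGroupsDemo, firstMatch, ORIGINAL, FIXED) are hypothetical, and the string uses Unicode escapes, so it is the correct 11-character version and its indices differ from the question's 16-character dump:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class OptionalGroupsDemo {
    // The question's string, spelled with Unicode escapes (correctly decoded: 11 chars, Omega at index 10).
    static final String TEST = "\u00b11\u2103 \u00b15% 3k\u03a9";

    // The question's pattern: [KM] only accepts an uppercase K.
    static final String ORIGINAL = "(?<number>[0-9]*(\\.[0-9]+)?)(?<multiplier>[KM])?\u03a9";
    // The answer's fix: also accept lowercase k and m.
    static final String FIXED = "(?<number>[0-9]*(\\.[0-9]+)?)(?<multiplier>[KkMm])?\u03a9";

    // Returns "start|number|multiplier" for the first match, or "no match".
    static String firstMatch(String regex) {
        Matcher m = Pattern.compile(regex).matcher(TEST);
        if (!m.find()) {
            return "no match";
        }
        // group() gives "" for a group that matched nothing, null for one that did not participate.
        return m.start() + "|" + m.group("number") + "|" + m.group("multiplier");
    }

    public static void main(String[] args) {
        System.out.println(firstMatch(ORIGINAL)); // 10||null
        System.out.println(firstMatch(FIXED));    // 8|3|k
    }
}
```

With the garbled 16-character string from the question, the same two patterns start at 14 and 12 instead of 10 and 8, which matches what the question and the answer report.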
