Using a regex to extract a delimiter-terminated substring from a field

Posted by uwopmtnx on 2021-06-03 in Hadoop

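How can I use a regex to extract the program name from a syslog message? I have a Java stream-processing module that accepts a regex for processing syslog messages.
A log line can look like any of these: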

2013-10-14T22:05:29+00:00 hostname sshd[6359]: Connection closed by 192.168.1.10
2013-10-14T22:05:29+00:00 hostname sshd:3322 Connection closed by 192.168.1.10
2013-10-14T22:05:29+00:00 hostname sshd/6359 Connection closed by 192.168.1.10
2013-10-14T22:05:29+00:00 hostname sshd Connection closed by 192.168.1.10
2013-10-14T22:05:29+00:00 hostname SSHD[1133] Connection closed by 192.168.1.10
2013-10-14T22:05:29+00:00 hostname SSH.D[6359]: Connection closed by 192.168.1.10

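The extraction should work like this: take the third space-separated substring, then extract the part of it that ends at a [ , : , / or a space.
So for the first four log samples the extracted string is sshd , for the fifth it is SSHD , and for the sixth it is SSH.D . Can this be done with a regular expression?
Edit:
What I have tried is ((?:[A-Za-z][A-Za-z0-9_.-]+)) and it seems to work. To be honest, though, I took an example regex and tweaked it in an online tool until it fit my use case, and I don't really understand how it works.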

Answer #1, by yhived7q

A double split should do the job:

String token = data.split(" +")[2].split("[\\[:/]")[0];
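
To see what the double split does with each sample line from the question, here is a minimal sketch. The class name DoubleSplitDemo and the helper programName are made up for illustration and are not part of the original answer; the expression itself is the one shown above.

```java
public class DoubleSplitDemo {
    // Hypothetical helper wrapping the one-liner from the answer:
    // first split on runs of spaces and take the third field,
    // then split that field at '[', ':' or '/' and keep the first piece.
    static String programName(String data) {
        return data.split(" +")[2].split("[\\[:/]")[0];
    }

    public static void main(String[] args) {
        String[] lines = {
            "2013-10-14T22:05:29+00:00 hostname sshd[6359]: Connection closed by 192.168.1.10",
            "2013-10-14T22:05:29+00:00 hostname sshd:3322 Connection closed by 192.168.1.10",
            "2013-10-14T22:05:29+00:00 hostname sshd/6359 Connection closed by 192.168.1.10",
            "2013-10-14T22:05:29+00:00 hostname sshd Connection closed by 192.168.1.10",
            "2013-10-14T22:05:29+00:00 hostname SSHD[1133] Connection closed by 192.168.1.10",
            "2013-10-14T22:05:29+00:00 hostname SSH.D[6359]: Connection closed by 192.168.1.10"
        };
        for (String line : lines) {
            // Prints sshd four times, then SSHD, then SSH.D
            System.out.println(programName(line));
        }
    }
}
```

Note that this assumes every line has at least three space-separated fields; a shorter line would throw an ArrayIndexOutOfBoundsException.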

Answer #2, by zlwx9yxi

I think the regular expression you are looking for is:

String regex = "([^\\[:/]+).*";
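`.*` matches zero or more of any character. Putting a pair of parentheses in front of the dot-star, `().*`, creates a group that can be retrieved from the matcher. Because it is the first pair of parentheses, it is referred to as group 1. Inside the parentheses is an expression that matches one or more characters from a negated character class, `[^]+`, which lists the characters the OP specified, namely `[`, `:` and `/`.
Here is a sample application that tests the result: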

package com.stackexchange.stackoverflow;

import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class Question19370191 {
    public static void main(String[] args) {
        // In a Java string literal the regex escape \[ must be written as \\[
        String regex = "([^\\[:/]+).*";
        Pattern pattern = Pattern.compile(regex);

        List<String> lines = new ArrayList<>();
        lines.add("2013-10-14T22:05:29+00:00 hostname sshd[6359]: Connection closed by 192.168.1.10");
        lines.add("2013-10-14T22:05:29+00:00 hostname sshd:3322 Connection closed by 192.168.1.10");
        lines.add("2013-10-14T22:05:29+00:00 hostname sshd/6359 Connection closed by 192.168.1.10");
        lines.add("2013-10-14T22:05:29+00:00 hostname sshd Connection closed by 192.168.1.10");
        lines.add("2013-10-14T22:05:29+00:00 hostname SSHD[1133] Connection closed by 192.168.1.10");
        lines.add("2013-10-14T22:05:29+00:00 hostname SSH.D[6359]: Connection closed by 192.168.1.10");

        for (String line : lines) {
            // Third whitespace-separated field, e.g. "sshd[6359]:"
            String field = line.split("\\s")[2];
            String extraction = "";
            Matcher matcher = pattern.matcher(field);
            if (matcher.matches()) {
                // Group 1 holds everything before the first '[', ':' or '/'
                extraction = matcher.group(1);
            }

            System.out.println(String.format("Field \"%-12s\" Extraction \"%s\"", field, extraction));
        }
    }
}

It prints the following:

Field "sshd[6359]: " Extraction "sshd"
Field "sshd:3322   " Extraction "sshd"
Field "sshd/6359   " Extraction "sshd"
Field "sshd        " Extraction "sshd"
Field "SSHD[1133]  " Extraction "SSHD"
Field "SSH.D[6359]:" Extraction "SSH.D"

Answer #3, by fxnxkyjh

If your sample data looks exactly like what you provided:

(?:.+?\s){2}([\w\.]+).+$

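Explanation: (?:.+?\s){2} … matches up to and including the second space. ([^\s[:/]+) … matches anything that is not whitespace, "[", ":" or "/" (the pattern above uses [\w\.]+ in this position, which captures the same text for the sample lines). .+$ … matches to the end of the line.
What you want will be in capture group 1 ( \1 ).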
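
Since the question is about a Java module, here is a rough sketch of how that pattern could be applied to a whole log line with java.util.regex. The class name CaptureGroupDemo and the helper programName are hypothetical; the pattern is the one posted above, with backslashes doubled for the Java string literal.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class CaptureGroupDemo {
    // The pattern from the answer, escaped for a Java string literal.
    private static final Pattern PROGRAM =
            Pattern.compile("(?:.+?\\s){2}([\\w\\.]+).+$");

    // Hypothetical helper: returns capture group 1, or null if the line does not match.
    static String programName(String line) {
        Matcher m = PROGRAM.matcher(line);
        return m.matches() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String[] lines = {
            "2013-10-14T22:05:29+00:00 hostname sshd[6359]: Connection closed by 192.168.1.10",
            "2013-10-14T22:05:29+00:00 hostname sshd Connection closed by 192.168.1.10",
            "2013-10-14T22:05:29+00:00 hostname SSH.D[6359]: Connection closed by 192.168.1.10"
        };
        for (String line : lines) {
            // Prints sshd, sshd, SSH.D
            System.out.println(programName(line));
        }
    }
}
```

One thing to keep in mind: \w does not include the hyphen, so with this pattern a program name such as my-app would be cut off at the hyphen.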

Answer #4, by gupuwyp2

Try the following:

String str = line.split(" ")[2].replaceAll("(.+)(\\[|\\:|\\/).+", "$1");

I haven't tested it.
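
For anyone who wants to check the one-liner against the sample lines from the question, a small harness might look like the following. The class name ReplaceAllDemo and the helper programName are invented for this sketch; the split and replaceAll expression is copied from the answer unchanged.

```java
public class ReplaceAllDemo {
    // Hypothetical helper wrapping the answer's expression:
    // take the third space-separated field and, if it contains '[', ':' or '/'
    // followed by at least one more character, keep only group 1.
    static String programName(String line) {
        return line.split(" ")[2].replaceAll("(.+)(\\[|\\:|\\/).+", "$1");
    }

    public static void main(String[] args) {
        String[] lines = {
            "2013-10-14T22:05:29+00:00 hostname sshd[6359]: Connection closed by 192.168.1.10",
            "2013-10-14T22:05:29+00:00 hostname sshd:3322 Connection closed by 192.168.1.10",
            "2013-10-14T22:05:29+00:00 hostname sshd/6359 Connection closed by 192.168.1.10",
            "2013-10-14T22:05:29+00:00 hostname sshd Connection closed by 192.168.1.10",
            "2013-10-14T22:05:29+00:00 hostname SSHD[1133] Connection closed by 192.168.1.10",
            "2013-10-14T22:05:29+00:00 hostname SSH.D[6359]: Connection closed by 192.168.1.10"
        };
        for (String line : lines) {
            // Prints sshd four times, then SSHD, then SSH.D
            System.out.println(programName(line));
        }
    }
}
```

When the field contains no delimiter (the plain sshd case), the regex does not match and replaceAll returns the field unchanged. Because (.+) is greedy, a field with several delimiters each followed by more text, such as a:b:c, would be cut at the last of them rather than the first.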
