Java regex to extract an ID string based on a recurring substring of each ID

Posted by y0u0uwnf on 2021-07-09 in Java


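I am reading a log file and extracting certain data from it. I am able to extract the time from each line of the log file.
Now I want to extract the ID "ieatrcxb4498-1". All of the IDs start with the substring ieatrcxb, and I have tried to query on that substring and return the full ID string.
I have tried many different suggestions from other posts, but without success. These are the patterns I tried: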

(?i)\\b("ieatrcxb"(?:.+?)?)\\b
(?i)\\b\\w*"ieatrcxb"\\w*\\b"
^.*ieatrcxb.*$

I have also tried to extract the full ID based on the string starting with i and ending in 1, as all of the IDs do.
Line of the log file

150: 2017-06-14 18:02:21 INFO  monitorinfo           :     Info: Lock VCS on node "ieatrcxb4498-1"

Code

Scanner s = new Scanner(new FileReader(new File("lock-unlock.txt")));
    //Record currentRecord = null;
    ArrayList<Record> list = new ArrayList<>();

    while (s.hasNextLine()) {
        String line = s.nextLine();

        Record newRec = new Record();
        // newRec.time =
        newRec.time = regexChecker("([0-1]?\\d|2[0-3]):([0-5]?\\d):([0-5]?\\d)", line);

        newRec.ID = regexChecker("^.*ieatrcxb.*$", line);

        list.add(newRec);

    }

public static String regexChecker(String regEx, String str2Check) {

    Pattern checkRegex = Pattern.compile(regEx);
    Matcher regexMatcher = checkRegex.matcher(str2Check);
    String regMat = "";
    while (regexMatcher.find()) {
        // Keep the last non-empty match found in the line.
        if (regexMatcher.group().length() != 0) {
            regMat = regexMatcher.group();
        }
        //System.out.println("Inside the "+ regexMatcher.group().trim());
    }

     return regMat;
}

I need a simple pattern that will do this for me.

Java regex pattern-matching Matcher

Source: https://stackoverflow.com/questions/45106923/java-regex-to-extract-an-id-string-based-on-recurring-sub-string-of-each-id

3 Answers


drnojrws1#

Does the ID always have the format "ieatrcxb followed by 4 digits, followed by -, followed by 1 digit"?
If so, you can do:

regexChecker("ieatrcxb\\d{4}-\\d", line);

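Note the {4} quantifier, which matches exactly 4 digits ( \\d ). If the last digit is always 1, you could also use "ieatrcxb\\d{4}-1".
If the number of digits varies, you can use "ieatrcxb\\d+-\\d+", where + means "1 or more".
You can also use the {} quantifier with a minimum and a maximum number of occurrences. Example: "ieatrcxb\\d{4,6}-\\d", where {4,6} means "at least 4 and at most 6 occurrences" (this is just an example; I don't know whether it applies to your case). This is useful if you know exactly how many digits the ID can have.
All of the above work for your case and return ieatrcxb4498-1. Which one to use depends on how your input varies.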
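To compare the quantifier variants side by side, here is a minimal sketch that runs each pattern against the sample log line from the question. The class name IdPatternDemo and the helper firstMatch are made up for this illustration; firstMatch is a simplified stand-in for the question's regexChecker that returns the first match rather than the last.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IdPatternDemo {
    // Sample line copied from the question.
    static final String SAMPLE_LINE =
        "150: 2017-06-14 18:02:21 INFO  monitorinfo           :     Info: Lock VCS on node \"ieatrcxb4498-1\"";

    // Hypothetical helper: returns the first match of regex in input, or "" if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        String[] patterns = {
            "ieatrcxb\\d{4}-\\d",    // exactly 4 digits, a dash, 1 digit
            "ieatrcxb\\d{4}-1",      // exactly 4 digits, then a literal -1
            "ieatrcxb\\d+-\\d+",     // 1 or more digits on each side of the dash
            "ieatrcxb\\d{4,6}-\\d"   // between 4 and 6 digits, a dash, 1 digit
        };
        for (String p : patterns) {
            // For the sample line every pattern matches the same text: ieatrcxb4498-1
            System.out.println(p + " -> " + firstMatch(p, SAMPLE_LINE));
        }
    }
}
```

Because none of these patterns uses ^, $ or .*, only the ID is returned instead of the whole line.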
If you want only the numbers without the ieatrcxb part ( 4498-1 ), you can use a lookbehind:

regexChecker("(?<=ieatrcxb)\\d{4,6}-\\d", line);

This makes ieatrcxb not part of the match, so it returns just 4498-1.
If you also don't want the -1 and need just 4498, you can combine this with a lookahead:

regexChecker("(?<=ieatrcxb)\\d{4,6}(?=-\\d)", line);

This returns just 4498.
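As a rough illustration, the sketch below applies the lookbehind/lookahead pattern to the sample line and, for comparison, shows a capturing group, which is a different technique from the lookarounds used in this answer but gives the same digits. The class and method names (IdDigitsDemo, digitsViaLookaround, digitsViaGroup) are invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class IdDigitsDemo {
    // Sample line copied from the question.
    static final String SAMPLE_LINE =
        "150: 2017-06-14 18:02:21 INFO  monitorinfo           :     Info: Lock VCS on node \"ieatrcxb4498-1\"";

    // Lookbehind + lookahead: the prefix and the "-1" are checked but not consumed,
    // so the match itself is only the digits.
    static String digitsViaLookaround(String input) {
        Matcher m = Pattern.compile("(?<=ieatrcxb)\\d{4,6}(?=-\\d)").matcher(input);
        return m.find() ? m.group() : "";
    }

    // Alternative (not from the answer): match the whole ID and read the digits from group 1.
    static String digitsViaGroup(String input) {
        Matcher m = Pattern.compile("ieatrcxb(\\d{4,6})-\\d").matcher(input);
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        System.out.println(digitsViaLookaround(SAMPLE_LINE)); // 4498
        System.out.println(digitsViaGroup(SAMPLE_LINE));      // 4498
    }
}
```

The capturing-group form would require regexChecker to return group(1) instead of group(), which is why the lookaround version fits the question's helper without changes.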

Posted 2021-07-09

41zrol4v2#

You are trying to do this the hard way. If your lock-unlock.txt file looks like the snippet you posted, you can do the following:

File logFile = new File("lock-unlock.txt");

List<String> lines = Files.readAllLines(logFile.toPath());

List<Integer> ids = lines.stream()
                .filter(line -> line.contains("ieatrcxb"))
                .map(line -> line.split( "\"")[1]) //"ieatrcxb4498-1"
                .map(line -> line.replaceAll("\\D+","")) //"44981"
                .map(Integer::parseInt) // 44981
                .collect( Collectors.toList() );

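If you want more than just the ID numbers, simply remove or comment out the second and third .map() calls. The result will then be a list of Strings instead of Integers, so the declared type must change to List<String>.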
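For reference, a sketch of the String variant might look like the following. It works on an in-memory list instead of reading lock-unlock.txt so that it can be run on its own; the class name QuotedIdDemo, the method extractIds, and the second log line are made up for the example. It assumes, as the answer does, that the ID is the first quoted text on every line that contains ieatrcxb; a matching line without quotes would throw an ArrayIndexOutOfBoundsException.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class QuotedIdDemo {
    // Hypothetical helper: returns the first quoted text of every line that mentions the prefix.
    static List<String> extractIds(List<String> lines) {
        return lines.stream()
                .filter(line -> line.contains("ieatrcxb"))
                .map(line -> line.split("\"")[1]) // text between the first pair of quotes
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> lines = Arrays.asList(
            // Line copied from the question.
            "150: 2017-06-14 18:02:21 INFO  monitorinfo           :     Info: Lock VCS on node \"ieatrcxb4498-1\"",
            // Made-up line without an ID, to show that the filter skips it.
            "151: 2017-06-14 18:02:22 INFO  monitorinfo           :     Info: nothing to lock"
        );
        System.out.println(extractIds(lines)); // [ieatrcxb4498-1]
    }
}
```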

Posted 2021-07-09

ubbxdtey3#

public static void main(String[] args) {
    String line = "150: 2017-06-14 18:02:21 INFO  monitorinfo           :     Info: Lock VCS on node \"ieatrcxb4498-1\"";
    String regex ="ieatrcxb.*1";
    Pattern p = Pattern.compile(regex);
    Matcher m = p.matcher(line);
    while(m.find()){
        System.out.println(m.group());
    }
}

Or, if the IDs are always quoted:

String id = line.substring(line.indexOf("\""), line.lastIndexOf("\"")+1);
 System.out.println(id);
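One caveat worth checking against your real data: .* is greedy, so "ieatrcxb.*1" extends to the last 1 on the line, and the substring approach keeps the surrounding quotes. The sketch below uses a made-up log line with trailing text (the question only shows lines that end with the ID) to show the difference, and stops at the closing quote with [^"]* instead. The names GreedyMatchDemo, firstMatch and HYPOTHETICAL_LINE are invented for the illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyMatchDemo {
    // Made-up line: unlike the question's sample, more text follows the quoted ID.
    static final String HYPOTHETICAL_LINE =
        "Info: Lock VCS on node \"ieatrcxb4498-1\" for 10 seconds";

    // Hypothetical helper: returns the first match of regex in input, or "" if there is none.
    static String firstMatch(String regex, String input) {
        Matcher m = Pattern.compile(regex).matcher(input);
        return m.find() ? m.group() : "";
    }

    public static void main(String[] args) {
        // Greedy .* runs on to the last '1' in the line.
        System.out.println(firstMatch("ieatrcxb.*1", HYPOTHETICAL_LINE));    // ieatrcxb4498-1" for 1
        // [^"]* stops right before the closing quote.
        System.out.println(firstMatch("ieatrcxb[^\"]*", HYPOTHETICAL_LINE)); // ieatrcxb4498-1
    }
}
```

On the exact line from the question both patterns return ieatrcxb4498-1, so this only matters if other lines in the file carry text after the ID.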

Posted 2021-07-09
