Logstash: if a log line contains a specific word, skip the pattern and move on to the next one

m528fe3b  posted on 2022-12-09  in Logstash

I have a log file produced by a Spring application. It contains three formats. The first two are single-line entries: if the line contains the keyword app-info, the message was printed by our own developers; otherwise it was printed by the Spring framework. We may want to treat the developers' messages differently from the Spring framework ones. The third format is a multiline stack trace.
Here is an example of our own format:

2018-04-27 10:42:49 [http-nio-8088-exec-1] - INFO  - app-info - injectip ip 192.168.16.89

The line above contains the keyword app-info, so it comes from our own developers.

2018-04-27 10:42:23 [RMI TCP Connection(10)-127.0.0.1] - INFO  - org.apache.catalina.core.ContainerBase.[Tomcat].[localhost].[/] - Initializing Spring FrameworkServlet 'dispatcherServlet'

The line above does not contain the keyword app-info, so it was printed by the Spring framework.
In my grok filter, the first pattern is for messages printed by the Spring framework, the second is for the developers' messages, and the third is for multiline stack traces. I want the first regex to state explicitly that the Spring framework pattern must not contain the keyword app-info, so that it fails to parse and grok falls through to the second pattern, which is the developers' own format. I tried the following regex in a regex tool, but I got a compile error:

(?<timestamp>[\d\-\s\:]+)\s\[(?<threadname>[\d\.\w\s\(\)\-]+)\]\s-\s(?<loglevel>[\w]+)\s+-\s+(?<systemmsg>[^((?app-info).)*\s\.\w\-\'\:\d\[\]\/]+)

In the grok filter, I follow the multi-pattern syntax from this link:

filter {
   grok {
     match => [ "message", "PATTERN1", "PATTERN2" , "PATTERN3" ]
    }
}

My current Logstash configuration is shown below; its first pattern does not clearly exclude app-info:

filter {
  grok {
    match => [
      "message",
        '(?<timestamp>[\d\-\s\:]+)\s\[(?<threadname>[\d\.\w\s\(\)\-]+)\]\s-\s(?<loglevel>[\w]+)\s+-\s+(?<systemmsg>[\s\.\w\-\'\:\d\[\]\/^[app-info]]+)',
        '(?<timestamp>[\d\-\s\:]+)\s\[(?<threadname>[\d\.\w\s\(\)\-]+)\]\s-\s(?<loglevel>[\w]+)\s+-\s(?<appinfo>app-info)\s-\s(?<systemmsg>[\w\d\:\{\}\,\-\(\)\s\"]+)',
        '(?<timestamp>[\d\-\s\:]+)\s\[(?<threadname>[\w\-\d]+)\]\s-\s(?<loglevel>[\w]+)\s\-\s(?<appinfo>app-info)\s-\s(?<params>params):(?<jsonstr>[\"\w\d\,\:\.\{\}]+)\s(?<exceptionname>[\w\d\.]+Exception):\s(?<exceptiondetail>[\w\d\.]+)\n\t(?<extralines>at[\s\w\.\d\~\?\n\t\(\)\_\[\]\/\:\-]+)\n\d'
      
    ]      
  }

}

With the patterns in the Logstash configuration above, when handling

2018-04-27 10:42:49 [http-nio-8088-exec-1] - INFO  - app-info - injectip ip 192.168.16.89

the first pattern (the Spring framework pattern) already matches, so the line never reaches the second pattern, which is our own developers' format. The line is parsed successfully as follows:

{
  "timestamp": [
    [
      "2018-04-27 10:42:49"
    ]
  ],
  "threadname": [
    [
      "http-nio-8088-exec-1"
    ]
  ],
  "loglevel": [
    [
      "INFO"
    ]
  ],
  "systemmsg": [
    [
      "app-info - injectip ip 192.168.16.89\n\n"
    ]
  ]
}

Any hints on how I can make the first pattern state explicitly that systemmsg must not contain the keyword "app-info"?

EDIT:

My goal is that if the keyword app-info is absent, pattern 1 handles the log line; if the keyword app-info is present, pattern 2 handles it.
With the following log line, which does not contain the keyword app-info (so pattern 1 should match),

2018-04-27 10:42:23 [RMI TCP Connection(10)-127.0.0.1] - INFO  - org.apache.catalina.core.ContainerBase.[Tomcat].[localhost].[/] - Initializing Spring FrameworkServlet 'dispatcherServlet'

I got no match with the first pattern modified according to your suggestion, which is not what I want:

(?<timestamp>[\d\-\s\:]+)\s\[(?<threadname>[\d\.\w\s\(\)\-]+)\]\s-\s(?<loglevel>[\w]+)\s+-\s+(?<systemmsg>[^(?:(?!app\-info).)*\s\.\w\-\'\:\d\[\]\/]+)

See demo. My goal is to extract the timestamp, thread name, log level and system message, but the first pattern does not give the expected result: the tool says there is no match.
If I remove ^(?:(?!app-info).)*, the pattern matches the log line above (without the keyword app-info). See demo. But it now also matches log lines that contain the keyword app-info, which is not expected, because for those I want to extract timestamp, threadname, loglevel, app-info (as its own field or group), and then systemmsg. The expectation is that the first pattern fails so that the second pattern handles the line. The demo shows that the pattern also matches a log line with the keyword app-info, and systemmsg then includes app-info in its value, which is not expected.
So I want pattern 1 to handle log lines without the keyword app-info and pattern 2 to handle log lines with it. In other words, pattern 1 should fail to parse whenever the line contains the keyword app-info.

nbysray5  1#

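My goal is to have pattern 1 handle log lines without the keyword app-info. If app-info is present, the first pattern should fail to parse so that the second pattern can handle the log line.
You can use the following as the first pattern: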

(?<data>^(?!.*app-info).*)%{LOGLEVEL:log}%{DATA:other_data}%{IP:ip}$

If app-info appears anywhere in the log line, this pattern does not match, and grok moves on to the 2nd PATTERN.

Example

Log line without app-info:

2018-04-27 10:42:49 [http-nio-8088-exec-1] - INFO  injectip ip 192.168.16.89

You can adjust the filter to your own requirements.

Output:
{
  "data": [
    [
      "2018-04-27 10:42:49 [http-nio-8088-exec-1] - "
    ]
  ],
  "log": [
    [
      "INFO"
    ]
  ],
  "other_data": [
    [
      "  injectip ip "
    ]
  ],
  "ip": [
    [
      "192.168.16.89"
    ]
  ]
}

Now with a log line that contains app-info:

2018-04-27 10:42:49 [http-nio-8088-exec-1] - INFO app-info  injectip ip 192.168.16.89
Output:
No Matches
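
The same negative lookahead can also be applied to the single-line patterns from the question instead of the generic data capture. The following is a minimal sketch, not a tested configuration. It assumes the line layout shown in the question, and it simplifies the threadname and systemmsg character classes to [^\]]+ and .+ (a simplification made here, not the asker's originals). The lookahead (?!app-info) sits directly before systemmsg, because lookahead syntax has no special meaning inside a character class such as [^...], which is why the attempts in the question do not behave as intended. The multiline stack trace pattern is left out.

```logstash
filter {
  grok {
    # Patterns are tried in order; grok stops at the first one that matches
    # (break_on_match defaults to true).
    # Pattern 1: Spring framework lines. The negative lookahead makes the
    # pattern fail when the text after the log level starts with app-info.
    # Pattern 2: developer lines, which carry the app-info marker as a field.
    match => {
      "message" => [
        '^(?<timestamp>[\d\-\s:]+)\s\[(?<threadname>[^\]]+)\]\s-\s(?<loglevel>\w+)\s+-\s+(?!app-info)(?<systemmsg>.+)',
        '^(?<timestamp>[\d\-\s:]+)\s\[(?<threadname>[^\]]+)\]\s-\s(?<loglevel>\w+)\s+-\s+(?<appinfo>app-info)\s-\s(?<systemmsg>.+)'
      ]
    }
  }
}
```

With this ordering, the developer line from the question should fail pattern 1 at the lookahead and be picked up by pattern 2, while the Tomcat line should match pattern 1 directly.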

Edit 2:

If you set PATTERN1 to (?<data>^(?!.*app-info).*)
you will get:

{
  "data": [
    [
      "2018-04-27 10:42:49 [http-nio-8088-exec-1] - INFO  injectip ip 192.168.16.89"
    ]
  ]
}

You can then add a second grok filter for the data field:

grok {
  match => {"data" => "DEFINE PATTERN HERE"}
}
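
A different way to get the same routing, without any lookahead, is to branch on the keyword with a Logstash conditional and run a separate grok in each branch. This is an alternative technique rather than the approach described in this answer, and the sketch below is untested. It assumes that app-info only ever appears in developer lines, and it reuses simplified versions of the question's patterns.

```logstash
filter {
  # Route on the keyword first, then parse with the matching pattern.
  if [message] =~ /app-info/ {
    grok {
      # Developer lines: capture the app-info marker as its own field.
      match => { "message" => '^(?<timestamp>[\d\-\s:]+)\s\[(?<threadname>[^\]]+)\]\s-\s(?<loglevel>\w+)\s+-\s+(?<appinfo>app-info)\s-\s(?<systemmsg>.+)' }
    }
  } else {
    grok {
      # Spring framework lines: everything after the log level is the message.
      match => { "message" => '^(?<timestamp>[\d\-\s:]+)\s\[(?<threadname>[^\]]+)\]\s-\s(?<loglevel>\w+)\s+-\s+(?<systemmsg>.+)' }
    }
  }
}
```

The trade-off is that the routing rule lives in the conditional rather than in the regex, which keeps each pattern simpler but requires the stack trace format to be handled in whichever branch it falls into.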

3pvhb19x  2#

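I used GREEDYDATA. Suppose you have the following log line:
Redirect controller: click data redirect successful: {a:123,b:345}
and you want to capture everything up to "data"; then use GREEDYDATA as follows:
%{GREEDYDATA}data:%{SPACE}%{rest of your pattern}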
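
As a rough sketch of that idea inside a complete filter: the example below assumes the log line contains the literal text data: followed by the payload, and the field names prefix and payload are hypothetical, chosen here for illustration only.

```logstash
filter {
  grok {
    # %{GREEDYDATA:prefix} is greedy, so it consumes everything up to the
    # last "data:" in the line; %{SPACE} skips optional whitespace, and
    # %{GREEDYDATA:payload} captures the rest of the line.
    match => { "message" => "%{GREEDYDATA:prefix}data:%{SPACE}%{GREEDYDATA:payload}" }
  }
}
```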
