Logstash fails to parse the nginx user_agent field

ff29svar · posted 2022-12-16 in Logstash

I have nginx logs in the following format:

192.168.0.1 - - [18/Jul/2022:11:20:28 +0000] "GET / HTTP/1.1" 200 15 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" "-"
192.168.128.1 - - [18/Jul/2022:13:22:15 +0000] "GET / HTTP/1.1" 200 615 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" "-"

I use the following pipeline to parse them and store them in Elasticsearch:

input {
    beats {
        port => 5044
    }
}

filter {
    grok {
        match => [ "message" , "%{COMBINEDAPACHELOG}+%{GREEDYDATA:extra_fields}"]
    }
    mutate {
        convert => ["response", "integer"]
        convert => ["bytes", "integer"]
        convert => ["responsetime", "float"]
    }
    geoip {
        source => "clientip"
        target => "geoip"
        add_tag => [ "nginx-geoip" ]
    }
    date {
        match => [ "timestamp" , "dd/MMM/YYYY:HH:mm:ss Z" ]
    }
    useragent {
        source => "agent"
    }
}

output {
    elasticsearch {
        hosts => ["http://elasticsearch:9200"]
        index => "weblogs-%{+YYYY.MM.dd}"
        document_type => "nginx_logs"
        user => "elastic"
        password => "changeme"
    }
    stdout { codec => rubydebug }
}

However, the useragent part doesn't seem to work, since I can't see it anywhere in the output:

{
    "httpversion" => "1.1",
       "clientip" => "192.168.0.1",
          "ident" => "-",
      "timestamp" => "18/Jul/2022:11:20:28 +0000",
           "verb" => "GET",
     "@timestamp" => 2022-07-18T11:20:28.000Z,
       "@version" => "1",
           "tags" => [
        [0] "beats_input_codec_plain_applied",
        [1] "_geoip_lookup_failure"
    ],
           "host" => {
        "name" => "9a852bd136fd"
    },
           "auth" => "-",
          "bytes" => 15,
       "referrer" => "\"-\"",
          "geoip" => {},
        "message" => "192.168.0.1 - - [18/Jul/2022:11:20:28 +0000] \"GET / HTTP/1.1\" 200 15 \"-\" \"Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36\" \"-\"",
       "response" => 200,
          "agent" => {
             "version" => "7.3.2",
        "ephemeral_id" => "0c38336d-1e30-4aaa-9ba8-20bd7bd8fb48",
                "type" => "filebeat",
            "hostname" => "9a852bd136fd",
                  "id" => "8991142a-95df-4aed-a190-bda4649c04cd"
    },
          "input" => {
        "type" => "log"
    },
        "request" => "/",
   "extra_fields" => " \"-\"",
            "log" => {
          "file" => {
            "path" => "/var/log/nginx/access.log"
        },
        "offset" => 11021
    },
            "ecs" => {
        "version" => "1.0.1"
    }
}

What I need is a field containing the entire http_user_agent string. Any idea what is causing this?

xqnpmsa8

I'm not very familiar with Logstash, but I think you may need to supply your own regex pattern to parse the log.
For example, when I retrieve logs with rquery, I provide a parse pattern and a filter condition like the ones below.

[ rquery]$ cat nginx.log
192.168.0.1 - - [18/Jul/2022:11:20:28 +0000] "GET / HTTP/1.1" 200 15 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" "-"
192.168.128.1 - - [18/Jul/2022:13:22:15 +0000] "GET / HTTP/1.1" 200 615 "-" "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36" "-"

[ rquery]$ ./rq -q "parse /(?P<host>\S+) (\S+) (?P<user>\S+) \[(?P<time>[^\n]+)\] \\\"(?P<request>[^\n]*)\\\" (?P<status>[0-9]+) (?P<size>\S+) \\\"(?P<referrer>[^\n]*)\\\" \\\"(?P<agent>[^\n]*)\\\" \\\"(?P<others>[^\n]*)\\\"/| select agent | filter agent like '*Chrome*'" nginx.log
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36
Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/103.0.0.0 Safari/537.36
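
If you want to stay inside Logstash, here is a rough sketch of the same idea (untested, and the field name http_user_agent is just an illustrative choice): capture the user-agent string into its own field with an explicit grok pattern, then point the useragent filter at that field. Note that in your output the top-level agent field already holds Filebeat's own metadata, so the user-agent string needs a name that does not collide with it.

filter {
    grok {
        # Hypothetical pattern: capture the user-agent string into its own field
        # ("http_user_agent" is an illustrative name) so it does not clash with
        # the "agent" object that Filebeat adds to every event.
        match => {
            "message" => '%{IPORHOST:clientip} %{USER:ident} %{USER:auth} \[%{HTTPDATE:timestamp}\] "%{WORD:verb} %{URIPATHPARAM:request} HTTP/%{NUMBER:httpversion}" %{NUMBER:response} %{NUMBER:bytes} "%{DATA:referrer}" "%{DATA:http_user_agent}" "%{DATA:extra_fields}"'
        }
    }
    useragent {
        # Parse the raw user-agent string captured above, not Filebeat's "agent" metadata object.
        source => "http_user_agent"
        target => "user_agent"
    }
}

With something along those lines the useragent filter has an actual UA string to work on and should emit the parsed browser/OS details under user_agent; you may still need to adjust the pattern to match your exact log format.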
