How do I parse an Apache log timestamp with a regex in Hive?

My log file contains records like this:
107.344.154.200 - - [23/Aug/2005:00:03:14 -0400] "GET /images/theimage.gif HTTP/1.0" 200 11401
I parse the log file with this statement:
CREATE TABLE log_file (
host STRING,
identity STRING,
user STRING,
time STRING,
request STRING,
status STRING,
size STRING)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
"input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)",
"output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s")
STORED AS TEXTFILE;
What regex syntax can I use to parse the time field, so that [23/Aug/2005:00:03:14 -0400] is split into day, month, year, hour, minute, and second?

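Description

This regular expression will do the following:
Parse the log entry and find the date and time
Capture the individual date parts: day, month, year, hour, minute, second, and UTC offset
Regular expression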

\[(\d{2})/([a-zA-Z]{3})/(\d{4}):(\d{2}):(\d{2}):(\d{2})\s(-\d{4})]

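Note that, depending on the language, you may have to escape each / by writing it as \/. The escaping rules differ from one language to another. In a Hive string literal, for instance, the backslash is itself an escape character, so \d has to be written as \\d.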
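To try the pattern outside Hive, here is a minimal sketch in Python (chosen only for illustration; Hive itself uses Java regular expressions). The names `TIMESTAMP_RE` and `line` are made up for this example. In a Python raw string the / needs no escaping and the backslashes are written once.

```python
import re

# Timestamp pattern from the answer. A raw string keeps the backslashes
# intact, and "/" needs no escaping in Python.
TIMESTAMP_RE = re.compile(
    r"\[(\d{2})/([a-zA-Z]{3})/(\d{4}):(\d{2}):(\d{2}):(\d{2})\s(-\d{4})]"
)

# Sample log line from the question.
line = ('107.344.154.200 - - [23/Aug/2005:00:03:14 -0400] '
        '"GET /images/theimage.gif HTTP/1.0" 200 11401')

# search() scans the whole line for the first bracketed timestamp.
match = TIMESTAMP_RE.search(line)
if match:
    day, month, year, hour, minute, second, offset = match.groups()
    print(day, month, year, hour, minute, second, offset)
    # prints: 23 Aug 2005 00 03 14 -0400
```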

Explanation

NODE                     EXPLANATION
----------------------------------------------------------------------
  \[                       '['
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    \d{2}                    digits (0-9) (2 times)
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  /                        '/'
----------------------------------------------------------------------
  (                        group and capture to \2:
----------------------------------------------------------------------
    [a-zA-Z]{3}              any character of: 'a' to 'z', 'A' to 'Z'
                             (3 times)
----------------------------------------------------------------------
  )                        end of \2
----------------------------------------------------------------------
  /                        '/'
----------------------------------------------------------------------
  (                        group and capture to \3:
----------------------------------------------------------------------
    \d{4}                    digits (0-9) (4 times)
----------------------------------------------------------------------
  )                        end of \3
----------------------------------------------------------------------
  :                        ':'
----------------------------------------------------------------------
  (                        group and capture to \4:
----------------------------------------------------------------------
    \d{2}                    digits (0-9) (2 times)
----------------------------------------------------------------------
  )                        end of \4
----------------------------------------------------------------------
  :                        ':'
----------------------------------------------------------------------
  (                        group and capture to \5:
----------------------------------------------------------------------
    \d{2}                    digits (0-9) (2 times)
----------------------------------------------------------------------
  )                        end of \5
----------------------------------------------------------------------
  :                        ':'
----------------------------------------------------------------------
  (                        group and capture to \6:
----------------------------------------------------------------------
    \d{2}                    digits (0-9) (2 times)
----------------------------------------------------------------------
  )                        end of \6
----------------------------------------------------------------------
  \s                       whitespace (\n, \r, \t, \f, and " ")
----------------------------------------------------------------------
  (                        group and capture to \7:
----------------------------------------------------------------------
    -                        '-'
----------------------------------------------------------------------
    \d{4}                    digits (0-9) (4 times)
----------------------------------------------------------------------
  )                        end of \7
----------------------------------------------------------------------
  ]                        ']'
----------------------------------------------------------------------
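The node-by-node breakdown above can also be kept next to the pattern itself. The following is a sketch in Python using `re.VERBOSE`, which ignores whitespace in the pattern and allows comments; `VERBOSE_TIMESTAMP_RE` is a hypothetical name, and this is an illustration rather than something Hive accepts as-is.

```python
import re

# Same pattern as in the answer, written with re.VERBOSE so that every
# node carries its own comment. Whitespace in the pattern is ignored.
VERBOSE_TIMESTAMP_RE = re.compile(r"""
    \[              # literal opening bracket
    (\d{2})         # group 1: day, two digits
    /
    ([a-zA-Z]{3})   # group 2: month abbreviation, three letters
    /
    (\d{4})         # group 3: year, four digits
    :
    (\d{2})         # group 4: hour
    :
    (\d{2})         # group 5: minute
    :
    (\d{2})         # group 6: second
    \s              # one whitespace character
    (-\d{4})        # group 7: negative UTC offset such as -0400
    ]               # literal closing bracket
""", re.VERBOSE)

m = VERBOSE_TIMESTAMP_RE.search("[23/Aug/2005:00:03:14 -0400]")
print(m.group(2))  # prints: Aug
```

As written, group 7 only accepts offsets that start with a minus sign, so a timestamp ending in +0100 would not match. If positive offsets occur in your logs, replacing the leading - with [+-] is one possible adjustment.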

Sample text

107.344.154.200 - - [23/Aug/2005:00:03:14 -0400] "GET /images/theimage.gif HTTP/1.0" 200 11401

Live demo
https://regex101.com/r/hf4fp8/1
Sample matches

[0][0] = [23/Aug/2005:00:03:14 -0400]
[0][1] = 23
[0][2] = Aug
[0][3] = 2005
[0][4] = 00
[0][5] = 03
[0][6] = 14
[0][7] = -0400
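
Once the parts are captured, they usually need to become a real timestamp. As a rough sketch, assuming Python and an English locale for the month abbreviation, the bracket contents from match [0][0] can be handed to `datetime.strptime`; the variable names `raw` and `parsed` are made up for this example.

```python
from datetime import datetime

# Text of match [0][0] without the surrounding "[" and "]".
raw = "23/Aug/2005:00:03:14 -0400"

# %d day, %b abbreviated month name (locale-dependent, English assumed),
# %Y four-digit year, %H:%M:%S time of day, %z UTC offset such as -0400.
parsed = datetime.strptime(raw, "%d/%b/%Y:%H:%M:%S %z")

print(parsed.isoformat())
# prints: 2005-08-23T00:03:14-04:00
```

The result is a timezone-aware value, so the -0400 offset from group 7 is preserved rather than discarded.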
