How do I parse Apache log timestamps with a regex in Hive?

7y4bm7vi  posted on 2021-06-26  in  Hive

The records in my log file look like this:

    107.344.154.200 - - [23/Aug/2005:00:03:14 -0400] "GET /images/theimage.gif HTTP/1.0" 200 11401

I use the following statement to parse the log file:

    CREATE TABLE log_file (
      host STRING,
      identity STRING,
      user STRING,
      time STRING,
      request STRING,
      status STRING,
      size STRING)
    ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
    WITH SERDEPROPERTIES (
      "input.regex" = "([^ ]*) ([^ ]*) ([^ ]*) (-|\\[[^\\]]*\\]) ([^ \"]*|\"[^\"]*\") (-|[0-9]*) (-|[0-9]*)",
      "output.format.string" = "%1$s %2$s %3$s %4$s %5$s %6$s %7$s")
    STORED AS TEXTFILE;

What regex syntax can I use to parse the time so that it is split into day, month, year, hour, minute, and second, as in [23/Aug/2005:00:03:14 -0400]?

xe55xuns  #1

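Description

This regular expression will do the following:
Parse the log entry and find the date and time.
Capture the individual date parts: day, month, year, hour, minute, second, and UTC offset.

Regular expression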

    \[(\d{2})/([a-zA-Z]{3})/(\d{4}):(\d{2}):(\d{2}):(\d{2})\s(-\d{4})]

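Note: depending on the language, you may have to escape each / by replacing it with \/ . Whether this is needed varies from one language to another.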
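
To make the pattern concrete, here is a minimal sketch that applies it to the sample log line in Python. Python is only an illustration here: Hive itself evaluates patterns with Java's regex engine, and inside a Hive string literal each backslash would need to be doubled. The function name `split_timestamp` and the field labels are hypothetical, chosen for this example.

```python
import re

# Pattern from the answer. A raw string keeps the backslashes intact,
# and Python does not require '/' to be escaped.
TIMESTAMP_RE = re.compile(
    r"\[(\d{2})/([a-zA-Z]{3})/(\d{4}):(\d{2}):(\d{2}):(\d{2})\s(-\d{4})]"
)

# Illustrative labels for the seven capture groups (not part of the answer).
FIELDS = ("day", "month", "year", "hour", "minute", "second", "offset")


def split_timestamp(line):
    """Return a dict of timestamp parts, or None if the line has no match."""
    match = TIMESTAMP_RE.search(line)
    if match is None:
        return None
    return dict(zip(FIELDS, match.groups()))


sample = ('107.344.154.200 - - [23/Aug/2005:00:03:14 -0400] '
          '"GET /images/theimage.gif HTTP/1.0" 200 11401')
parts = split_timestamp(sample)
print(parts)
```

One thing to watch: the last group, `(-\d{4})`, only matches negative UTC offsets. Logs written in a zone with a positive offset such as `+0100` would need that group changed to `([+-]\d{4})`.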

Explanation

    NODE                     EXPLANATION
    ----------------------------------------------------------------------
    \[                       '['
    (                        group and capture to \1:
      \d{2}                    digits (0-9) (2 times)
    )                        end of \1
    /                        '/'
    (                        group and capture to \2:
      [a-zA-Z]{3}              any character of: 'a' to 'z', 'A' to 'Z' (3 times)
    )                        end of \2
    /                        '/'
    (                        group and capture to \3:
      \d{4}                    digits (0-9) (4 times)
    )                        end of \3
    :                        ':'
    (                        group and capture to \4:
      \d{2}                    digits (0-9) (2 times)
    )                        end of \4
    :                        ':'
    (                        group and capture to \5:
      \d{2}                    digits (0-9) (2 times)
    )                        end of \5
    :                        ':'
    (                        group and capture to \6:
      \d{2}                    digits (0-9) (2 times)
    )                        end of \6
    \s                       whitespace (\n, \r, \t, \f, and " ")
    (                        group and capture to \7:
      -                        '-'
      \d{4}                    digits (0-9) (4 times)
    )                        end of \7
    ]                        ']'

Sample text

    107.344.154.200 - - [23/Aug/2005:00:03:14 -0400] "GET /images/theimage.gif HTTP/1.0" 200 11401

Live demo
https://regex101.com/r/hf4fp8/1

Sample matches

    [0][0] = [23/Aug/2005:00:03:14 -0400]
    [0][1] = 23
    [0][2] = Aug
    [0][3] = 2005
    [0][4] = 00
    [0][5] = 03
    [0][6] = 14
    [0][7] = -0400
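
The captured parts are plain strings. If the end goal is a real timestamp rather than separate fields, the full match can also be turned into a timezone-aware value directly. The sketch below does this in Python with `datetime.strptime`; it assumes English month abbreviations (the `%b` directive is locale-dependent), and the helper name `parse_apache_time` is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def parse_apache_time(bracketed):
    """Parse '[23/Aug/2005:00:03:14 -0400]' into a timezone-aware datetime."""
    # The square brackets are literal characters in the format string;
    # %z reads the numeric UTC offset, positive or negative.
    return datetime.strptime(bracketed, "[%d/%b/%Y:%H:%M:%S %z]")


ts = parse_apache_time("[23/Aug/2005:00:03:14 -0400]")
print(ts.isoformat())  # 2005-08-23T00:03:14-04:00
```

Keeping the offset attached to the value makes it possible to convert entries to UTC later without re-reading the raw log line.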