Hive table with RegexSerDe properties not working correctly

zphenhs4 posted on 2021-06-26 in Hive

I validated my regex on the regex101 website:

([(\d\.)]+) - - \[(.*?)\] "(.*?)" (\d+) (\d+) "(.*?)" "(.*?)" "(.*?)" "(.*?)"

It matches the following log line correctly:

66.240.70.141 - - [01/Mar/2018:06:16:46 +0000] "GET /example.download.handler.com/products/01/00/item/116314/8/002394857_2BB.jpg HTTP/1.1" 200 41710 "-" "Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB30P) AppleWebKit/536.37 (KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)" "-" "C0T1_19610|3881001|"
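For a local equivalent of the regex101 check, here is a minimal Python sketch. Hive uses Java regular expressions rather than Python's; the constructs in this pattern should behave the same in both, but treat that as an assumption of the sketch. It also prints the number of capture groups in the pattern.

```python
import re

# The pattern from the question, as a raw string so that Python itself
# does not interpret the backslashes.
PATTERN = re.compile(
    r'([(\d\.)]+) - - \[(.*?)\] "(.*?)" (\d+) (\d+) '
    r'"(.*?)" "(.*?)" "(.*?)" "(.*?)"'
)

# The sample log line from the question, split only for readability.
LOG_LINE = (
    '66.240.70.141 - - [01/Mar/2018:06:16:46 +0000] '
    '"GET /example.download.handler.com/products/01/00/item/116314/8/002394857_2BB.jpg HTTP/1.1" '
    '200 41710 "-" '
    '"Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB30P) AppleWebKit/536.37 '
    '(KHTML, like Gecko) Chrome/41.0.2272.96 Mobile Safari/537.36 '
    '(compatible; Googlebot/2.1; +http://www.google.com/bot.html)" '
    '"-" "C0T1_19610|3881001|"'
)

# fullmatch requires the whole line to match, not just a prefix.
match = PATTERN.fullmatch(LOG_LINE)

print(match is not None)   # True
print(PATTERN.groups)      # 9
print(match.group(1))      # 66.240.70.141
print(match.group(9))      # C0T1_19610|3881001|
```

The sketch reports nine capture groups. As far as I know, RegexSerDe maps groups to columns by position, so that count is worth comparing with the number of columns in the table definition.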

But the same expression does not work in Hive:

CREATE EXTERNAL TABLE `web_logs_test`(   
`ip_address`  string COMMENT '',   
`date_string` string COMMENT '',   
`request`     string COMMENT '', 
`status`      string COMMENT '',   
`bytes`       string COMMENT '',   
`referer`     string COMMENT '',   
`user_agent`  string COMMENT '',   
`cookie`      string COMMENT ''
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe' 
WITH SERDEPROPERTIES (  
'input.regex'='([(\d\.)]+) - - \[(.*?)\] "(.*?)" (\d+) (\d+) "(.*?)" "(.*?)" "(.*?)" "(.*?)"'
)
STORED AS 
INPUTFORMAT 'org.apache.hadoop.mapred.TextInputFormat' 
OUTPUTFORMAT 'org.apache.hadoop.hive.ql.io.HiveIgnoreKeyTextOutputFormat'
LOCATION '/weblogs/data'

If anyone knows what is wrong, please help.
Thanks in advance.

utugiqy6 (answer #1)

CREATE EXTERNAL TABLE web_logs (
  ip_address STRING,
  date_string STRING,
  request STRING,
  status STRING,
  bytes STRING,
  referer STRING,
  user_agent STRING,
  cookie STRING
)
ROW FORMAT SERDE 'org.apache.hadoop.hive.serde2.RegexSerDe'
WITH SERDEPROPERTIES (
   "input.regex" = "^([\\d.]+) \\S+ \\S+ \\[(.+?)\\] \\\"(.+?)\\\" (\\d{3}) (\\d+) \\\"(.+?)\\\" \\\"(.+?)\\\" \\\"SESSIONID=(\\d+)\\\"\\s*"
)
LOCATION '/file_location/web_logs';
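The statement above differs from the question's DDL mainly in escaping: every regex backslash is doubled and the quotes are escaped. My understanding is that Hive processes backslash escapes inside quoted string literals, so a single `\d` can reach the regex engine as a plain `d`. The Python sketch below models that with a deliberately simplified `unescape_literal` helper, which is hypothetical and not Hive's actual code. The two log lines are shortened, made-up samples in the question's format and in the `SESSIONID` format this answer expects.

```python
import re

def unescape_literal(text: str) -> str:
    # Simplified model (assumption): drop each backslash and keep the
    # character after it. Escapes like newline or tab are not handled.
    out, i = [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            out.append(text[i + 1])
            i += 2
        else:
            out.append(text[i])
            i += 1
    return "".join(out)

# Regex from the question, exactly as typed between the DDL quotes.
QUESTION_LITERAL = (
    r'([(\d\.)]+) - - \[(.*?)\] "(.*?)" (\d+) (\d+) '
    r'"(.*?)" "(.*?)" "(.*?)" "(.*?)"'
)
# Regex from this answer, exactly as typed between the DDL quotes.
ANSWER_LITERAL = (
    r'^([\\d.]+) \\S+ \\S+ \\[(.+?)\\] \\\"(.+?)\\\" (\\d{3}) (\\d+) '
    r'\\\"(.+?)\\\" \\\"(.+?)\\\" \\\"SESSIONID=(\\d+)\\\"\\s*'
)

# Shortened sample in the question's format (four quoted trailing fields).
QUESTION_LINE = (
    '66.240.70.141 - - [01/Mar/2018:06:16:46 +0000] "GET /a.jpg HTTP/1.1" '
    '200 41710 "-" "Mozilla/5.0" "-" "C0T1_19610|3881001|"'
)
# Hypothetical sample in the format the answer's regex expects.
SESSION_LINE = (
    '66.240.70.141 - - [01/Mar/2018:06:16:46 +0000] "GET /a.jpg HTTP/1.1" '
    '200 41710 "-" "Mozilla/5.0" "SESSIONID=12345"'
)

# 1. Single backslashes are lost, so the pattern no longer matches.
question_seen = unescape_literal(QUESTION_LITERAL)
print(question_seen)
print(re.fullmatch(question_seen, QUESTION_LINE) is None)          # True

# 2. Doubling every backslash restores the intended pattern.
doubled = QUESTION_LITERAL.replace("\\", "\\\\")
print(unescape_literal(doubled) == QUESTION_LITERAL)               # True
print(re.fullmatch(unescape_literal(doubled), QUESTION_LINE) is not None)  # True

# 3. The answer's pattern has 8 groups and expects a SESSIONID cookie.
answer_seen = unescape_literal(ANSWER_LITERAL)
print(re.compile(answer_seen).groups)                              # 8
print(re.fullmatch(answer_seen, SESSION_LINE) is not None)         # True
print(re.fullmatch(answer_seen, QUESTION_LINE) is None)            # True
```

Under these assumptions, the doubled backslashes are what make the pattern survive the DDL. Note that the last group here matches `SESSIONID=<digits>` after only three quoted fields, so for log lines like the one in the question the tail of the pattern would still need adapting, with one capture group per table column.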
