I'm having trouble parsing strings from a log file. The records look like this:
"skey":"110","scp_id":"OC05","capedge":"3G"
"skey":"140","scp_id":"OC02","capedge":"3G"
"skey":"0","scp_id":"OC01","capedge":"3G"
This is the expected output in our table:
| skey | scp_id | capedge |
| 110 | OC05 | 3G |
| 140 | OC02 | 3G |
| 0 | OC01 | 3G |
I tried the functions documented at https://cwiki.apache.org/confluence/display/hive/languagemanual+udf but unfortunately our strings are not in URL format. Is there a better way, or do I have to use regexp_extract?
Thank you, Gali
1 Answer
You can use the SPLIT and REGEXP_EXTRACT functions:
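```
-- Outer query: pull the quoted value out of each "key":"value" fragment.
-- Hive string literals need the backslash doubled ('\\w'), otherwise '\w' matches a literal w.
select REGEXP_EXTRACT(skey, ':"(\\w+)"', 1) as skey,
       REGEXP_EXTRACT(scp_id, ':"(\\w+)"', 1) as scp_id,
       REGEXP_EXTRACT(capedge, ':"(\\w+)"', 1) as capedge
from (
    -- Inner query: split each log record on commas into its three fragments.
    select SPLIT(log_record, ',')[0] as skey,
           SPLIT(log_record, ',')[1] as scp_id,
           SPLIT(log_record, ',')[2] as capedge
    from yourtable
) a;
```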
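Since each record is the body of a JSON object without the surrounding braces, another option that might be worth trying is to wrap the record in braces and read the fields with Hive's get_json_object. This is only a sketch: it reuses the placeholder names yourtable and log_record from the query above, and it assumes every record becomes well-formed JSON once the braces are added.

```sql
-- Sketch only: yourtable and log_record are placeholder names, not real objects.
-- Assumes each record is valid JSON once wrapped in { }.
select get_json_object(json_record, '$.skey')    as skey,
       get_json_object(json_record, '$.scp_id')  as scp_id,
       get_json_object(json_record, '$.capedge') as capedge
from (
    -- Turn "skey":"110","scp_id":"OC05",... into {"skey":"110","scp_id":"OC05",...}
    select concat('{', log_record, '}') as json_record
    from yourtable
) a;
```

Unlike the SPLIT approach, this looks fields up by name rather than by position, so it should still work if the fields appear in a different order or a value contains a comma.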