hivejson数据解析

7eumitmz  于 2021-05-29  发布在  Hadoop
关注(0)|答案(2)|浏览(264)

我的json数据在json\表和列中是这样的:json\ col

{
    "href": "example.com",
    "Hosts": {
        "cluster_name": "test",
        "host_name": "test.iabc.com"
    },
    "metrics": {
        "cpu": {
            "cpu_user": [
                [
                    0.7,
                    1499795941
                ],
                [
                    0.3,
                    1499795951
                ]
            ]
        }
    }
}

我想用下面的格式把这个数据放到一个表中

+-------------+-------+------------+
| metric_type | value | timestamp  |
+-------------+-------+------------+
| cpu_user    | 0.7   | 1499795941 |
+-------------+-------+------------+
| cpu_user    | 0.3   | 1499795951 |
+-------------+-------+------------+

我尝试使用get\u json\u object获取值

select get_json_object(json_col,'$.metrics.cpu.cpu_user[1]') from json_table

,这给了我

[0.3,1499795951]

如何从这里使用explode函数来获得所需的输出?

nwo49xxi

nwo49xxi1#

您还可以基于json数据实现serde和inputformat接口,而不是使用udf。
以下是一些参考:
http://blog.cloudera.com/blog/2012/12/how-to-use-a-serde-in-apache-hive/httpshttp://github.com/xjtuzxh/inceptor-inputformat

bhmjp9jg

bhmjp9jg2#

select  'cpu_user'      as metric_type 
       ,val_ts[0]       as val
       ,val_ts[1]       as ts

from   (select  split(m.col,',') as val_ts

        from    json_table j
                lateral view explode(split(regexp_replace(get_json_object(json_col,'$.metrics.cpu.cpu_user[*]'),'^\\[\\[|\\]\\]$',''),'\\],\\[')) m
        ) m
;
+-------------+-----+------------+
| metric_type | val |     ts     |
+-------------+-----+------------+
| cpu_user    | 0.7 | 1499795941 |
| cpu_user    | 0.3 | 1499795951 |
+-------------+-----+------------+

相关问题