配置单元表中的sql解析字符串

gz5pxeao  于 2021-06-24  发布在  Hive
关注(0)|答案(1)|浏览(290)

我有一个配置单元表,它有两个列(day,type of theu day),都是string类型

"monday"    [{"temp" : 45, "weather": "rainny"}, {"temp" : 25, "weather": "sunny"}, {"temp" : 15, "weather": "storm"}]
"tuesday"   [{"temp" : 5, "weather": "winter"}, {"temp" : 10, "weather": "sun"}, {"temp" : 18, "weather": "dawn"}]

我想分开(我猜爆炸是专业术语),然后得到每天的天气列表。我很熟悉如何在python中实现这一点,但是有没有一种方法可以直接在hive中实现这一点。

"monday"    [45, 25, 15]
"tuesday"   [5, 10, 18]
2ul0zpep

2ul0zpep1#

使用数据示例进行测试。将cte替换为您的表。阅读代码中的注解:

with your_table as (--use your table instead of this CTE
select stack(2,
             "monday",'[{"temp" : 45, "weather": "rainny"}, {"temp" : 25, "weather": "sunny"}, {"temp" : 15, "weather": "storm"}]',
             "tuesday" ,'[{"temp" : 5, "weather": "winter"}, {"temp" : 10, "weather": "sun"}, {"temp" : 18, "weather": "dawn"}]'
            )as (day, type_of_day)
) --use your table instead of this CTE

select s.day, array(get_json_object(type_of_day_array[0],'$.temp'),
                    get_json_object(type_of_day_array[1],'$.temp'),
                    get_json_object(type_of_day_array[2],'$.temp')
                   ) as result_array --extract JSON elements and construct array
 from
(
select day,  split(regexp_replace(regexp_replace(type_of_day,'\\[|\\]',''),   --remove square brackets
                                  '\\}, *\\{','\\}##\\{'),                    --make convenient split separator
                   '##')                                                      --split                                  
          as type_of_day_array
  from your_table                                                             --use your table instead of this CTE
)s;

结果:

s.day   result_array    

monday  ["45","25","15"]    
tuesday ["5","10","18"]

如果json数组可以包含三个以上的元素,那么可以使用横向视图explode或posexplode,然后构建结果数组,如下所示:https://stackoverflow.com/a/51570035/2700344.
在cast()中 Package 数组元素。。。如果你需要的话 array<int> 结果不是 array<string> :

cast(get_json_object(type_of_day[0],'$.temp') as int)...

相关问题