如何在spark SQL中从JSON列中提取动态数量的键值对

mbskvtky  于 2024-01-09  发布在  Spark
关注(0)|答案(1)|浏览(260)

我试图从我的JSON列中解析出test-N-value。我已经尝试过使用JavaScript,但创建的列在这里返回NULL。

  1. SELECT get_json_object('{"random_field": "ABC", "test-0-value":"1", "test-1-value":"2"}', '$.test-*-value');

字符串
有没有人有任何提示,如何从中提取出这些动态对象?预期的结果将是..但请记住,输出必须允许动态数量的键值对。

  1. extracted_out
  2. -------------
  3. {"test-0-value":"1", "test-1-value":"2"}

jjjwad0x

jjjwad0x1#

您可以使用from_json将json转换为map,分解map以获取键,值,仅过滤您需要的键,
收集key:value数组,将数组与逗号连接,使用str_to_map获取包含所需键的map,然后您可以选择使用to_json将其转换为JSON字符串
演示:

  1. with src as (select '{"random_field": "ABC", "test-0-value":"1", "test-1-value":"2"}' as original_json)
  2. select s.original_json, to_json( str_to_map(concat_ws(',',collect_list(concat(key,':',value))))) result
  3. from src s
  4. lateral view explode(from_json(s.original_json, 'map<STRING, STRING>')) m as key, value
  5. where key rlike 'test-\\d+-value'
  6. group by s.original_json
  7. """).show(100, false)

字符串
测试结果:

  1. +---------------------------------------------------------------+---------------------------------------+
  2. |original_json |result |
  3. +---------------------------------------------------------------+---------------------------------------+
  4. |{"random_field": "ABC", "test-0-value":"1", "test-1-value":"2"}|{"test-0-value":"1","test-1-value":"2"}|
  5. +---------------------------------------------------------------+---------------------------------------+


更新:
使用map_filter找到更好的解决方案

  1. spark.sql("""
  2. with src as (select '{"random_field": "ABC", "test-0-value":"1", "test-1-value":"2"}' as original_json)
  3. select s.original_json, to_json( map_filter(from_json(s.original_json, 'map<STRING, STRING>'), (k, v) -> k rlike 'test-\\d+-value')) result
  4. from src s
  5. """).show(100, false)

展开查看全部

相关问题