使用dataset从配置单元中的字符串中提取json字段

yuvru6vn  于 2021-06-24  发布在  Hive
关注(0)|答案(1)|浏览(565)

我正在尝试一个非常基本的Hive查询。我试图从数据集中提取一个json字段,但我总是
\不
但是对于json字段,可以使用一些\u字符串
我的问题是:

  1. WITH dataset AS (
  2. SELECT
  3. CAST(
  4. '{ "traceId": "abc", "additionalData": "{\"Star Rating\":\"3\"}", "locale": "en_US", "content": { "contentType": "PB", "content": "T S", "bP": { "mD": { "S R": "3" }, "cType": "T_S", "sType": "unknown-s", "bTimestamp": 0, "title": "T S" } }
  5. }' AS STRING) AS some_string
  6. )
  7. SELECT some_string, get_json_object(dataset.some_string, '$.traceId') FROM dataset

问:如何在这里获取json字段?

hfwmuf9z

hfwmuf9z1#

问题出在反斜杠上。单个反斜杠被视为“的转义字符,并由配置单元删除:

  1. hive> select '\"';
  2. OK
  3. "
  4. Time taken: 0.069 seconds, Fetched: 1 row(s)

当您有两个反斜杠时,hive会删除一个:

  1. hive> select '\\"';
  2. OK
  3. \"
  4. Time taken: 0.061 seconds, Fetched: 1 row(s)

有两个反斜杠,效果很好:

  1. WITH dataset AS (
  2. SELECT
  3. CAST(
  4. '{ "traceId": "abc", "additionalData": "{\\"Star Rating\\":\\"3\\"}", "locale": "en_US", "content": { "contentType": "PB", "content": "T S", "bP": { "mD": { "S R": "3" }, "cType": "T_S", "sType": "unknown-s", "bTimestamp": 0, "title": "T S" } }
  5. }' AS STRING) AS some_string
  6. )
  7. SELECT some_string, get_json_object(dataset.some_string, '$.traceId') FROM dataset;
  8. OK
  9. { "traceId": "abc", "additionalData": "{\"Star Rating\":\"3\"}", "locale": "en_US", "content": { "contentType": "PB", "content": "T S", "bP": { "mD": { "S R": "3" }, "cType": "T_S", "sType": "unknown-s", "bTimestamp": 0, "title": "T S" } }
  10. } abc
  11. Time taken: 0.788 seconds, Fetched: 1 row(s)

您还可以轻松地删除additionaldata中{和}之后的双引号:

  1. WITH dataset AS (
  2. SELECT
  3. regexp_replace(regexp_replace(
  4. '{ "traceId": "abc", "additionalData": "{\"Star Rating\":\"3\"}", "locale": "en_US", "content": { "contentType": "PB", "content": "T S", "bP": { "mD": { "S R": "3" }, "cType": "T_S", "sType": "unknown-s", "bTimestamp": 0, "title": "T S" } }
  5. }' ,'\\"\\{','\\{') ,'\\}\\"','\\}' )AS some_string
  6. )
  7. SELECT some_string, get_json_object(dataset.some_string, '$.traceId') FROM dataset;

退货:

  1. OK
  2. { "traceId": "abc", "additionalData": {"Star Rating":"3"}, "locale": "en_US", "content": { "contentType": "PB", "content": "T S", "bP": { "mD": { "S R": "3" }, "cType": "T_S", "sType": "unknown-s", "bTimestamp": 0, "title": "T S" } }
  3. } abc
  4. Time taken: 7.035 seconds, Fetched: 1 row(s)
展开查看全部

相关问题