用于创建结构数组的配置单元选择语句

qlvxas9a  于 2021-06-26  发布在  Hive
关注(0)|答案(1)|浏览(492)

我在选择配置单元中的结构数组时遇到问题。
我的源表如下所示:

  1. +-------------+--+
  2. | field |
  3. +-------------+--+
  4. | id |
  5. | fieldid |
  6. | fieldlabel |
  7. | fieldtype |
  8. | answer_id |
  9. | unitname |
  10. +-------------+--+

这是调查数据,其中id是调查id,中间的四个字段是响应数据,unitname是调查所属的业务单元。
我需要为每个调查id的所有答案创建一个结构数组。我原以为这会奏效,但实际上不行:

  1. select id,
  2. array(
  3. named_struct(
  4. "field_id",
  5. fieldid,
  6. "field_label",
  7. fieldlabel,
  8. "field_type",
  9. fieldtype,
  10. "answer_id",
  11. answer_id,)) as answers,
  12. unitname
  13. from new_answers;

返回的是每个调查答案(字段\ id)作为该答案的一个结构数组,如下所示:

  1. id | answers | unitname
  2. 1 | [{"field_id":175877,"field_label":"Comment","field_type":"COMMENT","answer_id":8990947803}] | Location1
  3. 2 | [{"field_id":47824,"field_label":"Language","field_type":"MULTIPLE_CHOICE","answer_id":8990950069}] | Location2
  4. 2 | [{"field_id":48187,"field_label":"Language Type","field_type":"MULTIPLE_CHOICE","answer_id":8990950070}] | Location2
  5. 2 | [{"field_id":47829,"field_label":"Trans #","field_type":"TEXT","answer_id":8990950071}] | Location2

但我需要做的是:

  1. id | answers | unitname
  2. 1 | [{"field_id":175877,"field_label":"Comment","field_type":"COMMENT","answer_id":8990947803}] | Location1
  3. 2 | [{"field_id":47824,"field_label":"Language","field_type":"MULTIPLE_CHOICE","answer_id":8990950069},
  4. {"field_id":48187,"field_label":"Language Type","field_type":"MULTIPLE_CHOICE","answer_id":8990950070},
  5. {"field_id":47829,"field_label":"Trans #","field_type":"TEXT","answer_id":8990950071}] | Location2

我搜索了又搜索,但找到的所有答案似乎都与使用insert into….values()查询有关。我已经有了一个表结构;我就是不能让数组按它应该的方式排列。
任何帮助都将不胜感激。
出于复制目的,如果需要:

  1. CREATE TABLE `new_answers`(
  2. `id` bigint,
  3. `fieldid` bigint,
  4. `fieldlabel` string,
  5. `fieldtype` string,
  6. `answer_id` bigint,
  7. `unitname` string)
  8. INSERT INTO new_answers VALUES
  9. (1,175877,"Comment","COMMENT",8990947803,"Location1"),
  10. (2,47824,"Language","MULTIPLE_CHOICE",8990950069,"Location2"),
  11. (2,48187,"Language Type","MULTIPLE_CHOICE",8990950070,"Location2"),
  12. (2,47829,"Trans #","TEXT",8990950071,"Location2");
lg40wkob

lg40wkob1#

您似乎正在寻找的功能是将结构收集到一个数组中。hive提供了两个函数来收集数组中的内容:collect\u set和collect\u list。但是,这些函数只能用于创建基本类型的数组。
砖厂项目的jar(https://github.com/klout/brickhouse/wiki/downloads)提供了许多功能,包括收集复杂类型的功能。

  1. add jar hdfs://path/to/your/jars/brickhouse-0.6.0.jar

然后你可以添加 collect 使用您喜欢的任何名称的函数:

  1. create temporary function collect_struct as 'brickhouse.udf.collect.CollectUDAF';

以下查询:

  1. select id
  2. , collect_struct(
  3. named_struct(
  4. "field_id", fieldid,
  5. "field_label", fieldlabel,
  6. "field_type", fieldtype,
  7. "answer_id", answer_id)) as answers
  8. , unitname
  9. from new_answers
  10. group by id, unitname
  11. ;

提供以下结果:

  1. id answers unitname
  2. 1 [{"field_id":175877,"field_label":"Comment","field_type":"COMMENT","answer_id":8990947803}] Location1
  3. 2 [{"field_id":47824,"field_label":"Language","field_type":"MULTIPLE_CHOICE","answer_id":8990950069},{"field_id":48187,"field_label":"Language Type","field_type":"MULTIPLE_CHOICE","answer_id":8990950070},{"field_id":47829,"field_label":"Trans #","field_type":"TEXT","answer_id":8990950071}] Location2
展开查看全部

相关问题