I have MAP-type data in a txt file:
[age#27,height#5.8]
[age#25,height#5.3]
[age#27,height#5.10]
[age#25,height#5.1]
I want to display the average height for each age group.
This is the LOAD statement:
records = LOAD '~/Documents/Pig_Map.txt' AS (details:map[]);
records: {details: map[]}
Then I grouped the data by age:
group_data = GROUP records BY details#'age';
group_data: {group: bytearray,records: {(details: map[])}}
To access details, I did a FLATTEN like this (I am not sure whether this step is needed):
flatten_records = FOREACH group_data GENERATE group,FLATTEN(records);
flatten_records: {group: bytearray,records::details: map[]}
`DUMP flatten_records` gives me the following output:
(25,[height#5.1,age#25])
(25,[height#5.3,age#25])
(27,[height#5.10,age#27])
(27,[height#5.8,age#27])
Now I want to get the average height; I tried this:
display_records = FOREACH flatten_records GENERATE group,AVG(records.details#'height');
The error is:
<line 10, column 57> Multiple matching functions for org.apache.pig.builtin.AVG with input schema: ({{(bytearray)}}, {{(double)}}). Please use an explicit cast.
Please advise.
1 Answer
ojsjcaue1#
Can you try this?
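One possible approach, as a minimal sketch rather than a definitive fix: values in an untyped `map[]` are `bytearray`, so Pig cannot choose between the `AVG` overloads, which is what the "use an explicit cast" message points to. In addition, `AVG` expects a bag of numeric values, and the FLATTEN step un-nests the bag that GROUP produced, so it is likely unnecessary here. The sketch below casts the map values to typed fields first, then groups and averages. The alias names `typed`, `grouped`, and `avg_height` are illustrative assumptions, not part of the original post.

```pig
-- Load each line as an untyped map, as in the question.
records = LOAD '~/Documents/Pig_Map.txt' AS (details:map[]);

-- Pull the map values out into typed fields with explicit casts.
-- Untyped map values are bytearray, so AVG cannot pick an overload without this.
typed = FOREACH records GENERATE
            (int)details#'age'       AS age,
            (double)details#'height' AS height;

-- Group by age; each group carries a bag named after the input alias (typed).
grouped = GROUP typed BY age;

-- AVG operates on the bag of heights inside each group, so no FLATTEN is needed.
avg_height = FOREACH grouped GENERATE group AS age, AVG(typed.height) AS avg_height;

DUMP avg_height;
```

Note that the cast treats `5.10` as the decimal number 5.1. If the heights are meant as feet and inches, they would need to be converted to a single unit before averaging.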
Output: