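Brand new to Pig. I'm just trying to create a collection of cities and their patron industry sectors, and I'm having a hard time doing it. I get ERROR 1066: Unable to open iterator for alias test
My goal is to rank the industry sectors in each city by number of workers, something to the effect of New York City: Finance 20, Accounting 15, Shoemaking 30,
etc. What am I missing or doing wrong?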
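The ranking goal can be stated as two steps: total the workers for each (city, sector) pair, then order each city's sectors by that total, largest first. The sketch below models those steps in plain Python rather than Pig. The record values are hypothetical and only echo the New York City example above; the function name `rank_sectors` is illustrative, not part of any library.

```python
from collections import defaultdict

# Hypothetical (city, sector, workers) records echoing the example in the question.
records = [
    ("New York City", "Finance", 20),
    ("New York City", "Accounting", 15),
    ("New York City", "Shoemaking", 30),
]


def rank_sectors(rows):
    """Sum workers per (city, sector), then sort each city's sectors by total, descending."""
    totals = defaultdict(int)
    for city, sector, workers in rows:
        totals[(city, sector)] += workers

    by_city = defaultdict(list)
    for (city, sector), total in totals.items():
        by_city[city].append((sector, total))

    # Largest total first; ties keep their first-seen order because sorted() is stable.
    return {
        city: sorted(pairs, key=lambda pair: pair[1], reverse=True)
        for city, pairs in by_city.items()
    }


ranking = rank_sectors(records)
print(ranking)
# {'New York City': [('Shoemaking', 30), ('Finance', 20), ('Accounting', 15)]}
```

In Pig, the per-city ordering step would typically be a nested FOREACH over the relation grouped by city, with an ORDER ... BY ... DESC inside the nested block; the Python above only illustrates the intended result.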
bus_data = LOAD 'sectorAnalysis.json' USING com.twitter.elephantbird.pig.load.JsonLoader('-nestedLoad') as (row: map[]);
bus_data_rows = FOREACH bus_data GENERATE (chararray) row# 'city' AS city, row# 'state' AS state, row# 'sectors' AS sectors, row# 'workers' AS workers;
flattened_bus = FOREACH bus_data_rows GENERATE city, state, FLATTEN(sectors) as sector, workers;
distinct_flat_bus = DISTINCT flattened_bus;
group_by_sec = GROUP distinct_flat_bus BY (city, sector);
sum_sec = FOREACH group_by_sec GENERATE flatten(group) AS (city, sector), SUM(workers) AS worker_T;
DUMP sum_sec;
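Two details of the script are worth checking. First, in Pig a FOREACH over a grouped relation sees only the `group` field and one bag per input relation, so a bare `workers` is not a field there; the sum would normally be written `SUM(distinct_flat_bus.workers)`, projecting the column out of the grouped bag. Second, DISTINCT removes tuples that are identical in every field, so two genuine rows with the same city, state, sector and worker count are counted once. The sketch below models both points in plain Python; the rows are hypothetical, and `group_by`/`sum_workers` are illustrative helpers, not Pig APIs.

```python
from collections import defaultdict

# Hypothetical flattened rows: (city, state, sector, workers).
# The first two rows are identical on purpose.
flattened = [
    ("Brookville", "NY", "product 1", 12),
    ("Brookville", "NY", "product 1", 12),
    ("Brookville", "NY", "product 1", 9),
]


def city_state_sector(row):
    """Grouping key: city, state and sector."""
    return row[0], row[1], row[2]


def group_by(rows, key):
    """Rough model of Pig GROUP: each key maps to a bag (list) of whole input tuples."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[key(row)].append(row)
    return dict(grouped)


def sum_workers(grouped):
    """Rough model of GENERATE group, SUM(bag.workers): sum one field across the bag."""
    return {k: sum(row[3] for row in bag) for k, bag in grouped.items()}


all_rows = sum_workers(group_by(flattened, city_state_sector))
# dict.fromkeys keeps the first copy of each identical tuple, like DISTINCT.
distinct_rows = sum_workers(group_by(list(dict.fromkeys(flattened)), city_state_sector))

print(all_rows)       # {('Brookville', 'NY', 'product 1'): 33}
print(distinct_rows)  # {('Brookville', 'NY', 'product 1'): 21}
```

Whether DISTINCT belongs in the pipeline depends on whether identical rows in the JSON are true duplicates or separate reports that should both be counted.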
Data format:
(Brookville, NY, (product 1), 12)
(Tempe, AZ, (product 3), 13)
(Brookville, NY, (product 1), 9)
(Miami, FL, (Product 2), 10)
(Brookville, NY, (product 2), 15)
The expected final result looks like this:
(Brookville, NY, (product 1), 21)
(Brookville, NY, (product 2), 15)
(Tempe, AZ, (product 3), 13)
(Miami, FL, (Product 2), 10)
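The expected result keeps the state in every row, which suggests grouping on (city, state, sector) rather than (city, sector). The sketch below applies that key to the sample rows in plain Python and sums the workers. It assumes all Brookville rows refer to the same city and spells them identically, as the expected output implies. The loop prints rows in first-seen order; Pig's GROUP does not guarantee any particular output order, so the rows may come back in a different order than listed above.

```python
from collections import defaultdict

# Sample rows from the question: (city, state, sector, workers).
# Assumption: every Brookville row is the same city, as the expected output implies.
data = [
    ("Brookville", "NY", "product 1", 12),
    ("Tempe", "AZ", "product 3", 13),
    ("Brookville", "NY", "product 1", 9),
    ("Miami", "FL", "Product 2", 10),
    ("Brookville", "NY", "product 2", 15),
]

# Group on city, state and sector, summing the worker counts.
totals = defaultdict(int)
for city, state, sector, workers in data:
    totals[(city, state, sector)] += workers

# Print in the same tuple style as the question's expected result.
for (city, state, sector), total in totals.items():
    print(f"({city}, {state}, ({sector}), {total})")
```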