Pig Latin: LIMIT and FLATTEN produce wrong results

b5lpy0ml  posted on 2021-06-24  in  Pig
B = GROUP A BY state;
C = FOREACH B {                          
   DA = ORDER A BY population DESC;                
   DB = LIMIT DA 5;                         
   GENERATE FLATTEN(group), FLATTEN(DB.name), FLATTEN(DB.population);
}

The problem is that each city name comes back 5 times instead of once. The result I get is:

(ALASKA,M,27257)
(ALASKA,M,23696)
(ALASKA,M,19949)
(ALASKA,M,19926)
(ALASKA,M,19833)
(ALASKA,H,27257)
(ALASKA,H,23696)
(ALASKA,H,19949)
(ALASKA,H,19926)
(ALASKA,H,19833)

The result I need is:

(ALASKA,M,27257)
(ALASKA,H,23696)

bqf10yzr  #1

You are using two FLATTENs: FLATTEN(DB.name), FLATTEN(DB.population) produces a Cartesian product between the two bags. Replace them with a single bag:

B = GROUP A BY state;
C = FOREACH B {                          
   DA = ORDER A BY population DESC;                
   DB = LIMIT DA 5;                         
   GENERATE FLATTEN(group), FLATTEN(DB.(name, population));
}

Alternatively, since the bag created by GROUP BY contains all of the original tuples with all of their columns, you can do the following:

B = GROUP A BY state;
C = FOREACH B {                          
   DA = ORDER A BY population DESC;                
   DB = LIMIT DA 5;                         
   GENERATE FLATTEN(DB);
}
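
To see why two separate FLATTENs multiply the rows, the behaviour can be modelled outside Pig. The following is only a rough Python sketch of the semantics, not Pig itself: the function names `flatten_separately` and `flatten_together` are hypothetical, and the two sample rows are taken from the desired output in the question.

```python
from itertools import product

# One group after ORDER ... DESC and LIMIT, modelled as a list of tuples.
# The two rows come from the desired output in the question.
group = "ALASKA"
db = [("M", 27257), ("H", 23696)]


def flatten_separately(group, bag):
    """Model FLATTEN(group), FLATTEN(DB.name), FLATTEN(DB.population)."""
    names = [name for name, _ in bag]
    populations = [population for _, population in bag]
    # Each FLATTEN un-nests its own bag, so the two bags are crossed.
    return [(group, n, p) for n, p in product(names, populations)]


def flatten_together(group, bag):
    """Model FLATTEN(group), FLATTEN(DB.(name, population))."""
    # A single bag of (name, population) tuples keeps each pair intact.
    return [(group, n, p) for n, p in bag]


crossed = flatten_separately(group, db)
paired = flatten_together(group, db)

for row in crossed:
    print("separate:", row)
for row in paired:
    print("together:", row)
```

With 2 rows in the bag, the separate version yields 2 x 2 = 4 rows and the combined version yields 2; with LIMIT 5 the same effect gives 25 rows instead of 5.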
