Apache Pig: group-by, order-by

ct2axkht  posted on 2021-05-29  in  Hadoop

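I have a bag of tuples, each containing playername, gamename, and score. I first grouped that bag by game and stored the result in another bag. Now I want to put the highest-scoring tuple for each game into yet another bag. How can I do this?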


mspsb9vt #1

Input:

jon,mario,2345
joe,minesweeper,234
peter,mario,112
lisa,minesweeper,900

Pig script:

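-- Load the CSV as (player, game, score) tuples.
game_data = LOAD 'game_data.csv' USING PigStorage(',') AS (player:chararray, game:chararray, score:long);
-- Group the records by game; each group carries a bag named game_data.
game_data_grp_by_game = GROUP game_data BY game;
game_kpis = FOREACH game_data_grp_by_game {
    -- Within each game, sort by score (highest first) and keep the top record.
    ord_game_data_by_score = ORDER game_data BY score DESC;
    max_score_record = LIMIT ord_game_data_by_score 1;
    -- The limited bag holds a single tuple, so flattening yields one row per game.
    GENERATE group AS game, FLATTEN(max_score_record.player) AS player_name, FLATTEN(max_score_record.score) AS score;
};
DUMP game_kpis;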

Output of DUMP game_kpis:

(mario,jon,2345)
(minesweeper,lisa,900)
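
One thing to keep in mind: because the script above uses LIMIT 1, a game in which two players share the top score yields only one of them, and which one is not guaranteed. If every tied record should be kept, one possible variant is sketched below. It is not part of the original answer; the relation names game_max, joined, and top_scores are made up for illustration, and it assumes the same game_data.csv layout as above.

```pig
-- Sketch only: keep every record that ties for the highest score in its game.
game_data = LOAD 'game_data.csv' USING PigStorage(',') AS (player:chararray, game:chararray, score:long);
game_data_grp_by_game = GROUP game_data BY game;

-- One row per game with its maximum score (game_max is a hypothetical name).
game_max = FOREACH game_data_grp_by_game GENERATE group AS game, MAX(game_data.score) AS max_score;

-- Join the original records back on (game, score) so only top scorers remain.
joined = JOIN game_data BY (game, score), game_max BY (game, max_score);

top_scores = FOREACH joined GENERATE game_data::game AS game, game_data::player AS player_name, game_data::score AS score;
DUMP top_scores;
```

For the sample input, where no scores are tied, this should produce the same two tuples as the original script, though the extra join makes it more expensive than the nested ORDER/LIMIT approach.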
