Getting the first and last tuple from a bag using Apache Pig

Posted by dba5bblo on 2021-06-24 in Pig

I am new to Pig Latin and am trying the following example using Pig's built-in functions.

A = LOAD 'student.txt' AS (name:chararray, term:chararray, gpa:float);

B = GROUP A BY name;

DUMP B;

(John,{(John,sm,3.8),(John,sp,4.0),(John,wt,3.7),(John,fl ,3.9)})

(Mary,{(Mary,sm,4.0),(Mary,sp,4.0),(Mary,wt,3.9),(Mary,fl,3.8)})

From each bag I need to retrieve the first element => (John,sm,3.8) and the last element => (John,fl ,3.9).
I need help solving this without using a UDF.

Answer #1, by kcugc4gi

OK, you can use the approach below, although it is a bit verbose.

-- Load the comma-separated input file.
names = LOAD '/user/user/inputfiles/names.txt' USING PigStorage(',') AS (name:chararray, term:chararray, gpa:float);

-- RANK without a BY clause prepends a sequential row number to each tuple.
names_rank = RANK names;

-- Rename the generated rank column ($0) to row_id.
names_each = FOREACH names_rank GENERATE $0 AS row_id, name, term, gpa;

names_grp = GROUP names_each BY name;

-- First record per name: the tuple with the smallest row_id.
names_first_each = FOREACH names_grp {
    order_asc = ORDER names_each BY row_id ASC;
    first_rec = LIMIT order_asc 1;
    GENERATE FLATTEN(first_rec) AS (row_id, name, term, gpa);
};

-- Last record per name: the tuple with the largest row_id.
names_last_each = FOREACH names_grp {
    order_desc = ORDER names_each BY row_id DESC;
    last_rec   = LIMIT order_desc 1;
    GENERATE FLATTEN(last_rec) AS (row_id, name, term, gpa);
};

-- Combine both results, drop the helper row_id column, and sort by name.
names_unioned = UNION names_first_each, names_last_each;

names_extract = FOREACH names_unioned GENERATE name, term, gpa;

names_ordered = ORDER names_extract BY name;

DUMP names_ordered;

Output:

(John,fl,3.9)
(John,sm,3.8)
(Mary,fl,3.8)
(Mary,sm,4.0)
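
The same idea can probably be written more compactly by computing both ends inside a single nested FOREACH, which avoids the UNION and the final re-sort. The following is only a sketch, not a tested replacement: it assumes the same input path and schema as the answer above, and it assumes that RANK without a BY clause numbers the rows in file order (this may not hold for every input layout). The relation name first_last and the output field names first_term, first_gpa, last_term and last_gpa are made up for illustration.

```pig
-- Assumption: same comma-separated input file and schema as in the answer above.
names = LOAD '/user/user/inputfiles/names.txt' USING PigStorage(',')
        AS (name:chararray, term:chararray, gpa:float);

-- RANK without BY prepends a sequential row number; rename it to row_id.
names_rank = RANK names;
names_each = FOREACH names_rank GENERATE $0 AS row_id, name, term, gpa;

names_grp = GROUP names_each BY name;

-- One pass per group: sort the bag both ways and keep one tuple from each end.
first_last = FOREACH names_grp {
    order_asc  = ORDER names_each BY row_id ASC;
    order_desc = ORDER names_each BY row_id DESC;
    first_rec  = LIMIT order_asc 1;
    last_rec   = LIMIT order_desc 1;
    -- Each bag holds exactly one tuple, so flattening both yields one row per name.
    GENERATE group AS name,
             FLATTEN(first_rec.(term, gpa)) AS (first_term, first_gpa),
             FLATTEN(last_rec.(term, gpa))  AS (last_term, last_gpa);
};

DUMP first_last;
```

Unlike the answer above, this variant emits one row per name, with the first and last term and gpa side by side, rather than two separate rows.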
