Pig query: how to set an if-like condition inside FOREACH

yzckvree  posted on 2021-05-29  in Hadoop

I have a question about writing a Pig script.

RESULT_SOMETYPE = FOREACH SOMETYPE_DATA_GROUPED  GENERATE flatten(group) , SUM(SOMETYPEDATA.DURATION) as duration, COUNT(SOMETYPEDATA.DURATION) as cnt;

Here I want to replace SUM(SOMETYPEDATA.DURATION) with a number that depends on its value, for example:

if (0 < Sum < 1000) then put 1
if (1001 < Sum < 2000) then put 2
if (2001 < Sum < 3000) then put 3

How can I achieve this in Pig?
Please advise.

lnlaulya1#

We can use the bincond operator (?:) or the CASE expression (available from Pig version 0.12 onward) to achieve this.

RESULT_SOMETYPE = FOREACH SOMETYPE_DATA_GROUPED  GENERATE flatten(group) AS grp_name , SUM(SOMETYPEDATA.DURATION) as duration_sum, COUNT(SOMETYPEDATA.DURATION) as cnt;

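-- Nested bincond: the first matching range wins; anything else gets 9999.
-- The bounds are strict, so 1000, 1001, 2000 and 2001 also fall through to 9999.
result_required = FOREACH RESULT_SOMETYPE GENERATE grp_name, 
                        ((duration_sum > 0 AND duration_sum < 1000) ? 1 : 
                                        ((duration_sum > 1001 AND duration_sum < 2000) ? 2 : 
                                                ((duration_sum > 2001 AND duration_sum < 3000) ? 3 : 9999)     
                                        )
                         ) AS duration, cnt;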

Reference: http://pig.apache.org/docs/r0.12.0/basic.html#arithmetic
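
The answer mentions CASE but only shows the bincond form. As a rough sketch, the same bucketing could be written with a CASE expression, assuming Pig 0.12 or later and the RESULT_SOMETYPE relation defined above. The relation name result_case is made up for illustration; the ranges and the 9999 fallback are copied from the nested bincond.

```pig
-- Sketch only: same ranges as the nested bincond, written with CASE (Pig 0.12+).
-- result_case is a hypothetical name, not from the original answer.
result_case = FOREACH RESULT_SOMETYPE GENERATE grp_name,
    (CASE
        WHEN duration_sum > 0    AND duration_sum < 1000 THEN 1
        WHEN duration_sum > 1001 AND duration_sum < 2000 THEN 2
        WHEN duration_sum > 2001 AND duration_sum < 3000 THEN 3
        ELSE 9999  -- no range matched, including the boundary values
     END) AS duration, cnt;
```

CASE tends to read more easily than deeply nested bincond once there are more than two or three ranges, since each range sits on its own line.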

jjjwad0x2#

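SPLIT can do this, but not inside a FOREACH. Pig also has something like a ternary operator, but that does not help with storing the result in a variable. Here is how to use SPLIT to get close to what you need.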

A = LOAD '/home/vignesh/a.dat' using PigStorage(',') as (a:int,b:int,c:int);
SPLIT A INTO B IF (a > 0 AND a < 1000),  C IF (a > 1001 AND a<2000), D IF (a > 2001 AND a < 3000);
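
SPLIT only separates the rows into relations; it does not attach a number to them. One possible way to continue, shown here as a sketch rather than as part of the original answer, is to tag each relation with a constant and union the results. The names B_tagged, C_tagged, D_tagged, TAGGED and the field name bucket are assumptions made for illustration.

```pig
-- Sketch only: continue from relations B, C and D produced by the SPLIT above.
-- All *_tagged names and the "bucket" field are hypothetical.
B_tagged = FOREACH B GENERATE a, b, c, 1 AS bucket;
C_tagged = FOREACH C GENERATE a, b, c, 2 AS bucket;
D_tagged = FOREACH D GENERATE a, b, c, 3 AS bucket;

-- Recombine the three tagged relations into one.
TAGGED = UNION B_tagged, C_tagged, D_tagged;
```

Rows that satisfy none of the SPLIT conditions (for example a = 1000) are not placed in any of B, C or D, so they do not appear in TAGGED.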
