Counting and summing multiple fields with Pig

gpnt7bae  posted on 2021-06-21  in  Pig

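I have four fields of type int, and any of them may be null in the dataset. For each row I need to count how many fields actually hold a value: for example, if the first and third columns are null and the second and fourth contain integers, the result is 2. Second, I also need the sum of the non-null fields in each row.
Input: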

null null null null
1    3    5    null 
null null 8    5

Output:

0    null 
3    9
2    13

bnl4lu3b  #1

Here is an example of how to do this:

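-- Declare the fields as int, as described in the question.
A = LOAD 'data.csv' USING PigStorage(',') AS (f1:int, f2:int, f3:int, f4:int);

-- Map every field to 1 if it holds a value and to 0 if it is null.
B = FOREACH A GENERATE
    ( f1 IS NULL ? 0 : 1 ) AS f1_validity,
    ( f2 IS NULL ? 0 : 1 ) AS f2_validity,
    ( f3 IS NULL ? 0 : 1 ) AS f3_validity,
    ( f4 IS NULL ? 0 : 1 ) AS f4_validity;

-- The sum of the flags is the number of non-null fields in the row.
C = FOREACH B GENERATE
    f1_validity + f2_validity + f3_validity + f4_validity AS valid_field_cnt;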

I hope I understood your question correctly.
I have now added the sum as well :)

-- Declare the fields as int, as described in the question.
A = LOAD 'SO/null_data2.csv' USING PigStorage(',') AS (f1:int, f2:int, f3:int, f4:int);
DESCRIBE A;

--DUMP A;

-- For every field, emit a 0/1 "has a value" flag and a copy with null replaced by 0.
B = FOREACH A GENERATE
    ( f1 IS NULL ? 0 : 1 ) AS f1_validity,
    ( f2 IS NULL ? 0 : 1 ) AS f2_validity,
    ( f3 IS NULL ? 0 : 1 ) AS f3_validity,
    ( f4 IS NULL ? 0 : 1 ) AS f4_validity,
    ( f1 IS NULL ? 0 : f1 ) AS f1,
    ( f2 IS NULL ? 0 : f2 ) AS f2,
    ( f3 IS NULL ? 0 : f3 ) AS f3,
    ( f4 IS NULL ? 0 : f4 ) AS f4;
DESCRIBE B;

-- Note: a row whose fields are all null gets a sum of 0 here, not null.
C = FOREACH B GENERATE
    f1_validity + f2_validity + f3_validity + f4_validity AS not_null_cnt,
    f1 + f2 + f3 + f4 AS sum;
DESCRIBE C;
DUMP C;
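
The snippet above reports a sum of 0 for a row in which every field is null, whereas the expected output in the question shows null for that row. A minimal sketch of one alternative follows, assuming the four fields are loaded as int: it packs them into a bag and relies on the built-in COUNT and SUM, both of which skip null values. The file name is reused from the first snippet, and the relation and alias names (`vals`, `not_null_cnt`, `field_sum`) are illustrative only.

```pig
-- Sketch only: assumes comma-separated input with four int columns.
A = LOAD 'data.csv' USING PigStorage(',') AS (f1:int, f2:int, f3:int, f4:int);

-- Pack the four fields of each row into a bag of single-field tuples.
B = FOREACH A GENERATE TOBAG(f1, f2, f3, f4) AS vals;

-- COUNT skips tuples whose first field is null; SUM ignores nulls
-- and should return null when the bag holds no non-null value.
C = FOREACH B GENERATE
    COUNT(vals) AS not_null_cnt,
    SUM(vals)   AS field_sum;

DUMP C;
```

This version also scales to more columns without one conditional per field, at the cost of building a small bag for every row. Note that COUNT and SUM return long values here rather than int.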

r7knjye2  #2

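-- Declare the fields as int, as described in the question.
A = LOAD 'test8.txt' USING PigStorage('\t') AS (a:int, b:int, c:int, d:int);
-- a1..d1 are 0/1 "has a value" flags; a..d are the values with null replaced by 0.
B = FOREACH A GENERATE ( a is null ? 0 : 1 ) AS a1,
                       ( b is null ? 0 : 1 ) AS b1,
                       ( c is null ? 0 : 1 ) AS c1,
                       ( d is null ? 0 : 1 ) AS d1,
                       ( a is null ? 0 : a ) AS a,
                       ( b is null ? 0 : b ) AS b,
                       ( c is null ? 0 : c ) AS c,
                       ( d is null ? 0 : d ) AS d;
-- Test the non-null count rather than the sum, so that a row whose values
-- legitimately add up to 0 still reports 0 instead of null.
C = FOREACH B GENERATE a1 + b1 + c1 + d1 as field_count,
                       ((a1 + b1 + c1 + d1) == 0 ? null : (a + b + c + d)) as field_sum;
DUMP C;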

Output:
