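-- Load the comma-separated input as four untyped fields.
A = LOAD 'data.csv' USING PigStorage(',') AS (f1, f2, f3, f4);
-- Flag each field: 1 if it holds a value, 0 if it is null.
B = FOREACH A GENERATE
    ( f1 IS NULL ? 0 : 1 ) AS f1_validity,
    ( f2 IS NULL ? 0 : 1 ) AS f2_validity,
    ( f3 IS NULL ? 0 : 1 ) AS f3_validity,
    ( f4 IS NULL ? 0 : 1 ) AS f4_validity;
-- Add the flags to get the number of non-null fields in each row.
C = FOREACH B GENERATE
    f1_validity + f2_validity + f3_validity + f4_validity AS valid_field_cnt;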
I hope I have understood your question correctly. I have now done this as well :)
A = LOAD 'SO/null_data2.csv' USING PigStorage(',') AS (f1, f2, f3, f4);
DESCRIBE A;
--DUMP A;
-- For each field, emit a 0/1 validity flag and a copy with null replaced by 0.
B = FOREACH A GENERATE
    ( f1 IS NULL ? 0 : 1 ) AS f1_validity,
    ( f2 IS NULL ? 0 : 1 ) AS f2_validity,
    ( f3 IS NULL ? 0 : 1 ) AS f3_validity,
    ( f4 IS NULL ? 0 : 1 ) AS f4_validity,
    ( f1 IS NULL ? 0 : f1 ) AS f1,
    ( f2 IS NULL ? 0 : f2 ) AS f2,
    ( f3 IS NULL ? 0 : f3 ) AS f3,
    ( f4 IS NULL ? 0 : f4 ) AS f4;
DESCRIBE B;
-- Count the non-null fields and sum the values, treating null as 0.
C = FOREACH B GENERATE
    f1_validity + f2_validity + f3_validity + f4_validity AS not_null_cnt,
    f1 + f2 + f3 + f4 AS sum;
DESCRIBE C;
DUMP C;
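A = LOAD 'test8.txt' USING PigStorage('\t') AS (a,b,c,d);
-- a1..d1 are 0/1 not-null flags; a..d are the values with null replaced by 0.
B = FOREACH A GENERATE
    ( a is null ? 0 : 1 ) AS a1,
    ( b is null ? 0 : 1 ) AS b1,
    ( c is null ? 0 : 1 ) AS c1,
    ( d is null ? 0 : 1 ) AS d1,
    ( a is null ? 0 : a ) AS a,
    ( b is null ? 0 : b ) AS b,
    ( c is null ? 0 : c ) AS c,
    ( d is null ? 0 : d ) AS d;
-- field_sum is null whenever the total is 0, so a row of all nulls yields null.
-- Note: a row whose values genuinely add up to 0 is also reported as null.
C = FOREACH B GENERATE
    a1 + b1 + c1 + d1 as field_count,
    ((a + b + c + d) == 0 ? null : (a + b + c + d)) as field_sum;
DUMP C;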
2 answers
bnl4lu3b1#
Here is an example of how to do this.
r7knjye22#
Output