Filtering out null values with Pig

chhqkbe1 posted on 2021-06-21 in Pig

I have a table that I load as shown below.

A = load 'customer' using PigStorage('|');

The customer data contains rows like these:

7|Ron|ron@abc.com
8|Rina  
9|Don|dmes@xyz.com
9|Don|dmes@xyz.com
10|Maya|maya@cnn.com

11|marry|mary@abc.com

When I use the following...

B = DISTINCT A;
A_CLEAN = FILTER B by ($0 is not null) AND ($1 is not null) AND ($2 is not null);

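It also removes 8|Rina.
How can I remove only the empty rows with Pig?
Is there a way to write A_CLEAN = FILTER B BY NOT IsNull()?
I am new to Pig, so I am not sure what I should put inside IsNull...
Thanks.
A_CLEAN = FILTER B BY NOT IsEmpty(B);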

7rfyedvj #1

Try the following:

A = LOAD 'customer' USING PigStorage('|');
B = DISTINCT A;
A_CLEAN = FILTER B BY NOT(($0 IS NULL) AND ($1 IS NULL) AND ($2 IS NULL));
DUMP A_CLEAN;

This produces the output:
(8,Rina)
(7,Ron,ron@abc.com)
(9,Don,dmes@xyz.com)
(10,Maya,maya@cnn.com)
(11,marry,mary@abc.com)
In Pig, you cannot test a tuple itself for emptiness.
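
If the positional references are hard to read, the same filter can be written with named fields. A minimal sketch, assuming the three columns are an id, a name and an email (the names id, name and email are hypothetical and do not come from the question):

```pig
-- Assumed schema: the field names are illustrative only.
A = LOAD 'customer' USING PigStorage('|') AS (id:chararray, name:chararray, email:chararray);
B = DISTINCT A;
-- Drop a row only when every field is null; a row such as 8|Rina is kept.
A_CLEAN = FILTER B BY NOT ((id IS NULL) AND (name IS NULL) AND (email IS NULL));
DUMP A_CLEAN;
```

With a declared schema, a row that has fewer fields than the schema should be padded with nulls, so the short row would come through with a null email rather than as a two-field tuple.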

6ss1mwsb #2

Tarun, instead of the AND condition, why not use an OR condition?
        A_CLEAN = FILTER B by ($0 is not null) OR ($1 is not null) OR ($2 is not null);
This removes the rows in which every column is null and keeps any row that has at least one non-null column.
Can you try it and let me know whether it works for all your cases?

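Update:
I don't know why IsEmpty() is not working for you; it works for me. IsEmpty only works with bags, so I convert all the fields into a bag and test that bag for emptiness. See the working code below.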

input.txt
7|Ron|ron@abc.com
8|Rina
9|Don|dmes@xyz.com
9|Don|dmes@xyz.com
10|Maya|maya@cnn.com

11|marry|mary@abc.com

Pig script:
A = LOAD 'input.txt' USING PigStorage('|');
B = DISTINCT A;
A_CLEAN = FILTER B BY NOT IsEmpty(TOBAG($0..));
DUMP A_CLEAN;

Output:
(8,Rina  )
(7,Ron,ron@abc.com)
(9,Don,dmes@xyz.com)
(10,Maya,maya@cnn.com)
(11,marry,mary@abc.com)

As for your other question, it comes down to simple boolean logic:

In case of AND, 
8|Rina
 will be treated as
 ($0 is not null) AND ($1 is not null) AND ($2 is not null)
 (true) AND (true) AND (false)
 (false) -->so this record will be skipped by Filter command

In case of OR, 
8|Rina
 will be treated as
 ($0 is not null) OR ($1 is not null) OR ($2 is not null)
 (true) OR (true) OR (false)
 (true) -->so this record will be included into the relation by Filter command

In case of empty record, 
<empty record>
  will be treated as
  ($0 is not null) OR ($1 is not null) OR ($2 is not null)
  (false) OR (false) OR (false)
  (false) -->so this record will be skipped by Filter command
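
To check this reasoning against the actual data, the rows that pass and fail the OR condition can be sent to two separate relations. A rough sketch, assuming Pig 0.10 or later for the OTHERWISE branch of SPLIT; the relation names KEPT and DROPPED are made up for illustration:

```pig
A = LOAD 'input.txt' USING PigStorage('|');
B = DISTINCT A;
-- Rows with at least one non-null field go to KEPT; all remaining rows go to DROPPED.
SPLIT B INTO KEPT IF (($0 IS NOT NULL) OR ($1 IS NOT NULL) OR ($2 IS NOT NULL)),
             DROPPED OTHERWISE;
DUMP KEPT;
DUMP DROPPED;
```

If the logic above holds, 8|Rina should show up in KEPT and only the empty record in DROPPED.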
