Discarding nulls after a full outer join in Pig

Posted by envsm3lx on 2021-06-02 in Hadoop

I need help discarding the nulls from the result of a full outer join in Pig Latin. Here are the two datasets:
A:

(BOS,2)
(BUR,81)
(LAS,8)

B:

(BUR,56)
(EWR,2)
(LAS,88)

After the full outer join, C:

(BOS,2,,)
(BUR,81,BUR,56)
(,,EWR,2)
(LAS,8,LAS,88)

I need the output in the following format:

(BOS,2)
(BUR,137)
(EWR,2)
(LAS,96)

I have tried different combinations of GROUP, FLATTEN, and bags, but could not find a solution. Any help is much appreciated.

airline = load '/demo/data/airline/airline.csv' using PigStorage(',') as (Origin: chararray, Dest: chararray); 
traffic_in = GROUP airline by Origin; 
traffic_in_count= FOREACH traffic_in generate group as Origin , COUNT(airline) as count ; 
traffic_out = GROUP airline by Dest; 
traffic_out_count = FOREACH traffic_out generate group as Dest ,COUNT (airline) as count; 
traffic_top = JOIN traffic_in_count by Origin FULL OUTER , traffic_out_count by Dest ;

mrphzbgm1#

Edit: Instead of using an outer join, use a UNION and then sum the values of the second column.

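-- Load both datasets with the same layout: (code, count)
A = LOAD 'test1.txt' USING PigStorage(',') AS (A1:chararray, A2:int);
B = LOAD 'test2.txt' USING PigStorage(',') AS (B1:chararray, B2:int);
-- Stack the rows of A and B; no nulls are introduced
C = UNION A, B;
-- Group by the first column (the code)
D = GROUP C BY $0;
-- Sum the second column within each group
E = FOREACH D GENERATE group, SUM(C.$1);
DUMP E;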

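Output: for the two datasets in the question, E contains the four requested tuples, (BOS,2), (BUR,137), (EWR,2) and (LAS,96).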
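
If the full outer join has to stay, the nulls can also be replaced after the join rather than avoided. The following is only a minimal sketch, not part of the original answer. It assumes A and B are loaded with the schemas used in the answer (A1:chararray, A2:int and B1:chararray, B2:int); the aliases J and K and the field names code and total are made up for illustration.

```pig
-- Assumes A and B were loaded as (A1:chararray, A2:int) and (B1:chararray, B2:int).
J = JOIN A BY A1 FULL OUTER, B BY B1;

-- After the join, fields are addressed as relation::field.
-- Keep whichever key is present, and treat a missing count as 0 before adding.
K = FOREACH J GENERATE
        ((A::A1 IS NULL) ? B::B1 : A::A1) AS code,
        ((A::A2 IS NULL) ? 0 : A::A2) + ((B::B2 IS NULL) ? 0 : B::B2) AS total;

DUMP K;
```

Applied to the relations in the question, the fields would be traffic_in_count::Origin, traffic_in_count::count, traffic_out_count::Dest and traffic_out_count::count. COUNT returns a long, so the fallback literal there would need to be 0L instead of 0 so that both branches of each conditional have the same type.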
