I need help discarding the nulls from the result of a full outer join in Pig Latin. Here are the two datasets:
A:
(BOS,2)
(BUR,81)
(LAS,8)
B:
(BUR,56)
(EWR,2)
(LAS,88)
After the full outer join, C:
(BOS,2,,)
(BUR,81,BUR,56)
(,,EWR,2)
(LAS,8,LAS,88)
I need to get output in the following format:
(BOS,2)
(BUR,137)
(EWR,2)
(LAS,96)
I have tried different combinations of GROUP, FLATTEN, and bags, but could not find a solution. Any help is much appreciated.
airline = load '/demo/data/airline/airline.csv' using PigStorage(',') as (Origin: chararray, Dest: chararray);
traffic_in = GROUP airline by Origin;
traffic_in_count= FOREACH traffic_in generate group as Origin , COUNT(airline) as count ;
traffic_out = GROUP airline by Dest;
traffic_out_count = FOREACH traffic_out generate group as Dest ,COUNT (airline) as count;
traffic_top = JOIN traffic_in_count by Origin FULL OUTER , traffic_out_count by Dest ;
1 Answer
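Edit: Instead of using an outer join, use a UNION of the two relations, then group by the first column and sum the second-column values.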
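The answer's original script is not preserved here, so the following is only a minimal sketch of the UNION-then-SUM approach, built on the relations from the question. The field names `airport` and `cnt` and the relation names `traffic_all`, `traffic_grouped`, and `traffic_total` are assumptions introduced for illustration; both count relations are given the same schema so that the UNION keeps consistent field names.

```pig
-- Load and group as in the question.
airline = LOAD '/demo/data/airline/airline.csv' USING PigStorage(',') AS (Origin: chararray, Dest: chararray);
traffic_in = GROUP airline BY Origin;
traffic_out = GROUP airline BY Dest;

-- Assumption: rename the fields so both relations share one schema (airport, cnt).
traffic_in_count = FOREACH traffic_in GENERATE group AS airport, COUNT(airline) AS cnt;
traffic_out_count = FOREACH traffic_out GENERATE group AS airport, COUNT(airline) AS cnt;

-- Stack the rows instead of joining them, so no null-padded columns appear.
traffic_all = UNION traffic_in_count, traffic_out_count;

-- Group by airport code and add up the per-direction counts.
traffic_grouped = GROUP traffic_all BY airport;
traffic_total = FOREACH traffic_grouped GENERATE group AS airport, SUM(traffic_all.cnt) AS total;

DUMP traffic_total;
```

With the sample data from the question, an airport that appears in only one dataset (such as BOS or EWR) keeps its single count, while one that appears in both (such as BUR, 81 and 56) is summed into one row (137).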
Output