Comparing Hadoop files field by field in Pig

Posted by wrrgggsh on 2021-05-30 in Hadoop

I have two files.
File 1

id,sal,location,code
1000,1000,jupiter,F
1001,2000,jupiter,F
1002,3000,jupiter,F
1003,4000,jupiter,F
1004,5000,jupiter,F

File 2

id,sal,location,code
1000,2000,jupiter,F
1001,2000,jupiter,Z
1002,3000,jupiter,F
1003,4000,jupiter,F
1004,5000,jupiter,F

When I compare file 1 with file 2, I need output like the following:

1000, sal
1001,code

Basically, it should tell me which fields have changed from the previous file, along with the id. Can this be done in Pig?

hadoop mapreduce apache-pig

Source: https://stackoverflow.com/questions/29497206/files-comparing-field-by-field-in-pig

1 Answer

9gm1akwq1#

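The comparison itself is easy to solve; the challenging part is the output format you asked for, which needs slightly more involved logic.
I have handled most of the edge cases, but check against your own input to make sure it works for every combination of changed fields.
File 1: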

1000,1000,jupiter,F
1001,2000,jupiter,F
1002,3000,jupiter,F
1003,4000,jupiter,F
1004,5000,jupiter,F

File 2:

1000,2000,jupiter,F
1001,2000,jupiter,Z
1002,3000,jupiter,F
1003,4000,jupiter,F
1004,5000,jupiter,F

Pig script:

A = LOAD 'file1' USING PigStorage(',') AS (id,sal,location,code);
B = LOAD 'file2' USING PigStorage(',') AS (id,sal,location,code);
C = JOIN A BY id, B BY id;

-- For each field, emit its name if the value differs between the files, '' otherwise
D = FOREACH C GENERATE A::id AS id,
                       ((A::sal == B::sal) ? '' : 'sal') AS sal,
                       ((A::location == B::location) ? '' : 'location') AS location,
                       ((A::code == B::code) ? '' : 'code') AS code;

-- Drop the rows in which no field changed
E = FILTER D BY NOT (sal == '' AND location == '' AND code == '');

-- Format the output: join the names with commas, collapse runs of commas,
-- then strip any leading or trailing comma
F = FOREACH E GENERATE id, REPLACE(BagToString(TOBAG(sal,location,code),','),',+',',') AS finalOutput;
G = FOREACH F GENERATE id, REPLACE(finalOutput,'^,|,$','');
DUMP G;

Output:

(1000,sal)
(1001,code)
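
If the comma cleanup feels fragile, one possible variant is to emit one candidate row per field, filter out the unchanged ones, and only then join the names back together per id. The following is a minimal sketch of that idea and not a tested drop-in replacement: it assumes the same 'file1' and 'file2' inputs as above, the explicit chararray types and the aliases H and changed are my own additions, and the order of names inside each result is not guaranteed.

```pig
-- Sketch only: same inputs and JOIN as in the script above, with explicit types (an assumption)
A = LOAD 'file1' USING PigStorage(',') AS (id:chararray, sal:chararray, location:chararray, code:chararray);
B = LOAD 'file2' USING PigStorage(',') AS (id:chararray, sal:chararray, location:chararray, code:chararray);
C = JOIN A BY id, B BY id;

-- One candidate row per field: the field name if the values differ, '' otherwise
D = FOREACH C GENERATE A::id AS id,
        FLATTEN(TOBAG(((A::sal == B::sal) ? '' : 'sal'),
                      ((A::location == B::location) ? '' : 'location'),
                      ((A::code == B::code) ? '' : 'code'))) AS field;

-- Keep only the fields that actually changed
E = FILTER D BY field != '';

-- Collect the changed field names for each id into one comma-separated string
G = GROUP E BY id;
H = FOREACH G GENERATE group AS id, BagToString(E.field, ',') AS changed;
DUMP H;
```

For the sample files this should again report sal for id 1000 and code for id 1001. Note that a comparison involving a null field evaluates to null in Pig, so such a field would be dropped by the filter rather than reported as changed; add an explicit IS NULL check if that matters for your data.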

Answered on 2021-05-30
