如何在pig完全外部联接中组合列

qf9go6mv  于 2021-06-25  发布在  Pig
关注(0)|答案(4)|浏览(352)

我有两张table:

1,'hello'
2,'world'
4,'this'

1,'john'
3,'king'

我想要一张table

1,'hello','john'
2,'world',''
3,''     ,king
4,'this' ,''

我正在使用pig命令:

JOIN A BY code FULL OUTER,
     B BY code;

但这给了我一个结果:

1,'hello',1,'john'
2,'world',,''
,''     ,3,king
4,'this' ,,''

我需要代码列来组合,如何才能做到这一点?谢谢

x6yk4ghg

x6yk4ghg1#

您可以使用union,然后执行groupby
工会a、b将为您提供:

1,'hello'
2,'world'
4,'this'
1,'john'
3,'king'

现在根据id进行groupby。这将为您提供:

1, {'hello', 'john'}
2, {'world'}
3, {'king'}
4, {'this'}

现在你只需要一个自定义项来解析包。在自定义项中,迭代每个键以生成格式的输出。
我也遇到了同样的问题。我就是这样解决的。

0kjbasz6

0kjbasz62#

是的,join总是产生这样的输出,这是pig的预期行为。一个选项可以是try group操作符而不是join操作符。
a、 文本

1,'hello'
2,'world'
4,'this'

b、 文本

1,'john'
3,'king'

Pig手稿:

A = LOAD 'a.txt' USING PigStorage(',') AS (code:int,name:chararray);
B = LOAD 'b.txt' USING PigStorage(',') AS (code:int,name:chararray);
C = GROUP A BY code,B BY code;
D = FOREACH C GENERATE group,(IsEmpty(A.name) ? TOTUPLE('') : BagToTuple(A.name)) AS aname,(IsEmpty(B.name) ? TOTUPLE('') : BagToTuple(B.name)) AS bname;
E = FOREACH D GENERATE group,FLATTEN(aname),FLATTEN(bname);
DUMP E;

输出:

(1,'hello','john')
(2,'world',)
(3,,'king')
(4,'this',)

bagtotuple()在本机pig中不可用,您必须下载pig-0.11.0.jar并在类路径中设置它。
从以下链接下载jar:
http://www.java2s.com/code/jar/p/downloadpig0110jar.htm

3okqufwl

3okqufwl3#

A = load 'a' using PigStorage(',') as (code:int,name:chararray);
B = load 'b' using PigStorage(',') as (code:int,name:chararray);
C = join A by code full outer ,B by code;
D = foreach C generate 
    (A::code IS NULL ? B::code : A::code) AS code,
    A::name as aname, B::name as bname;
dump D;

结果是

(1,'hello','john')
(2,'world',)
(3,,'king')
(4,'this,)
qybjjes1

qybjjes14#

您可以在联接之后使用三元运算符来重新分配新的 code ,基于它是在a关系中填充还是在b关系中填充。在本例中,如果a.code null 然后使用b代码,否则使用a代码。

C = JOIN A BY code FULL OUTER, B BY code;

D = FOREACH C GENERATE
  (A.code IS NULL ? B.code : A.code) AS code,
  A.field1,
  A.field2,
  B.field3,
  B.field4;

相关问题