Merging tuples in Pig

mf98qq94, posted 2021-06-03 in Hadoop

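I have two sets of tuples. I want to inner-join them on the first element and merge the remaining parts into a single tuple. How can I do this with Pig on Hadoop?
The two input tuple sets: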

1,(1,2)
2,(2,3)

1,(b,c,b,c)
2,(c,d,c,d)

Expected output:

1,(1,2,b,c,b,c)
2,(2,3,c,d,c,d)

Thanks in advance, Lin

rbl8hiat1#

Some food for thought...
Input:
dataA:

1   (1,2)
2   (2,3)

dataB:

1   (b,c,b,c)
2   (c,d,c,d)

Pig script:

-- Load both inputs as tab-separated (id, tuple) pairs
A = LOAD 'dataA' USING PigStorage('\t') AS (aid:long, atuple:tuple(af1:long, af2:long));
B = LOAD 'dataB' USING PigStorage('\t') AS (bid:long, btuple:tuple(bf1:chararray, bf2:chararray, bf3:chararray, bf4:chararray));
-- Inner join on the first element
C = JOIN A BY aid, B BY bid;
-- Flatten both tuples into top-level fields
D = FOREACH C GENERATE aid AS id, FLATTEN(atuple) AS (af1:long, af2:long), FLATTEN(btuple) AS (bf1:chararray, bf2:chararray, bf3:chararray, bf4:chararray);
-- Regroup the flattened fields into a single tuple
E = FOREACH D GENERATE id, (af1..bf4);
DUMP E;

Output of DUMP E:

(1,(1,2,b,c,b,c))
(2,(2,3,c,d,c,d))
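
A possible variant, offered as a sketch rather than a tested replacement: the merged tuple could also be built directly after the join with the built-in TOTUPLE function, skipping the intermediate FLATTEN step. It assumes the same 'dataA' and 'dataB' files and the same schemas as the script above, and it names every field explicitly instead of using a field range.

```pig
-- Assumption: same tab-separated input files and schemas as in the script above.
A = LOAD 'dataA' USING PigStorage('\t') AS (aid:long, atuple:tuple(af1:long, af2:long));
B = LOAD 'dataB' USING PigStorage('\t') AS (bid:long, btuple:tuple(bf1:chararray, bf2:chararray, bf3:chararray, bf4:chararray));

-- Inner join on the first element of each record.
C = JOIN A BY aid, B BY bid;

-- Dereference each field of the two joined tuples and pack them into one tuple.
E = FOREACH C GENERATE A::aid AS id,
    TOTUPLE(A::atuple.af1, A::atuple.af2,
            B::btuple.bf1, B::btuple.bf2, B::btuple.bf3, B::btuple.bf4);
DUMP E;
```

Listing the fields by name is more verbose than a range such as af1..bf4, but it makes the order of the merged tuple explicit and does not depend on the flattened field layout.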
