Apache Pig: FLATTEN of a (UDF-generated) bag followed by JOIN causes ClassCastException

f45qwnt8 posted on 2021-06-21 in Pig

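I am trying to flatten a bag of values in Pig into multiple records, and then join those records with other records that contain the flattened values. However, every attempt ends in a ClassCastException (Pig `DataByteArray` to Java `Integer`). Here is a minimal working example (MWE) that reproduces the problem.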
Input files
File A: `a.txt`
```
123, fruit1
234, fruit2
345, fruit3
783, fruit4
928, fruit5
317, fruit6
937, fruit7
```

File B: `b.txt`
```
global23; [num1#123,num2#234]
global45; [num1#783,num2#928,num3#317]
```

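To make the input format concrete, here is a plain-Python sketch (not Pig code) of how a line of `b.txt` splits into the `global` field and a map literal of the form `[key#value,...]`. Because the script declares `mymap: map[]` with no value type, Pig treats the map values as `bytearray`; the sketch stands in for that by keeping them as strings. The function name `parse_b_line` is hypothetical, and the sketch trims surrounding whitespace, so it does not check whether Pig's own map parser tolerates the space after `;`.

```python
def parse_b_line(line):
    """Split one b.txt line into (global, map), loosely mimicking
    PigStorage(';') plus Pig's [key#value,...] map literal.
    Values stay as strings, standing in for Pig's untyped bytearray."""
    global_field, map_field = line.split(";", 1)
    literal = map_field.strip()  # whitespace trimmed (assumption, see above)
    if not (literal.startswith("[") and literal.endswith("]")):
        raise ValueError("not a map literal: %r" % literal)
    # Each "key#value" item becomes one dict entry; no type conversion.
    mymap = dict(item.split("#", 1) for item in literal[1:-1].split(",") if item)
    return global_field.strip(), mymap


print(parse_b_line("global23; [num1#123,num2#234]"))
# prints: ('global23', {'num1': '123', 'num2': '234'})
```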
Python UDF: `udf.py`
```
@outputSchema("values:bag{t:tuple(value:int)}")
def bag_of_tuples(map_dict):
    # Intended to emit the map's values as a bag of single-int tuples
    return map_dict.values()
```

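The UDF body returns the map's values exactly as they arrive. The plain-Python sketch below only mimics the UDF outside Pig; it contrasts that behaviour with a hypothetical variant, `bag_of_int_tuples`, that converts each value with `int()` and wraps it in a one-field tuple, which is the shape the declared schema `bag{t:tuple(value:int)}` describes. It assumes the values reach Python as strings; how Pig's Jython bridge actually hands `bytearray` map values to Python is not verified here, so treat this as an illustration of the type mismatch the error message points at, not as a confirmed fix.

```python
def bag_of_tuples(map_dict):
    # Mirrors the UDF body: values are returned unchanged, so untyped
    # (string/bytearray-like) values stay untyped despite the int schema.
    # list() only turns the Python 3 dict view into a plain list.
    return list(map_dict.values())


def bag_of_int_tuples(map_dict):
    # Hypothetical variant: convert each value to int and wrap it in a
    # one-field tuple, matching bag{t:tuple(value:int)}.
    return [(int(value),) for value in map_dict.values()]


sample = {"num1": "123", "num2": "234"}  # stand-in for an untyped Pig map
print(bag_of_tuples(sample))      # prints: ['123', '234']
print(bag_of_int_tuples(sample))  # prints: [(123,), (234,)]
```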
Pig script:

```
REGISTER 'udf.py' using jython as udf;

a = LOAD 'a.txt' using PigStorage(',') AS (num: int, fruit: chararray);
b = LOAD 'b.txt' using PigStorage(';') AS (global: chararray, mymap: map[]);
c = FOREACH b GENERATE global AS (global: chararray), FLATTEN(udf.bag_of_tuples(mymap)) AS (othernum: int);

d = JOIN a BY num, c BY othernum;
DUMP d;
```

Expected result
Joined records, for example:

```
num, fruit, global, othernum
(123, fruit1, global23, 123)
(234, fruit2, global23, 234)
...
```

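For reference, the intended semantics of `FLATTEN` followed by `JOIN a BY num, c BY othernum` can be emulated in plain Python: each bag element becomes its own `(global, othernum)` record, and an inner equi-join pairs those records with `a` on equal keys. This is a sketch of the expected output only. The row data is copied from the input files above, the helper names `flatten` and `inner_join` are hypothetical, and the `int()` conversion is the assumption that makes the keys comparable.

```python
from collections import defaultdict

# Relation a (num:int, fruit:chararray), copied from a.txt.
# Leading spaces that PigStorage(',') would keep are dropped for readability.
a = [(123, "fruit1"), (234, "fruit2"), (345, "fruit3"), (783, "fruit4"),
     (928, "fruit5"), (317, "fruit6"), (937, "fruit7")]

# Relation b (global, mymap), copied from b.txt; map values kept as strings
# to stand in for Pig's untyped map values.
b = [("global23", {"num1": "123", "num2": "234"}),
     ("global45", {"num1": "783", "num2": "928", "num3": "317"})]


def flatten(relation):
    """One output record per bag element, like FLATTEN over the UDF's bag.
    The int() conversion is what gives othernum an int type here."""
    return [(global_name, int(value))
            for global_name, mymap in relation
            for value in mymap.values()]


def inner_join(left, left_key, right, right_key):
    """Hash equi-join: concatenate every left/right pair with equal keys."""
    index = defaultdict(list)
    for row in right:
        index[row[right_key]].append(row)
    return [l_row + r_row for l_row in left for r_row in index[l_row[left_key]]]


c = flatten(b)
d = inner_join(a, 0, c, 1)
for row in d:
    print(row)
```

Pig does not guarantee the output order, so only the set of joined rows is meaningful. If the keys are left as strings, this Python join silently matches nothing, whereas the Pig job reported above fails with a ClassCastException instead.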
Any ideas? Could this be a bug?
