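I am trying to flatten a bag of values into multiple records in Pig, and then join those records with other records that carry the flattened values. However, my attempts keep ending in a ClassCastException (Pig DataByteArray to Java Integer). Here is an MWE that reproduces the problem.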
Input files
File a: `a.txt`
```
123, fruit1
234, fruit2
345, fruit3
783, fruit4
928, fruit5
317, fruit6
937, fruit7
```
File b: `b.txt`
```
global23; [num1#123,num2#234]
global45; [num1#783,num2#928,num3#317]
```
Python UDF: `udf.py`
```
@outputSchema("values:bag{t:tuple(value:int)}")
def bag_of_tuples(map_dict):
    return map_dict.values()
```
Pig script:
```
REGISTER 'udf.py' using jython as udf;
a = LOAD 'a.txt' using PigStorage(',') AS (num: int, fruit: chararray);
b = LOAD 'b.txt' using PigStorage(';') AS (global: chararray, mymap: map[]);
c = FOREACH b GENERATE global AS (global: chararray), FLATTEN(udf.bag_of_tuples(mymap)) AS (othernum: int);
d = JOIN a BY num, c BY othernum;
DUMP d;
```
Expected result
Joined records, for example:
```
num, fruit, global, othernum
(123, fruit1, global23, 123)
(234, fruit2, global23, 234)
...
```
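To make the expected result concrete, the intended flatten-then-join semantics can be modelled in plain Python, outside Pig. This is only a sketch of the output I am after, not a fix for the exception. The helper names `flatten_map_values` and `inner_join` are made up for illustration, and the map values are written as Python ints by assumption.

```python
# Plain-Python model of the intended FLATTEN + JOIN (not Pig code).
# Data copied from a.txt and b.txt; map values are assumed to be ints.
a_rows = [
    (123, "fruit1"), (234, "fruit2"), (345, "fruit3"), (783, "fruit4"),
    (928, "fruit5"), (317, "fruit6"), (937, "fruit7"),
]
b_rows = [
    ("global23", {"num1": 123, "num2": 234}),
    ("global45", {"num1": 783, "num2": 928, "num3": 317}),
]


def flatten_map_values(rows):
    """Emit one (global, othernum) record per map value, like FLATTEN on a bag."""
    return [(name, value) for name, mapping in rows for value in mapping.values()]


def inner_join(left, right):
    """Inner-join (num, fruit) with (global, othernum) on num == othernum.

    Assumes num is unique in the left relation, which holds for a.txt.
    """
    fruit_by_num = dict(left)
    return [
        (othernum, fruit_by_num[othernum], name, othernum)
        for name, othernum in right
        if othernum in fruit_by_num
    ]


c_rows = flatten_map_values(b_rows)
d_rows = sorted(inner_join(a_rows, c_rows))
for row in d_rows:
    print(row)
```

Rows 345 and 937 from `a.txt` have no matching map value, so they should not appear in the join.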
Any ideas? This might be a bug.