Pig script fails when a macro is used twice

Posted by mwecs4sa on 2021-06-25 in Pig

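I have a Pig script that uses a macro twice on the same relation, but with different parameters; each call filters that relation on a different field. The macro has the following shape: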

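  DEFINE doubleGroupJoin (mainField, mainRelation) returns out {
      -- Keep only the rows where the chosen field is non-empty.
      valid = FILTER $mainRelation BY $mainField != '';
      -- First aggregation: group by the main field alone.
      r1 = FOREACH (GROUP valid BY $mainField) GENERATE
          field1_1, field1_2, ...;
      -- Second aggregation: group by the main field plus other fields (tuple key).
      r2 = FOREACH (GROUP valid BY ($mainField, otherfield1, ...)) GENERATE
          field2_1, field2_2, ...;
      -- Join the two aggregations; aliases are case-sensitive, so r1/r2, not R1/R2.
      $out = FOREACH (JOIN r1 BY field1_1, r2 BY field1_2) GENERATE
          final1, final2, ...;
  }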

The script itself contains the following:

  -- Output1
  finalR1 = doubleGroupJoin('field1', initialData);
  STORE finalR1 INTO '$output/R1';
  -- Output2
  finalR2 = doubleGroupJoin('field2', initialData);
  STORE finalR2 INTO '$output/R2';

If I comment out either the Output1 block or the Output2 block, the job runs fine, but if I try to use both blocks together, I get the following error:

  java.lang.ClassCastException: org.apache.pig.data.BinSedesTuple cannot be cast to java.lang.String
      at org.apache.pig.backend.hadoop.HDataType.getWritableComparableTypes(HDataType.java:106)
      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapReduce$Map.collect(PigGenericMapReduce.java:111)
      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.runPipeline(PigGenericMapBase.java:284)
      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:277)
      at org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigGenericMapBase.map(PigGenericMapBase.java:64)
      at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:144)
      at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
      at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
      at org.apache.hadoop.mapred.LocalJobRunner$Job.run(LocalJobRunner.java:212)

This is with Pig 0.12.0. Any suggestions as to why this happens?
