Pig: UDF does not return the expected result set

Posted by huwehgph on 2021-06-03 in Hadoop

This is the sample data I am working with:

Peter   Wilkerson   27  M
James   Owen    26  M
Matt    Wo  30  M
Kenny   Chen    28  M

I created a simple UDF to filter on age, like this:

public class IsApplicable extends FilterFunc {

    @Override
    public Boolean exec(Tuple tuple) throws IOException {
        if(tuple == null || tuple.size() > 0){
            return false;
        }
        try {
            Object object = tuple.get(0);
            if(object == null){
                return false;
            }
            int age = (Integer)object;
            return age > 28;
        } catch (Exception e) {
            throw new IOException(e);
        }
    }

}

This is the script I use to call the UDF:

records = LOAD '~/Documents/data.txt' AS (firstname:chararray,lastname:chararray,age:int,gender:chararray);
filtered_records = FILTER records BY com.udf.IsApplicable(age);
dump filtered_records;

The dump does not show any records. Please let me know what I am missing.

vhmi4jdf1#

This returns false for all rows:

if (tuple == null || tuple.size() > 0) {
    return false;
}
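
To make the effect of that guard concrete, here is a minimal sketch that compares the condition from the question with what is assumed to be the intended one. It uses a plain `java.util.List<Object>` as a stand-in for Pig's `Tuple` so it runs without Pig on the classpath; the class and method names are hypothetical.

```java
import java.util.Arrays;
import java.util.List;

class GuardSketch {
    // Guard as written in the question: true for every tuple that has at least one field.
    static boolean rejectedByOriginalGuard(List<Object> tuple) {
        return tuple == null || tuple.size() > 0;
    }

    // Assumed intent: reject only null or empty tuples.
    static boolean rejectedByIntendedGuard(List<Object> tuple) {
        return tuple == null || tuple.isEmpty();
    }

    public static void main(String[] args) {
        // A one-field tuple holding an age, as the UDF would receive it.
        List<Object> oneField = Arrays.<Object>asList(30);
        System.out.println(rejectedByOriginalGuard(oneField)); // true: the row is dropped
        System.out.println(rejectedByIntendedGuard(oneField)); // false: the row reaches the age check
    }
}
```

Because every record passed to the UDF carries at least one field, the original guard returns early for all of them and the age comparison is never evaluated.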

This fetches userName rather than age:

Object object = tuple.get(0);

but5z9lq2#

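The condition tuple.size() > 0 in the if statement is always true, so execution never reaches the try block (i.e. the filtering logic). That is why you get an empty result. Can you change the if condition like this?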

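// Reject null or empty tuples; size() is never negative, so compare against 0.
if (tuple == null || tuple.size() == 0) {
    return false;
}
// Debug output: print the tuple size for each record that passes the guard.
System.out.println("TupleSize=" + tuple.size());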

Sample debug output in the console:

2015-02-13 07:40:46,994 [Thread-2] INFO  org.apache.pig.backend.hadoop.executionengine.mapReduceLayer.PigMapOnly$Map - Aliases being processed per job phase (AliasName[line,offset]): M: records[3,10],records[-1,-1],filtered_records[4,19] C:  R: 
TupleSize=1
TupleSize=1
TupleSize=1
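
The TupleSize=1 lines suggest that the UDF receives a one-field tuple containing only the argument passed in the script (age), so field 0 is the age. As a rough sketch of how the filter behaves once the guard is corrected, the snippet below replays the UDF logic over the sample rows. It is not the Pig UDF itself: `java.util.List<Object>` stands in for Pig's `Tuple`, and the class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class AgeFilterSketch {
    // Mirrors the UDF body with the corrected guard; List<Object> stands in for Pig's Tuple.
    static boolean isApplicable(List<Object> tuple) {
        if (tuple == null || tuple.isEmpty()) {
            return false;
        }
        Object field = tuple.get(0);
        if (field == null) {
            return false;
        }
        int age = (Integer) field; // same cast as in the question
        return age > 28;           // strictly greater than 28
    }

    // Applies the filter to the sample rows from the question and returns the matching first names.
    static List<String> passingNames() {
        String[] names = {"Peter", "James", "Matt", "Kenny"};
        int[] ages = {27, 26, 30, 28};
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < names.length; i++) {
            // Build the one-field tuple that FILTER ... BY IsApplicable(age) would pass in.
            if (isApplicable(Arrays.<Object>asList(ages[i]))) {
                kept.add(names[i]);
            }
        }
        return kept;
    }

    public static void main(String[] args) {
        System.out.println(passingNames()); // prints [Matt]
    }
}
```

With the sample data only the row with age 30 passes, since 28 is not strictly greater than 28.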
