Unable to call a Java UDF that accepts a tuple as input

2w2cym1i  posted on 2021-06-24 in Pig

I can't figure out how to call a Java UDF that accepts a tuple as its input.

gsmCell = LOAD '$gsmCell' using PigStorage('\t') as
          (branchId,
           cellId: int,
           lac: int,
           lon: double,
           lat: double
          );

gsmCellFiltered = FILTER gsmCell BY     cellId     is not null and
                                        lac        is not null and
                                        lon        is not null and
                                        lat        is not null;

gsmCellFixed = FOREACH gsmCellFiltered GENERATE FLATTEN (pig.parser.GSMCellParser(* ) )  as
                                                (cellId: int,
                                                 lac: int,
                                                 lon: double,
                                                 lat: double,
                                                );

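When I wrap the input of GSMCellParser in (), I get Tuple(Tuple) inside the UDF. Pig wraps all the fields into a tuple and puts that tuple inside one more tuple.
When I try to pass the list of fields using * or $0, I get an exception: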

Caused by: org.apache.pig.impl.logicalLayer.validators.TypeCheckerException: ERROR 1045: 
<line 28, column 57> Could not infer the matching function for pig.parser.GSMCellParser as multiple or none of them fit. Please use an explicit cast.
    at org.apache.pig.newplan.logical.visitor.TypeCheckingExpVisitor.visit(TypeCheckingExpVisitor.java:761)
    at org.apache.pig.newplan.logical.expression.UserFuncExpression.accept(UserFuncExpression.java:88)
    at org.apache.pig.newplan.ReverseDependencyOrderWalker.walk(ReverseDependencyOrderWalker.java:70)
    at org.apache.pig.newplan.PlanVisitor.visit(PlanVisitor.java:52)
    at org.apache.pig.newplan.logical.visitor.TypeCheckingRelVisitor.visitExpressionPlan(TypeCheckingRelVisitor.java:191)
    at org.apache.pig.newplan.logical.visitor.TypeCheckingRelVisitor.visit(TypeCheckingRelVisitor.java:157)
    at org.apache.pig.newplan.logical.relational.LOGenerate.accept(LOGenerate.java:246)

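What am I doing wrong? My goal is to feed my UDF with a tuple. The tuple should contain the list of fields (i.e. the size of the tuple should be 4: cellId, lac, lon, lat).
UPD: I have also tried GROUP ALL: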

--filter non valid records
gsmCellFiltered = FILTER gsmCell BY     cellId     is not null and
                                        lac        is not null and
                                        lon        is not null and
                                        lat        is not null and
                                        azimuth    is not null and
                                        angWidth   is not null;

gsmCellFilteredGrouped = GROUP gsmCellFiltered ALL;

--fix records
gsmCellFixed = FOREACH gsmCellFilteredGrouped GENERATE FLATTEN                  (pig.parser.GSMCellParser($1))  as
                                                        (cellId: int,
                                                         lac: int,
                                                         lon: double,
                                                         lat: double,
                                                         azimuth: double,
                                                         ppw,
                                                         midDist: double,
                                                         maxDist,
                                                         cellType: chararray,
                                                         angWidth: double,
                                                         gen: chararray,
                                                         startAngle: double
                                                        );

Caused by: org.apache.pig.impl.logicalLayer.validators.TypeCheckerException: ERROR 1045: 
<line 27, column 64> Could not infer the matching function for pig.parser.GSMCellParser as multiple or none of them fit. Please use an explicit cast.

The input schema for this UDF is: Tuple. I don't get the idea. A tuple is an ordered set of fields. The LOAD function returns a tuple to me. I want to pass the whole tuple to my UDF.

yjghlzjz

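If you look at the signature of the T EvalFunc<T>.eval(Tuple) method, you can see that every EvalFunc UDF is passed a single Tuple, and that tuple contains all the arguments passed to the UDF.
In your case, calling GSMCellParser(*) means that the first argument in that tuple is the tuple currently being processed (hence the tuple inside a tuple).
Conceptually, if you want the tuple to contain only the fields, invoke the UDF as GSMCellParser(cellid, lac, lat, lon); the Tuple passed to eval will then have the schema (int, int, double, double). This also makes the UDF code simpler: you don't have to dig the fields out of the passed "tuple in a tuple", because you know that field 0 is cellid, field 1 is lac, and so on.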
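
Applied to the script from the question, the call might look roughly like the sketch below. It is only a sketch under assumptions: the source of GSMCellParser is not shown, so I am assuming it reads its arguments positionally and that, if it declares argument schemas via getArgToFuncMapping(), one of them is (int, int, double, double). The field order here follows the LOAD statement in the question (lon before lat); use whichever order the UDF actually expects.

```pig
-- Pass the four fields as separate arguments instead of * or $0.
-- Inside eval(Tuple input), input.get(0) is then cellId, input.get(1) is lac,
-- input.get(2) is lon and input.get(3) is lat (assumed order, see above).
gsmCellFixed = FOREACH gsmCellFiltered
               GENERATE FLATTEN(pig.parser.GSMCellParser(cellId, lac, lon, lat))
               AS (cellId: int,
                   lac: int,
                   lon: double,
                   lat: double);
```

If ERROR 1045 still appears with this call, the argument types probably do not match any schema the UDF declares, and an explicit cast such as (int) cellId on the mismatching argument is what the error message is asking for.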
