Spark SQL does not work with a Hive UDAF

7tofc5zh · posted 2021-05-29 in Hadoop

I am using AWS EMR with Spark 1.6.1 and Hive 1.0.0.
I have this UDAF, and it is included on Spark's classpath: https://github.com/scribd/hive-udaf-maxrow/blob/master/src/com/scribd/hive/udaf/genericudafmaxrow.java
I register it in Spark via sqlContext.sql("CREATE TEMPORARY FUNCTION maxrow AS 'some.cool.package.hive.udf.GenericUDAFMaxRow'").
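For context, my understanding of this UDAF's contract is that maxrow(k, v1, v2, …) returns, per group, a struct of all its arguments taken from the row where the first argument is maximal. A plain-Python sketch of that behavior (the group keys and row values below are made up for illustration):

```python
from collections import defaultdict

def maxrow(rows):
    # The UDAF's contract: pick the row whose FIRST column is maximal;
    # the remaining columns ride along from that same row.
    return max(rows, key=lambda r: r[0])

# Hypothetical (group_key, (C, D, E)) records standing in for table rows.
records = [("g1", (3, "x", 10)), ("g1", (7, "y", 20)), ("g2", (5, "z", 30))]

groups = defaultdict(list)
for key, row in records:
    groups[key].append(row)

# One maxrow result per GROUP BY key.
result = {key: maxrow(rows) for key, rows in groups.items()}
print(result)  # {'g1': (7, 'y', 20), 'g2': (5, 'z', 30)}
```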
However, when I run the following query in Spark:

CREATE VIEW VIEW_1 AS
      SELECT
        a.A,
        a.B,
        maxrow ( a.C,
                 a.D,
                 a.E,
                 a.F,
                 a.G,
                 a.H,
                 a.I
            ) as m
        FROM
            table_1 a
        JOIN
            table_2 b
        ON
                b.Z = a.D
            AND b.Y  = a.C
        JOIN dummy_table
        GROUP BY
            a.A,
            a.B

it gives me this error:

16/05/18 19:49:14 WARN RowResolver: Duplicate column info for a.A was overwritten in RowResolver map: _col0: string by _col0: string
16/05/18 19:49:14 WARN RowResolver: Duplicate column info for a.B was overwritten in RowResolver map: _col1: bigint by _col1: bigint
16/05/18 19:49:14 ERROR Driver: FAILED: SemanticException [Error 10002]: Line 16:32 Invalid column reference 'C'
org.apache.hadoop.hive.ql.parse.SemanticException: Line 16:32 Invalid column reference 'C'
                at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genAllExprNodeDesc(SemanticAnalyzer.java:10643)
                at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genExprNodeDesc(SemanticAnalyzer.java:10591)
                at org.apache.hadoop.hive.ql.parse.SemanticAnalyzer.genSelectPlan(SemanticAnalyzer.java:3656)

But if I remove the GROUP BY clause and the aggregate function, it works. So I suspect Spark SQL does not recognize it as an aggregate function.
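As an aside (not tried on this exact setup): if the Spark version in use supports ordering on struct types, max(struct(a.C, a.D, …)) in plain Spark SQL may stand in for the Hive UDAF entirely, since struct comparison runs field by field, so the maximal struct carries along the rest of the row that held the maximal C. That ordering is the same lexicographic rule Python applies to tuples:

```python
# max(struct(C, D, E)) in SQL compares field by field, left to right,
# exactly like Python's lexicographic ordering of tuples.
rows = [(3, "x", 10), (7, "a", 5), (7, "y", 20)]
best = max(rows)  # first field wins; ties fall through to the next field
print(best)  # (7, 'y', 20)
```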
Any help is appreciated. Thanks.
