AWS EMR Hive: Not yet supported place for UDAF 'COUNT'

woobm2wo posted on 2021-06-27 in Hive

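I have a fairly complex query that I'm trying to convert to run on Hive.
Specifically, I'm running it as a Hive "step" in an AWS EMR cluster.
I've tried to trim the query down for this post, leaving only the essence of the problem.
The full error message is: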

FAILED: SemanticException [Error 10128]: Line XX:XX Not yet supported place for UDAF 'COUNT'

The line number points to the COUNT at the bottom of the SELECT statement:

INSERT INTO db.new_table (
        new_column1,
        new_column2,
        new_column3,
        ... ,
        new_column20
    ) 
    SELECT MD5(COALESCE(TBL1.col1," ")||"_"||COALESCE(new_column5," ")||"_"||...) AS 
        new_col1,
        TBL1.col2,
        TBL1.col3,
        TBL1.col3 AS new_column3,
        TBL1.col4,
        CASE
            WHEN TBL1.col5 = …
            ELSE "some value"
        END AS new_column5,
        TBL1.col6,
        TBL1.col7,
        TBL1.col8,
        CASE
            WHEN TBL1.col9 = …
            ELSE "some value"
        END AS new_column9,
        CASE 
            WHEN TBL1.col10 = …
            ELSE "value"
        END AS new_column10,
        TBL1.col11,
        "value" AS new_column12,
        TBL2.col1,
        TBL2.col2,
        from_unixtime(…) AS new_column13,
        CAST(…) AS new_column14,
        CAST(…) AS new_column15,
        CAST(…) AS new_column16,
        COUNT(DISTINCT TBL1.col17) AS new_column17
    FROM db.table1 TBL1
    LEFT JOIN 
        db.table2 TBL2
            ON TBL1.col311 = TBL2.col311
    WHERE TBL1.col14 BETWEEN "low" AND "high"
        AND TBL1.col44 = "Y"
        AND TBL1.col55 = "N"
    GROUP BY 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19, 20;
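Counting the SELECT list above, position 20 is the COUNT(DISTINCT TBL1.col17) column itself, so GROUP BY ... 20 most likely asks Hive to group by an aggregate, which is what Error 10128 complains about. A minimal sketch of that situation, assuming positional GROUP BY aliases are enabled (the error messages suggest they are here; on Hive 2.2.0 and later the property is hive.groupby.position.alias, default false). The table db.example and its columns a, b, c are hypothetical placeholders:

```sql
-- Assumption: positional GROUP BY aliases are enabled (Hive 2.2.0+ property).
SET hive.groupby.position.alias=true;

-- Fails: position 3 is the COUNT(...) aggregate, and an aggregate
-- cannot be used as a grouping key (Error 10128).
SELECT e.a, e.b, COUNT(DISTINCT e.c) AS cnt
FROM db.example e          -- hypothetical table
GROUP BY 1, 2, 3;

-- Works: group only by the non-aggregated positions.
SELECT e.a, e.b, COUNT(DISTINCT e.c) AS cnt
FROM db.example e
GROUP BY 1, 2;
```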

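Please let me know if I've left out too much.
Thanks for your help!
Update
It turns out I did leave out too much information. Apologies to those who already tried to help...
I've updated the query above.
Removing column 20 from the GROUP BY, e.g.: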

GROUP BY 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19;

produces: Expression not in GROUP BY key '' ''
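Every non-aggregated expression in the SELECT list has to be a grouping key. With 20 removed, position 1 (the MD5(...) expression) is most likely the one left neither grouped nor aggregated. A minimal sketch with a hypothetical table db.example:

```sql
-- Fails: e.b is selected but is neither grouped nor aggregated
-- ("Expression not in GROUP BY key").
SELECT e.a, e.b, COUNT(DISTINCT e.c) AS cnt
FROM db.example e          -- hypothetical table
GROUP BY e.a;

-- Works: every non-aggregated select expression is a grouping key.
SELECT e.a, e.b, COUNT(DISTINCT e.c) AS cnt
FROM db.example e
GROUP BY e.a, e.b;
```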
Update: removing column 20 from the GROUP BY and adding column 1 instead, e.g.:

GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18, 19;

produces:

Line XX:XX Invalid table alias or column reference 'new_column5':(possible column
 names are: TBL1.col1, TBL1.col2, (looks like all columns of TBL1), 
TBL2.col1, TBL2.col2, TBL2.col311)

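The line number refers to the line with the SELECT statement. Only those three columns from TBL2 are listed in the error output.
The error seems to point to COALESCE(new_column5). Note that I have a CASE statement in the TBL1 select, which I alias AS new_column5.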

Answer #1, by iq0todco

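You are referencing the calculated column name new_column5 at the same subquery level where it is calculated. This is not possible in Hive. Either replace the reference with the calculation itself, or calculate the column in a subquery and reference it from the upper level.
That is, write this: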
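A minimal sketch of the upper-level subquery option, assuming a trimmed-down version of the query: the CASE is aliased in the inner query, so the outer query can refer to s.new_column5 by name. The WHEN condition and the 'alpha' value are hypothetical placeholders (the question elides them), and CONCAT is used in place of ||:

```sql
SELECT MD5(CONCAT(COALESCE(s.col1, ' '), '_', COALESCE(s.new_column5, ' '))) AS new_col1,
       s.new_column5,
       COUNT(DISTINCT s.col17) AS new_column17
FROM (
    -- Inner level: the alias new_column5 is defined here...
    SELECT t.col1,
           t.col17,
           CASE
               WHEN t.col5 = 'A' THEN 'alpha'   -- hypothetical condition
               ELSE 'some value'
           END AS new_column5
    FROM db.table1 t
) s
-- ...so the outer level can use it, both in SELECT and in GROUP BY.
GROUP BY MD5(CONCAT(COALESCE(s.col1, ' '), '_', COALESCE(s.new_column5, ' '))),
         s.new_column5;
```

Repeating the MD5 expression in the GROUP BY avoids relying on positional aliases; the COUNT stays out of the grouping keys.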

MD5(COALESCE(TBL1.col1," ")||"_"||COALESCE(CASE WHEN TBL1.col5 = … ELSE "some value" END," ")||"_"||...) AS new_col1,

instead of this:

MD5(COALESCE(TBL1.col1," ")||"_"||COALESCE(new_column5," ")||"_"||...) AS 
        new_col1,
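Putting the fixes together on a hypothetical, trimmed-down version of the query: the CASE is written out inline instead of referenced by alias, the GROUP BY lists every non-aggregated position, and the COUNT position is left out. This assumes positional GROUP BY aliases are enabled (hive.groupby.position.alias on Hive 2.2.0 and later); the WHEN condition and 'alpha' value are placeholders:

```sql
-- Assumption: positional GROUP BY aliases are enabled.
SET hive.groupby.position.alias=true;

SELECT MD5(CONCAT(COALESCE(t.col1, ' '), '_',
                  COALESCE(CASE WHEN t.col5 = 'A' THEN 'alpha'   -- hypothetical condition
                                ELSE 'some value' END, ' '))) AS new_col1,
       CASE WHEN t.col5 = 'A' THEN 'alpha'
            ELSE 'some value' END AS new_column5,
       COUNT(DISTINCT t.col17) AS new_column17
FROM db.table1 t
-- Positions 1 and 2 are non-aggregated; position 3 (the COUNT) is not a key.
GROUP BY 1, 2;
```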
