只允许作为顶级连接的子查询表达式

lztngnrs  于 2021-05-31  发布在  Hadoop
关注(0)|答案(1)|浏览(532)

我想更改现有查询,但出现以下错误:
不支持的子查询表达式“deleted”:只允许顶级共轭的子查询表达式
现有查询是:

SELECT DISTINCT
    *
FROM
    geoposition_import AS geo
-- do not take into account data for deleted users
WHERE 
    EXISTS (
        SELECT 1 
        FROM geoposition_import_users AS u 
        WHERE u.id = geo.userId 
            AND NOT u.deleted 
    );

在我们改变之后 userIdgeoposition_import 可以为空,因为地理位置现在也可以由计算机创建。所以我把查询改为

SELECT DISTINCT
    *
FROM
    geoposition_import AS geo
-- do not take into account data for deleted users
WHERE 
    geo.userId IS NULL -- data from non users (e.g. machines) is still fine
    OR
    EXISTS (
        SELECT 1 
        FROM geoposition_import_users AS u 
        WHERE u.id = geo.userId 
            AND NOT u.deleted 
    );

得到了上面提到的错误。
我在谷歌上搜索发现了限制条件:https://docs.cloudera.com/hdpdocuments/hdp3/hdp-3.1.5/using-hiveql/content/hive_hive_subquery_limitations.html
所以我的猜测是:手术室才是问题所在。
现在我的问题是:
为什么错误消息指出“delete”是问题所在?
如何重写查询以使其工作?
我想到的唯一解决办法是:将条件分解成不同的视图,然后执行 UNION ALL .
比如:

CREATE VIEW IF NOT EXISTS geoposition_import_from_non_users AS
SELECT DISTINCT
    *
FROM
    geoposition_import AS geo
WHERE 
    geo.userId IS NULL;

CREATE VIEW IF NOT EXISTS geoposition_import_from_users AS
SELECT DISTINCT
    *
FROM
    geoposition_import AS geo
-- do not take into account data for deleted users
WHERE 
    EXISTS (
        SELECT 1 
        FROM geoposition_import_users AS u 
        WHERE u.id = geo.userId 
            AND NOT u.deleted 
    );

-- staged data with possible duplicates removed
CREATE VIEW IF NOT EXISTS geoposition_import_distinct AS
SELECT * FROM geoposition_import_from_non_users
UNION ALL
SELECT * FROM geoposition_import_from_users;

有什么意见吗?

z4bn682m

z4bn682m1#

尝试左联接而不是现有联接:

SELECT DISTINCT
    geo.*
FROM geoposition_import geo
     LEFT JOIN geoposition_import_users u ON u.id=geo.userId AND NOT u.deleted     
WHERE 
    geo.userId IS NULL -- data from non users (e.g. machines) is still fine
    OR u.id IS NOT NULL;

相关问题