sql—为什么与聚合相比存在的速度如此之慢?

8gsdolmq  于 2021-07-24  发布在  Java
关注(0)|答案(1)|浏览(405)

假设我们有一张候选人表现表

CREATE TABLE IF NOT EXISTS candidates AS
WITH RECURSIVE candidates(team, score) AS (
    SELECT RANDOM() % 1000, RANDOM() % 1000000
    UNION
    SELECT RANDOM() % 1000, RANDOM() % 1000000
    FROM candidates
    LIMIT 1000000
)
SELECT team, score
FROM candidates;

我们的目标是输出1000个团队的列表以及该团队中候选人的总分。不过,如果一支球队的总比分不在上半场,那么它将被一个零分所取代。我想出了两种方法:
EXISTS ,花了 Run Time: real 30.653 user 30.635649 sys 0.008798 ```
WITH top_teams_verbose(top_team, total_score) AS (
SELECT team, SUM(score)
FROM candidates
GROUP BY team
ORDER BY 2 DESC
LIMIT 500
)
SELECT team, SUM(score) * EXISTS(SELECT 1 FROM top_teams_verbose WHERE team = top_team)
FROM candidates
GROUP BY team;

查询计划

QUERY PLAN
|--SCAN TABLE candidates
|--USE TEMP B-TREE FOR GROUP BY
--CORRELATED SCALAR SUBQUERY 2 |--CO-ROUTINE 1 | |--SCAN TABLE candidates | |--USE TEMP B-TREE FOR GROUP BY |--USE TEMP B-TREE FOR ORDER BY
`--SCAN SUBQUERY 1

与 `IN` ,花了 `Run Time: real 0.045 user 0.041872 sys 0.002999` ```
WITH top_teams_verbose(top_team, total_score) AS (
    SELECT team, SUM(score)
    FROM candidates
    GROUP BY team
    ORDER BY 2 DESC
    LIMIT 500
),
top_teams AS (
    SELECT top_team
    FROM top_teams_verbose
)
SELECT team, SUM(score) * (team IN top_teams)
FROM candidates
GROUP BY team;

查询计划

QUERY PLAN
|--SCAN TABLE candidates
|--USE TEMP B-TREE FOR GROUP BY
`--LIST SUBQUERY 3
   |--CO-ROUTINE 1
   |  |--SCAN TABLE candidates
   |  |--USE TEMP B-TREE FOR GROUP BY
   |  `--USE TEMP B-TREE FOR ORDER BY
   `--SCAN SUBQUERY 1

为什么会这样?也许 吧 EXISTS 为每行执行,而 IN 是否用作聚合函数?我看了一下查询计划,唯一的区别( CORRELATED SCALAR SUBQUERYLIST SUBQUERY )太抽象了,无法提供信息。
我使用的是sqlite3版本 3.31.1 2020-01-27 19:55:54 3bfa9cc97da10598521b342961df8f5f68c7388fa117345eeb516eaa837bb4d6 在rhel 7上。

x6h2sr28

x6h2sr281#

我没有时间对查询计划进行深入分析,但事实证明 EXISTS 使公共表表达式为每行重新运行。根据维基百科,这被称为相关查询:
在sql数据库查询中,相关子查询(也称为同步子查询)是使用外部查询中的值的子查询(嵌套在另一个查询中的查询)。因为对于外部查询处理的每一行,子查询可能只计算一次,所以速度可能会很慢。
具体来说,考虑下面的例子,从总共10个团队中随机选择5个“幸运团队”

DROP TABLE IF EXISTS candidates;

CREATE TABLE candidates AS
WITH RECURSIVE candidates(team, score) AS (
    SELECT ABS(RANDOM()) % 10, 1
    UNION
    SELECT ABS(RANDOM()) % 10, score + 1
    FROM candidates
    LIMIT 100
)
SELECT team, score
FROM candidates;

WITH lucky_teams(lucky_team, total_score) AS (
    SELECT team, SUM(score)
    FROM candidates
    GROUP BY team
    ORDER BY RANDOM()
    LIMIT 5
)
SELECT team, SUM(score) * EXISTS(
    SELECT 1
    FROM lucky_teams
    WHERE team = lucky_team
)
FROM candidates
GROUP BY team
ORDER BY team;

有人可能希望总是有5支球队得分为正,其余球队得分为零,但有时你会得到4到7分的正分数。这是因为每个团队都有50%的独立机会出现在比赛中 lucky_teams 表,以防止他们的分数被归零。

$ sqlite3 < quirky.sql
0|0
1|752
2|285
3|620
4|223
5|0
6|0
7|423
8|1035
9|370

相关问题