I am trying to optimize a query that runs on PostgreSQL 13 or 14.
Suppose we have a table with a corresponding index:
CREATE TABLE journal
(
account_id TEXT NOT NULL,
event_at TEXT NOT NULL,
id INTEGER PRIMARY KEY
);
CREATE INDEX journal_account_id_event_at_id_idx
ON journal (account_id, event_at, id);
INSERT INTO journal VALUES
('A', '2023-01-01', 1),
('A', '2023-01-10', 50),
('A', '2023-01-30', 15),
('A', '2023-03-02', 28),
('A', '2023-03-05', 16),
('B', '2023-01-01', 101),
('B', '2023-01-01', 102),
('B', '2023-01-01', 103),
('C', '2022-12-01', 2),
('C', '2023-01-02', 10),
('C', '2023-01-30', 6),
('C', '2023-01-30', 20),
('C', '2023-02-02', 29),
('C', '2023-03-03', 31);
I need to select from the table above with pagination. If we know the account whose journal entries we need, we can do this:
Query #1:
SELECT *
FROM journal
WHERE
account_id = 'C' -- account condition
AND (event_at, id) > ('2022-12-01', 2) -- pagination condition
ORDER BY event_at, id
LIMIT 2; -- page size for pagination
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.14..8.15 rows=1 width=68) (actual time=0.029..0.036 rows=2 loops=1)
Output: account_id, event_at, id
-> Index Only Scan using journal_account_id_event_at_id_idx on public.journal (cost=0.14..8.15 rows=1 width=68) (actual time=0.026..0.029 rows=2 loops=1)
Output: account_id, event_at, id
Index Cond: ((journal.account_id = 'C'::text) AND (ROW(journal.event_at, journal.id) > ROW('2022-12-01'::text, 2)))
Heap Fetches: 2
Planning Time: 0.204 ms
Execution Time: 0.085 ms
(8 rows)
This works well: Postgres uses the index and, as you can see from the query plan, only 2 journal entries are read thanks to the LIMIT (see rows=2 on the 'Index Only Scan' node). However, if we need to select journal entries for several accounts, and we don't know the exact accounts in advance, Postgres appears to read all matching entries and sort them just to limit the output afterwards:
Query #2:
WITH accounts AS
(
-- let's pretend this is some SELECT from an 'accounts' table
SELECT *
FROM unnest(ARRAY['A', 'C']::TEXT[]) AS account(id)
)
SELECT j.*
FROM journal j JOIN accounts a ON a.id = j.account_id -- account condition
WHERE (event_at, j.id) > ('2022-12-01', 2) -- pagination condition
ORDER BY event_at, j.id
LIMIT 2; -- page size for pagination
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=12.36..12.37 rows=1 width=68) (actual time=0.120..0.134 rows=2 loops=1)
Output: j.account_id, j.event_at, j.id
-> Sort (cost=12.36..12.37 rows=1 width=68) (actual time=0.117..0.125 rows=2 loops=1)
Output: j.account_id, j.event_at, j.id
Sort Key: j.event_at, j.id
Sort Method: top-N heapsort Memory: 25kB
-> Nested Loop (cost=0.14..12.35 rows=1 width=68) (actual time=0.049..0.091 rows=10 loops=1)
Output: j.account_id, j.event_at, j.id
-> Function Scan on pg_catalog.unnest account (cost=0.00..0.02 rows=2 width=32) (actual time=0.008..0.011 rows=2 loops=1)
Output: account.id
Function Call: unnest('{A,C}'::text[])
-> Index Only Scan using journal_account_id_event_at_id_idx on public.journal j (cost=0.14..6.15 rows=1 width=68) (actual time=0.020..0.025 rows=5 loops=2)
Output: j.account_id, j.event_at, j.id
Index Cond: ((j.account_id = account.id) AND (ROW(j.event_at, j.id) > ROW('2022-12-01'::text, 2)))
Heap Fetches: 10
Planning Time: 0.307 ms
Execution Time: 0.188 ms
(17 rows)
We now see rows=10 on the 'Nested Loop' node in the query plan, which means Postgres reads every entry satisfying the pagination condition only to discard most of them later in the 'Limit' node. Given the index on (account_id, event_at, id), Postgres could instead read at most LIMIT journal entries for each account from the 'accounts' CTE and then merge them without sorting, since each account's entries are already ordered by (event_at, id), essentially doing this:
Query #3:
SELECT *
FROM (
SELECT *
FROM journal
WHERE (account_id = 'A')
AND (event_at, id) > ('2022-12-01', 2) -- pagination condition
ORDER BY event_at, id
LIMIT 2
) j1
UNION ALL
SELECT *
FROM (
SELECT *
FROM journal
WHERE (account_id = 'C')
AND (event_at, id) > ('2022-12-01', 2) -- pagination condition
ORDER BY event_at, id
LIMIT 2
) j2
ORDER BY event_at, id
LIMIT 2;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=0.28..16.36 rows=2 width=68) (actual time=0.053..0.070 rows=2 loops=1)
Output: journal.account_id, journal.event_at, journal.id
-> Merge Append (cost=0.28..16.36 rows=2 width=68) (actual time=0.050..0.063 rows=2 loops=1)
Sort Key: journal.event_at, journal.id
-> Limit (cost=0.14..8.15 rows=1 width=68) (actual time=0.033..0.038 rows=2 loops=1)
Output: journal.account_id, journal.event_at, journal.id
-> Index Only Scan using journal_account_id_event_at_id_idx on public.journal (cost=0.14..8.15 rows=1 width=68) (actual time=0.031..0.033 rows=2 loops=1)
Output: journal.account_id, journal.event_at, journal.id
Index Cond: ((journal.account_id = 'A'::text) AND (ROW(journal.event_at, journal.id) > ROW('2022-12-01'::text, 2)))
Heap Fetches: 2
-> Limit (cost=0.14..8.15 rows=1 width=68) (actual time=0.014..0.016 rows=1 loops=1)
Output: journal_1.account_id, journal_1.event_at, journal_1.id
-> Index Only Scan using journal_account_id_event_at_id_idx on public.journal journal_1 (cost=0.14..8.15 rows=1 width=68) (actual time=0.012..0.013 rows=1 loops=1)
Output: journal_1.account_id, journal_1.event_at, journal_1.id
Index Cond: ((journal_1.account_id = 'C'::text) AND (ROW(journal_1.event_at, journal_1.id) > ROW('2022-12-01'::text, 2)))
Heap Fetches: 1
Planning Time: 0.362 ms
Execution Time: 0.132 ms
(18 rows)
Please help me understand how to write query #2 (for multiple accounts) so that it gets a query plan like query #3's (exploiting the already-sorted data and reading less data than query #2 does).
Edit #1
Thanks to Laurenz Albe for mentioning "skip scan". I was able to find a possible solution: emulating a skip scan with a recursive CTE. I tried this approach on my data and came up with the following:
Query #4:
EXPLAIN (ANALYZE, VERBOSE)
WITH RECURSIVE accounts AS (
-- let's pretend this is some SELECT from an 'accounts' table
SELECT *
FROM unnest(ARRAY['A', 'C']::TEXT[]) AS account(id)
ORDER BY id
), journal_entries AS (
(
SELECT *
FROM journal
WHERE
account_id = (SELECT MIN(id) FROM accounts)
AND (event_at, id) > ('2022-12-01', 2) -- pagination condition
ORDER BY event_at, id
LIMIT 2 -- page size for pagination
)
UNION ALL
SELECT chunk.*
FROM (
SELECT account_id FROM journal_entries LIMIT 1
) prev CROSS JOIN LATERAL (
SELECT *
FROM journal j
WHERE
j.account_id = (
SELECT MIN(id)
FROM accounts
WHERE id > prev.account_id
LIMIT 1
)
AND (event_at, id) > ('2022-12-01', 2) -- pagination condition
ORDER BY event_at, id
LIMIT 2 -- page size for pagination
) chunk
)
SELECT *
FROM journal_entries
ORDER BY event_at, id
LIMIT 2; -- page size for pagination
QUERY PLAN
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Limit (cost=91.42..91.43 rows=2 width=68) (actual time=0.198..0.230 rows=2 loops=1)
Output: journal_entries.account_id, journal_entries.event_at, journal_entries.id
CTE accounts
-> Sort (cost=0.03..0.04 rows=2 width=32) (actual time=0.028..0.032 rows=2 loops=1)
Output: account.id
Sort Key: account.id
Sort Method: quicksort Memory: 25kB
-> Function Scan on pg_catalog.unnest account (cost=0.00..0.02 rows=2 width=32) (actual time=0.009..0.011 rows=2 loops=1)
Output: account.id
Function Call: unnest('{A,C}'::text[])
CTE journal_entries
-> Recursive Union (cost=0.19..91.05 rows=11 width=68) (actual time=0.082..0.179 rows=4 loops=1)
-> Limit (cost=0.19..8.21 rows=1 width=68) (actual time=0.080..0.090 rows=2 loops=1)
Output: journal.account_id, journal.event_at, journal.id
InitPlan 2 (returns $2)
-> Aggregate (cost=0.04..0.06 rows=1 width=32) (actual time=0.044..0.047 rows=1 loops=1)
Output: min(accounts.id)
-> CTE Scan on accounts (cost=0.00..0.04 rows=2 width=32) (actual time=0.030..0.036 rows=2 loops=1)
Output: accounts.id
-> Index Only Scan using journal_account_id_event_at_id_idx on public.journal (cost=0.14..8.15 rows=1 width=68) (actual time=0.078..0.080 rows=2 loops=1)
Output: journal.account_id, journal.event_at, journal.id
Index Cond: ((journal.account_id = $2) AND (ROW(journal.event_at, journal.id) > ROW('2022-12-01'::text, 2)))
Heap Fetches: 2
-> Nested Loop (cost=0.19..8.26 rows=1 width=68) (actual time=0.027..0.037 rows=1 loops=2)
Output: j.account_id, j.event_at, j.id
-> Limit (cost=0.00..0.02 rows=1 width=32) (actual time=0.004..0.006 rows=1 loops=2)
Output: journal_entries_1.account_id
-> WorkTable Scan on journal_entries journal_entries_1 (cost=0.00..0.20 rows=10 width=32) (actual time=0.002..0.002 rows=1 loops=2)
Output: journal_entries_1.account_id
-> Limit (cost=0.19..8.21 rows=1 width=68) (actual time=0.019..0.025 rows=1 loops=2)
Output: j.account_id, j.event_at, j.id
InitPlan 3 (returns $4)
-> Limit (cost=0.05..0.06 rows=1 width=32) (actual time=0.008..0.010 rows=1 loops=2)
Output: (min(accounts_1.id))
-> Aggregate (cost=0.05..0.06 rows=1 width=32) (actual time=0.005..0.007 rows=1 loops=2)
Output: min(accounts_1.id)
-> CTE Scan on accounts accounts_1 (cost=0.00..0.04 rows=1 width=32) (actual time=0.002..0.003 rows=0 loops=2)
Output: accounts_1.id
Filter: (accounts_1.id > $3)
Rows Removed by Filter: 2
-> Index Only Scan using journal_account_id_event_at_id_idx on public.journal j (cost=0.14..8.15 rows=1 width=68) (actual time=0.007..0.008 rows=1 loops=2)
Output: j.account_id, j.event_at, j.id
Index Cond: ((j.account_id = $4) AND (ROW(j.event_at, j.id) > ROW('2022-12-01'::text, 2)))
Heap Fetches: 2
-> Sort (cost=0.33..0.36 rows=11 width=68) (actual time=0.196..0.199 rows=2 loops=1)
Output: journal_entries.account_id, journal_entries.event_at, journal_entries.id
Sort Key: journal_entries.event_at, journal_entries.id
Sort Method: quicksort Memory: 25kB
-> CTE Scan on journal_entries (cost=0.00..0.22 rows=11 width=68) (actual time=0.091..0.175 rows=4 loops=1)
Output: journal_entries.account_id, journal_entries.event_at, journal_entries.id
Planning Time: 0.506 ms
Execution Time: 0.422 ms
(52 rows)
As we can see, Postgres still sorts the journal entries instead of merging them, but far fewer entries are read than in query #2: we read only LIMIT entries per account (see rows=4 on the 'CTE Scan on journal_entries' node).
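Since query #3 does get the desired Merge Append plan, one workaround (my assumption, not something suggested in this thread) is to generate its UNION ALL branches in application code whenever the account list is known at query-build time. A minimal sketch in Python; `build_paginated_query` and `quote_literal` are hypothetical helpers, the `$1`/`$2` placeholders are left for the driver to bind, and real code should rely on the driver's quoting rather than this naive `quote_literal`:

```python
def quote_literal(s):
    # Naive SQL string-literal quoting for the sketch only;
    # use your driver's quoting/parameter binding in practice.
    return "'" + s.replace("'", "''") + "'"

def build_paginated_query(accounts, page_size):
    """Build a Query-#3-style UNION ALL statement for the given accounts.

    Each branch reads at most `page_size` rows for one account via the
    (account_id, event_at, id) index; the outer ORDER BY + LIMIT lets the
    planner combine the sorted branches with a Merge Append instead of a
    full Sort. The pagination values stay as $1/$2 placeholders.
    """
    branches = []
    for i, acc in enumerate(accounts):
        lit = quote_literal(acc)
        branches.append(
            f"SELECT * FROM (SELECT * FROM journal"
            f" WHERE account_id = {lit}"
            f" AND (event_at, id) > ($1, $2)"
            f" ORDER BY event_at, id LIMIT {page_size}) j{i}"
        )
    return " UNION ALL ".join(branches) + f" ORDER BY event_at, id LIMIT {page_size}"
```

This trades a static SQL statement for one whose shape depends on the number of accounts, so the planner sees a fresh query per account-list size.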
1 Answer
An index scan can only be efficient if account_id is compared with =. Since PostgreSQL has no index "skip scan", you have to rewrite the query the way you did. Not every conceivable optimization is implemented in the PostgreSQL optimizer.
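As a sketch of such a rewrite that avoids spelling out one UNION ALL branch per account (my addition, assuming the account list can be passed as an array), a LATERAL join keeps the per-account LIMIT while taking the accounts from a row source. Note that the planner will typically still finish with a small top-N Sort rather than a Merge Append, but the sort input is bounded at the page size per account instead of all matching rows:

```sql
SELECT j.*
FROM unnest(ARRAY['A', 'C']::TEXT[]) AS account(id)
CROSS JOIN LATERAL (
    SELECT *
    FROM journal
    WHERE account_id = account.id
      AND (event_at, id) > ('2022-12-01', 2) -- pagination condition
    ORDER BY event_at, id
    LIMIT 2                                  -- page size for pagination
) j
ORDER BY j.event_at, j.id
LIMIT 2;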