postgresql 按复合键分页

wmvff8tz  于 2023-11-18  发布在  PostgreSQL
关注(0)|答案(1)|浏览(142)

我试图优化我的查询,它运行在PostgreSQL版本13或14。
假设我们有一个带有相应索引的表:

CREATE TABLE journal 
(
    account_id TEXT NOT NULL,
    event_at TEXT NOT NULL,
    id INTEGER PRIMARY KEY
);

CREATE INDEX journal_account_id_event_at_id_idx 
    ON journal (account_id, event_at, id);

INSERT INTO journal VALUES ('A', '2023-01-01', 1);
INSERT INTO journal VALUES ('A', '2023-01-10', 50);
INSERT INTO journal VALUES ('A', '2023-01-30', 15);
INSERT INTO journal VALUES ('A', '2023-03-02', 28);
INSERT INTO journal VALUES ('A', '2023-03-05', 16);
INSERT INTO journal VALUES ('B', '2023-01-01', 101);
INSERT INTO journal VALUES ('B', '2023-01-01', 102);
INSERT INTO journal VALUES ('B', '2023-01-01', 103);
INSERT INTO journal VALUES ('C', '2022-12-01', 2);
INSERT INTO journal VALUES ('C', '2023-01-02', 10);
INSERT INTO journal VALUES ('C', '2023-01-30', 6);
INSERT INTO journal VALUES ('C', '2023-01-30', 20);
INSERT INTO journal VALUES ('C', '2023-02-02', 29);
INSERT INTO journal VALUES ('C', '2023-03-03', 31);

字符串
我需要通过分页从上述表格中进行选择。如果我们知道我们需要选择日记账条目的帐户,那么我们可以这样做:
问题1:

SELECT *
FROM journal
WHERE
    account_id = 'C' -- account condition
    AND (event_at, id) > ('2022-12-01', 2) -- pagination condition
ORDER BY event_at, id
LIMIT 2;  -- page size for pagination

                                                                          QUERY PLAN                                                                           
---------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.14..8.15 rows=1 width=68) (actual time=0.029..0.036 rows=2 loops=1)
   Output: account_id, event_at, id
   ->  Index Only Scan using journal_account_id_event_at_id_idx on public.journal  (cost=0.14..8.15 rows=1 width=68) (actual time=0.026..0.029 rows=2 loops=1)
         Output: account_id, event_at, id
         Index Cond: ((journal.account_id = 'C'::text) AND (ROW(journal.event_at, journal.id) > ROW('2022-12-01'::text, 2)))
         Heap Fetches: 2
 Planning Time: 0.204 ms
 Execution Time: 0.085 ms
(8 rows)


这工作得很好,postgres使用索引,正如你从查询计划中看到的,根据limit,只读取了2个日记条目(请参阅“仅索引扫描”节点的行数=2)。然而,如果我们需要为多个账户选择日记条目,并且我们事先不知道确切的账户,postgres似乎选择了所有条目并对其进行排序,以便适当地限制输出:
问题2:

WITH accounts AS 
(
    -- let's pretend here is some select from 'accounts' table
    SELECT *
    FROM unnest(ARRAY['A', 'C']::TEXT[]) AS account(id)
)
SELECT j.*
FROM journal j JOIN accounts a ON a.id = j.account_id -- account condition
WHERE (event_at, j.id) > ('2022-12-01', 2) -- pagination condition
ORDER BY event_at, j.id
LIMIT 2;  -- page size for pagination
                                                                                 QUERY PLAN                                                                                  
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=12.36..12.37 rows=1 width=68) (actual time=0.120..0.134 rows=2 loops=1)
   Output: j.account_id, j.event_at, j.id
   ->  Sort  (cost=12.36..12.37 rows=1 width=68) (actual time=0.117..0.125 rows=2 loops=1)
         Output: j.account_id, j.event_at, j.id
         Sort Key: j.event_at, j.id
         Sort Method: top-N heapsort  Memory: 25kB
         ->  Nested Loop  (cost=0.14..12.35 rows=1 width=68) (actual time=0.049..0.091 rows=10 loops=1)
               Output: j.account_id, j.event_at, j.id
               ->  Function Scan on pg_catalog.unnest account  (cost=0.00..0.02 rows=2 width=32) (actual time=0.008..0.011 rows=2 loops=1)
                     Output: account.id
                     Function Call: unnest('{A,C}'::text[])
               ->  Index Only Scan using journal_account_id_event_at_id_idx on public.journal j  (cost=0.14..6.15 rows=1 width=68) (actual time=0.020..0.025 rows=5 loops=2)
                     Output: j.account_id, j.event_at, j.id
                     Index Cond: ((j.account_id = account.id) AND (ROW(j.event_at, j.id) > ROW('2022-12-01'::text, 2)))
                     Heap Fetches: 10
 Planning Time: 0.307 ms
 Execution Time: 0.188 ms
(17 rows)


我们现在从查询计划中看到,'Nested loop'节点的行数=10,这意味着postgres读取所有满足分页条件的条目,只是为了稍后在'Limit'节点中省略它们。(account_id,event_at,id),postgres可以从“accounts”CTE中读取每个帐户的limit日志条目,然后合并它们而不进行排序,因为条目已经按(event_at,id)排序,本质上是这样做的:
问题3:

SELECT *
FROM (
  SELECT *
  FROM journal
  WHERE (account_id = 'A')
    AND (event_at, id) > ('2022-12-01', 2) -- pagination condition
  ORDER BY event_at, id
  LIMIT 2
) j1
UNION ALL
SELECT *
FROM (
  SELECT *
  FROM journal
  WHERE (account_id = 'C')
    AND (event_at, id) > ('2022-12-01', 2) -- pagination condition
  ORDER BY event_at, id
  LIMIT 2
) j2
ORDER BY event_at, id
LIMIT 2;
                                                                                     QUERY PLAN                                                                                      
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.28..16.36 rows=2 width=68) (actual time=0.053..0.070 rows=2 loops=1)
   Output: journal.account_id, journal.event_at, journal.id
   ->  Merge Append  (cost=0.28..16.36 rows=2 width=68) (actual time=0.050..0.063 rows=2 loops=1)
         Sort Key: journal.event_at, journal.id
         ->  Limit  (cost=0.14..8.15 rows=1 width=68) (actual time=0.033..0.038 rows=2 loops=1)
               Output: journal.account_id, journal.event_at, journal.id
               ->  Index Only Scan using journal_account_id_event_at_id_idx on public.journal  (cost=0.14..8.15 rows=1 width=68) (actual time=0.031..0.033 rows=2 loops=1)
                     Output: journal.account_id, journal.event_at, journal.id
                     Index Cond: ((journal.account_id = 'A'::text) AND (ROW(journal.event_at, journal.id) > ROW('2022-12-01'::text, 2)))
                     Heap Fetches: 2
         ->  Limit  (cost=0.14..8.15 rows=1 width=68) (actual time=0.014..0.016 rows=1 loops=1)
               Output: journal_1.account_id, journal_1.event_at, journal_1.id
               ->  Index Only Scan using journal_account_id_event_at_id_idx on public.journal journal_1  (cost=0.14..8.15 rows=1 width=68) (actual time=0.012..0.013 rows=1 loops=1)
                     Output: journal_1.account_id, journal_1.event_at, journal_1.id
                     Index Cond: ((journal_1.account_id = 'C'::text) AND (ROW(journal_1.event_at, journal_1.id) > ROW('2022-12-01'::text, 2)))
                     Heap Fetches: 1
 Planning Time: 0.362 ms
 Execution Time: 0.132 ms
(18 rows)


请帮助我理解如何编写查询#2(针对多个帐户),以便它具有查询#3(利用已经排序的数据,读取比查询#2更少的数据)的查询计划。

编辑#1

感谢mentioning "skip scan"Laurenz Albe。我能够找到一个可能的解决方案-使用递归CTE来模拟跳过扫描。我尝试将此方法用于我的数据,并得出了以下结果:
问题4:

EXPLAIN (ANALYZE, VERBOSE)
WITH RECURSIVE accounts AS (
  -- lets pretent here is some select from 'accounts' table
  SELECT *
  FROM unnest(ARRAY['A', 'C']::TEXT[]) AS account(id)
  ORDER BY id
), journal_entries AS (
    (
      SELECT *
      FROM journal
      WHERE
        account_id = (SELECT MIN(id) FROM accounts)
        AND (event_at, id) > ('2022-12-01', 2) -- pagination condition
      ORDER BY event_at, id
      LIMIT 2  -- page size for pagination
    )
    UNION ALL
    SELECT chunk.*
    FROM (
      SELECT account_id FROM journal_entries LIMIT 1
    ) prev CROSS JOIN LATERAL (
      SELECT *
      FROM journal j
      WHERE
        j.account_id = (
          SELECT MIN(id)
          FROM accounts
          WHERE id > prev.account_id
          LIMIT 1
        )
        AND (event_at, id) > ('2022-12-01', 2) -- pagination condition
      ORDER BY event_at, id
      LIMIT 2 -- page size for pagination
    ) chunk
)
SELECT *
FROM journal_entries
ORDER BY event_at, id
LIMIT 2; -- page size for pagination;
                                                                                     QUERY PLAN                                                                                      
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=91.42..91.43 rows=2 width=68) (actual time=0.198..0.230 rows=2 loops=1)
   Output: journal_entries.account_id, journal_entries.event_at, journal_entries.id
   CTE accounts
     ->  Sort  (cost=0.03..0.04 rows=2 width=32) (actual time=0.028..0.032 rows=2 loops=1)
           Output: account.id
           Sort Key: account.id
           Sort Method: quicksort  Memory: 25kB
           ->  Function Scan on pg_catalog.unnest account  (cost=0.00..0.02 rows=2 width=32) (actual time=0.009..0.011 rows=2 loops=1)
                 Output: account.id
                 Function Call: unnest('{A,C}'::text[])
   CTE journal_entries
     ->  Recursive Union  (cost=0.19..91.05 rows=11 width=68) (actual time=0.082..0.179 rows=4 loops=1)
           ->  Limit  (cost=0.19..8.21 rows=1 width=68) (actual time=0.080..0.090 rows=2 loops=1)
                 Output: journal.account_id, journal.event_at, journal.id
                 InitPlan 2 (returns $2)
                   ->  Aggregate  (cost=0.04..0.06 rows=1 width=32) (actual time=0.044..0.047 rows=1 loops=1)
                         Output: min(accounts.id)
                         ->  CTE Scan on accounts  (cost=0.00..0.04 rows=2 width=32) (actual time=0.030..0.036 rows=2 loops=1)
                               Output: accounts.id
                 ->  Index Only Scan using journal_account_id_event_at_id_idx on public.journal  (cost=0.14..8.15 rows=1 width=68) (actual time=0.078..0.080 rows=2 loops=1)
                       Output: journal.account_id, journal.event_at, journal.id
                       Index Cond: ((journal.account_id = $2) AND (ROW(journal.event_at, journal.id) > ROW('2022-12-01'::text, 2)))
                       Heap Fetches: 2
           ->  Nested Loop  (cost=0.19..8.26 rows=1 width=68) (actual time=0.027..0.037 rows=1 loops=2)
                 Output: j.account_id, j.event_at, j.id
                 ->  Limit  (cost=0.00..0.02 rows=1 width=32) (actual time=0.004..0.006 rows=1 loops=2)
                       Output: journal_entries_1.account_id
                       ->  WorkTable Scan on journal_entries journal_entries_1  (cost=0.00..0.20 rows=10 width=32) (actual time=0.002..0.002 rows=1 loops=2)
                             Output: journal_entries_1.account_id
                 ->  Limit  (cost=0.19..8.21 rows=1 width=68) (actual time=0.019..0.025 rows=1 loops=2)
                       Output: j.account_id, j.event_at, j.id
                       InitPlan 3 (returns $4)
                         ->  Limit  (cost=0.05..0.06 rows=1 width=32) (actual time=0.008..0.010 rows=1 loops=2)
                               Output: (min(accounts_1.id))
                               ->  Aggregate  (cost=0.05..0.06 rows=1 width=32) (actual time=0.005..0.007 rows=1 loops=2)
                                     Output: min(accounts_1.id)
                                     ->  CTE Scan on accounts accounts_1  (cost=0.00..0.04 rows=1 width=32) (actual time=0.002..0.003 rows=0 loops=2)
                                           Output: accounts_1.id
                                           Filter: (accounts_1.id > $3)
                                           Rows Removed by Filter: 2
                       ->  Index Only Scan using journal_account_id_event_at_id_idx on public.journal j  (cost=0.14..8.15 rows=1 width=68) (actual time=0.007..0.008 rows=1 loops=2)
                             Output: j.account_id, j.event_at, j.id
                             Index Cond: ((j.account_id = $4) AND (ROW(j.event_at, j.id) > ROW('2022-12-01'::text, 2)))
                             Heap Fetches: 2
   ->  Sort  (cost=0.33..0.36 rows=11 width=68) (actual time=0.196..0.199 rows=2 loops=1)
         Output: journal_entries.account_id, journal_entries.event_at, journal_entries.id
         Sort Key: journal_entries.event_at, journal_entries.id
         Sort Method: quicksort  Memory: 25kB
         ->  CTE Scan on journal_entries  (cost=0.00..0.22 rows=11 width=68) (actual time=0.091..0.175 rows=4 loops=1)
               Output: journal_entries.account_id, journal_entries.event_at, journal_entries.id
 Planning Time: 0.506 ms
 Execution Time: 0.422 ms
(52 rows)


正如我们所看到的,postgres仍然对日记条目进行排序,而不是合并它们,但是与查询#2相比,日记条目更少-我们只为每个帐户读取limit条目(请参阅CTE Scan on journal_entries rows=4)。

wz1wpwve

wz1wpwve1#

索引扫描只有在account_id=进行比较时才有效。由于PostgreSQL没有索引“跳过扫描”,因此您必须像以前一样重写查询。
并不是所有可能的都在PostgreSQL优化器中实现。

相关问题