PostgreSQL: Why is this index-only scan slower than the index scan?

Posted by 7cwmlq89 on 2023-04-20 in PostgreSQL

We have a table foo_tbl (names obfuscated; data types and DDL are the same):

CREATE TABLE public.foo_tbl (
    id int8 NOT NULL,
    foo_id varchar(11) NOT NULL,
    foo_date timestamptz NULL,
    -- ... other unrelated columns ...

    CONSTRAINT pk_footbl PRIMARY KEY (id)
);
CREATE INDEX idx_1_2cols ON public.foo_tbl USING btree (foo_date, foo_id); -- initial index
CREATE INDEX idx_2_1col ON public.foo_tbl USING btree (foo_id); -- added later, when the query is slow

We have a large query that joins 7 tables to this table on foo_id and retrieves foo_date. Example (the real query is much larger):

select b.bar_code, f.foo_date from bar_tbl b join foo_tbl f on b.bar_id = f.foo_id limit 100;

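Without the join to foo_tbl, the query is fast (< 2 s).
After adding the join to foo_tbl, the query is much slower (> 15 s), even though it runs an "Index Only Scan" on foo_tbl using the index idx_1_2cols (the query uses only these 2 columns of this table). This is the EXPLAIN ANALYZE result for the table: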

{
  "Node Type": "Index Only Scan",
  "Parent Relationship": "Inner",
  "Parallel Aware": false,
  "Scan Direction": "Forward",
  "Index Name": "idx_1_2cols",
  "Relation Name": "foo_tbl",
  "Schema": "public",
  "Alias": "f",
  "Startup Cost": 0.42,
  "Total Cost": 2886.11,
  "Plan Rows": 1,
  "Plan Width": 20,
  "Actual Startup Time": 12.843,
  "Actual Total Time": 13.068,
  "Actual Rows": 1,
  "Actual Loops": 1200,
  "Output": ["f.foo_date", "f.foo_id"],
  "Index Cond": "(f.foo_id = (b.bar_id)::text)",
  "Rows Removed by Index Recheck": 0,
  "Heap Fetches": 0,
  "Shared Hit Blocks": 2284772,
  "Shared Read Blocks": 0,
  "Shared Dirtied Blocks": 0,
  "Shared Written Blocks": 0,
  "Local Hit Blocks": 0,
  "Local Read Blocks": 0,
  "Local Dirtied Blocks": 0,
  "Local Written Blocks": 0,
  "Temp Read Blocks": 0,
  "Temp Written Blocks": 0,
  "I/O Read Time": 0.0,
  "I/O Write Time": 0.0
}

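To investigate, we created the single-column index idx_2_1col, and the query became fast again (< 3 s). In the EXPLAIN output, the planner now chooses the new index over the old one and performs an "Index Scan":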

{
  "Node Type": "Index Scan",
  "Parent Relationship": "Inner",
  "Parallel Aware": false,
  "Scan Direction": "Forward",
  "Index Name": "idx_2_1col",
  "Relation Name": "foo_tbl",
  "Schema": "public",
  "Alias": "f",
  "Startup Cost": 0.42,
  "Total Cost": 0.46,
  "Plan Rows": 1,
  "Plan Width": 20,
  "Actual Startup Time": 0.007,
  "Actual Total Time": 0.007,
  "Actual Rows": 1,
  "Actual Loops": 1200,
  "Output": ["f.foo_date", "f.foo_id"],
  "Index Cond": "((f.foo_id)::text = (b.bar_id)::text)",
  "Rows Removed by Index Recheck": 0,
  "Shared Hit Blocks": 4800,
  "Shared Read Blocks": 0,
  "Shared Dirtied Blocks": 0,
  "Shared Written Blocks": 0,
  "Local Hit Blocks": 0,
  "Local Read Blocks": 0,
  "Local Dirtied Blocks": 0,
  "Local Written Blocks": 0,
  "Temp Read Blocks": 0,
  "Temp Written Blocks": 0,
  "I/O Read Time": 0.0,
  "I/O Write Time": 0.0
}

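So why is the index scan faster than the index-only scan in this case? And why is the index-only scan so slow?
Notes:

VACUUM ANALYZE was run before the EXPLAIN ANALYZE queries.
foo_tbl is not the largest table; it has only a few hundred thousand rows, while some of the other tables in the join contain millions of rows.
The DBMS is Amazon Aurora PostgreSQL-compatible 13.5 (not serverless).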

postgresql

Source: https://stackoverflow.com/questions/76008226/postgresql-why-is-this-index-only-scan-slower-than-the-index-scan

2 Answers

#1 gab6jxml

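The leftmost column of a multicolumn index is the one your query should constrain. In your example, foo_date is only returned, and the value check is made only against the second column, foo_id.
The docs are clear about this, and even note that in such a case the entire index has to be scanned, where the planner would more likely scan the whole table instead.
"A multicolumn B-tree index can be used with query conditions that involve any subset of the index's columns, but the index is most efficient when there are constraints on the leading (leftmost) columns. The exact rule is that equality constraints on leading columns, plus any inequality constraints on the first column that does not have an equality constraint, will be used to limit the portion of the index that is scanned. Constraints on columns to the right of these columns are checked in the index, so they save visits to the table proper, but they do not reduce the portion of the index that has to be scanned."
You can try swapping the columns in the index, or create a covering index by including the date in the second index, so that the table itself does not have to be touched.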
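The two suggestions above could look roughly like the following sketch. The index names idx_3_id_date and idx_4_id_incl_date are hypothetical and not part of the original schema; either index alone should be enough, and the INCLUDE clause assumes PostgreSQL 11 or later (the question states 13.5).

```sql
-- Option 1: swap the column order so the join key foo_id is the leading column.
-- (idx_3_id_date is a hypothetical name.)
CREATE INDEX idx_3_id_date ON public.foo_tbl USING btree (foo_id, foo_date);

-- Option 2: covering index keyed on foo_id that carries foo_date as a
-- non-key payload column. (idx_4_id_incl_date is a hypothetical name.)
CREATE INDEX idx_4_id_incl_date ON public.foo_tbl USING btree (foo_id) INCLUDE (foo_date);

-- Refresh statistics and the visibility map so an index-only scan can skip heap fetches.
VACUUM ANALYZE public.foo_tbl;

-- Re-check the plan for the sample query from the question.
EXPLAIN (ANALYZE, BUFFERS, VERBOSE, FORMAT JSON)
SELECT b.bar_code, f.foo_date
FROM bar_tbl b
JOIN foo_tbl f ON b.bar_id = f.foo_id
LIMIT 100;
```

If the planner picks one of the new indexes, the foo_tbl node should still be an "Index Only Scan", but with the foo_id condition bounding the scan, so "Shared Hit Blocks" should drop sharply compared with the idx_1_2cols plan.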

2023-04-20

#2 c3frrgcw

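The DDL for bar_tbl has not been provided, but the query planner evidently chose idx_1_2cols because it contains both of the required columns and the planner estimated that using the index would be more efficient than scanning the base table (almost certainly because reading the table would require more block reads, given the columns the query does not need). The problem is that the join is on foo_id, but the leading column of the index is foo_date. Change the index column order to (foo_id, foo_date) and the query will run faster.
Adding idx_2_1col improved performance because the join between bar_id and foo_id can be performed efficiently using that index, even though the query still has to read data from the base table.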
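The figures in the two plans from the question support this. In EXPLAIN ANALYZE output, "Actual Total Time" is a per-loop average in milliseconds, while the buffer counters are totals over all loops, so a rough per-probe comparison can be worked out as below (the column aliases are made up for illustration):

```sql
-- Inputs are copied from the two plan nodes in the question; both ran 1200 loops.
SELECT
    round(2284772 / 1200.0)          AS ios_blocks_per_loop,  -- 1904 (Index Only Scan on idx_1_2cols)
    4800 / 1200                      AS idx_blocks_per_loop,  -- 4    (Index Scan on idx_2_1col)
    round(13.068 * 1200 / 1000.0, 1) AS ios_total_seconds,    -- 15.7
    0.007 * 1200                     AS idx_total_ms;         -- 8.400
```

Roughly 1,900 buffer hits per probe is consistent with each loop walking the whole of idx_1_2cols, and 1,200 such probes account for the reported 15+ seconds. The single-column index needs about 4 buffer hits per probe, heap visit included.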

2023-04-20
