为什么PostgreSQL使用顺序扫描而不是索引扫描？

e0uiprwp 于 2023-05-28 发布在 PostgreSQL

关注(0)|答案(2)|浏览(190)

我正在学习PostgreSQL，我试图了解如何选择如何扫描这个数据库的细节。
我有postgtresql 14.2并运行以下代码：

DROP TABLE IF EXISTS t_test;
CREATE TABLE t_test (id serial, name text);
INSERT INTO t_test (name) SELECT 'hans' FROM generate_series(1, 2000000);
INSERT INTO t_test (name) SELECT 'paul' FROM generate_series(1, 2000000);
CREATE INDEX idx_id ON t_test (id);
EXPLAIN ANALYZE SELECT * from t_test where id > 2756021 LIMIT 2;

我看到：

QUERY PLAN                                                      
---------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.11 rows=2 width=36) (actual time=152.718..152.728 rows=2 loops=1)
   ->  Seq Scan on t_test  (cost=0.00..71622.00 rows=1333333 width=36) (actual time=152.713..152.716 rows=2 loops=1)
         Filter: (id > 2756021)
         Rows Removed by Filter: 2756021
 Planning Time: 0.218 ms
 Execution Time: 152.756 ms
(6 rows)

根据该查询计划，在seq扫描的情况下，PG需要读取2 756 021行，直到PG到达第一个期望的行。据我所知，在索引扫描的情况下，PG只需要从索引（ID = 2756021）中找到1行并接收下一个索引值（它应该是轻操作，因为索引是排序列表）并加载与这些索引相关的行两次。但PG Planner选择了Seq Scan。
我关闭了seq扫描，得到的执行时间是0.116毫秒，而不是第一种情况下的152.756毫秒：

DROP TABLE IF EXISTS t_test;
CREATE TABLE t_test (id serial, name text);
INSERT INTO t_test (name) SELECT 'hans' FROM generate_series(1, 2000000);
INSERT INTO t_test (name) SELECT 'paul' FROM generate_series(1, 2000000);
CREATE INDEX idx_id ON t_test (id);
SET enable_seqscan = FALSE;
EXPLAIN ANALYZE SELECT * from t_test where id > 2756021 LIMIT 2;
                                                           QUERY PLAN                                                            
---------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..0.62 rows=2 width=36) (actual time=0.060..0.072 rows=2 loops=1)
   ->  Index Scan using idx_id on t_test  (cost=0.43..124449.76 rows=1333333 width=36) (actual time=0.055..0.059 rows=2 loops=1)
         Index Cond: (id > 2756021)
 Planning Time: 0.342 ms
 Execution Time: 0.116 ms
(5 rows)

但说实话，以下使用seq scan的查询比索引扫描有更好的结果（但我不明白为什么）：

SET enable_seqscan = TRUE ;
EXPLAIN ANALYZE SELECT * from t_test where id > 2756021 LIMIT 2;

postgres=# EXPLAIN ANALYZE SELECT * from t_test where id > 2756021 LIMIT 2;
                                                   QUERY PLAN                                                   
----------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.12 rows=2 width=9) (actual time=0.042..0.054 rows=2 loops=1)
   ->  Seq Scan on t_test  (cost=0.00..71622.00 rows=1242848 width=9) (actual time=0.038..0.041 rows=2 loops=1)
         Filter: (id > 2756021)
         Rows Removed by Filter: 261
 Planning Time: 0.076 ms
 Execution Time: 0.084 ms
(6 rows)

postgres=# EXPLAIN ANALYZE SELECT * from t_test where id > 1758121 LIMIT 2;
                                                   QUERY PLAN                                                   
----------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.06 rows=2 width=9) (actual time=0.018..0.032 rows=2 loops=1)
   ->  Seq Scan on t_test  (cost=0.00..71622.00 rows=2233890 width=9) (actual time=0.014..0.018 rows=2 loops=1)
         Filter: (id > 1758121)
 Planning Time: 0.092 ms
 Execution Time: 0.071 ms
(5 rows)

为什么PG使用Seq Scan而不是Index Scan？

postgresql

来源：https://stackoverflow.com/questions/76303743/why-does-postgresql-use-sequential-scan-instead-of-index-scan

2条答案

按热度按时间

x9ybnkn61#

第一个解释计划的明显问题在第二行

->  Seq Scan on t_test  (cost=0.00..71622.00 rows=1333333 width=36) (actual time=152.713..152.716 rows=2 loops=1)

PostgreSQL优化器估计 predicate id > 2756021将返回130万行（rows=1333333），因此在估计中不考虑LIMIT 2，而选择Seq Scan。
如果你用一个 upper limit来帮助PostgreSQL，那么使用索引就可以了。

示例 *

EXPLAIN ANALYZE SELECT * from t_test where id > 2756021 and id <= 2756021+2 LIMIT 2;

Limit  (cost=0.43..8.47 rows=2 width=9) (actual time=0.041..0.043 rows=2 loops=1)
  ->  Index Scan using idx_id on t_test  (cost=0.43..8.47 rows=2 width=9) (actual time=0.040..0.041 rows=2 loops=1)
        Index Cond: ((id > 2756021) AND (id <= 2756023))
Planning Time: 0.130 ms
Execution Time: 0.066 ms

赞(0）回复(0）举报 2023-05-28

t98cgbkg2#

我相信planner不使用索引扫描，因为你从表中请求了大量的记录（id > 2756201意味着“大约1/3的表”）。Planner决定索引扫描（以及之后通过索引访问1/3的行）将比阅读所有表慢。要进行确认，请尝试在where子句中为id使用更大的值：id > 3256201、id > 3756201等...

赞(0）回复(0）举报 2023-05-28

我来回答

为什么PostgreSQL使用顺序扫描而不是索引扫描？

2条答案

相关问题

热门标签

最新问答