为什么PostgreSQL使用顺序扫描而不是索引扫描?

e0uiprwp  于 2023-05-28  发布在  PostgreSQL
关注(0)|答案(2)|浏览(190)

我正在学习PostgreSQL,我试图了解如何选择如何扫描这个数据库的细节。
我有postgtresql 14.2并运行以下代码:

DROP TABLE IF EXISTS t_test;
CREATE TABLE t_test (id serial, name text);
INSERT INTO t_test (name) SELECT 'hans' FROM generate_series(1, 2000000);
INSERT INTO t_test (name) SELECT 'paul' FROM generate_series(1, 2000000);
CREATE INDEX idx_id ON t_test (id);
EXPLAIN ANALYZE SELECT * from t_test where id > 2756021 LIMIT 2;

我看到:

QUERY PLAN                                                      
---------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.11 rows=2 width=36) (actual time=152.718..152.728 rows=2 loops=1)
   ->  Seq Scan on t_test  (cost=0.00..71622.00 rows=1333333 width=36) (actual time=152.713..152.716 rows=2 loops=1)
         Filter: (id > 2756021)
         Rows Removed by Filter: 2756021
 Planning Time: 0.218 ms
 Execution Time: 152.756 ms
(6 rows)

根据该查询计划,在seq扫描的情况下,PG需要读取2 756 021行,直到PG到达第一个期望的行。据我所知,在索引扫描的情况下,PG只需要从索引(ID = 2756021)中找到1行并接收下一个索引值(它应该是轻操作,因为索引是排序列表)并加载与这些索引相关的行两次。但PG Planner选择了Seq Scan。
我关闭了seq扫描,得到的执行时间是0.116毫秒,而不是第一种情况下的152.756毫秒:

DROP TABLE IF EXISTS t_test;
CREATE TABLE t_test (id serial, name text);
INSERT INTO t_test (name) SELECT 'hans' FROM generate_series(1, 2000000);
INSERT INTO t_test (name) SELECT 'paul' FROM generate_series(1, 2000000);
CREATE INDEX idx_id ON t_test (id);
SET enable_seqscan = FALSE;
EXPLAIN ANALYZE SELECT * from t_test where id > 2756021 LIMIT 2;
                                                           QUERY PLAN                                                            
---------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.43..0.62 rows=2 width=36) (actual time=0.060..0.072 rows=2 loops=1)
   ->  Index Scan using idx_id on t_test  (cost=0.43..124449.76 rows=1333333 width=36) (actual time=0.055..0.059 rows=2 loops=1)
         Index Cond: (id > 2756021)
 Planning Time: 0.342 ms
 Execution Time: 0.116 ms
(5 rows)

但说实话,以下使用seq scan的查询比索引扫描有更好的结果(但我不明白为什么):

SET enable_seqscan = TRUE ;
EXPLAIN ANALYZE SELECT * from t_test where id > 2756021 LIMIT 2;

postgres=# EXPLAIN ANALYZE SELECT * from t_test where id > 2756021 LIMIT 2;
                                                   QUERY PLAN                                                   
----------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.12 rows=2 width=9) (actual time=0.042..0.054 rows=2 loops=1)
   ->  Seq Scan on t_test  (cost=0.00..71622.00 rows=1242848 width=9) (actual time=0.038..0.041 rows=2 loops=1)
         Filter: (id > 2756021)
         Rows Removed by Filter: 261
 Planning Time: 0.076 ms
 Execution Time: 0.084 ms
(6 rows)

postgres=# EXPLAIN ANALYZE SELECT * from t_test where id > 1758121 LIMIT 2;
                                                   QUERY PLAN                                                   
----------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..0.06 rows=2 width=9) (actual time=0.018..0.032 rows=2 loops=1)
   ->  Seq Scan on t_test  (cost=0.00..71622.00 rows=2233890 width=9) (actual time=0.014..0.018 rows=2 loops=1)
         Filter: (id > 1758121)
 Planning Time: 0.092 ms
 Execution Time: 0.071 ms
(5 rows)

为什么PG使用Seq Scan而不是Index Scan?

x9ybnkn6

x9ybnkn61#

第一个解释计划的明显问题在第二行

->  Seq Scan on t_test  (cost=0.00..71622.00 rows=1333333 width=36) (actual time=152.713..152.716 rows=2 loops=1)

PostgreSQL优化器估计 predicate id > 2756021将返回130万行(rows=1333333),因此在估计中不考虑LIMIT 2,而选择Seq Scan
如果你用一个 upper limit来帮助PostgreSQL,那么使用索引就可以了。

  • 示例 *
EXPLAIN ANALYZE SELECT * from t_test where id > 2756021 and id <= 2756021+2 LIMIT 2;

Limit  (cost=0.43..8.47 rows=2 width=9) (actual time=0.041..0.043 rows=2 loops=1)
  ->  Index Scan using idx_id on t_test  (cost=0.43..8.47 rows=2 width=9) (actual time=0.040..0.041 rows=2 loops=1)
        Index Cond: ((id > 2756021) AND (id <= 2756023))
Planning Time: 0.130 ms
Execution Time: 0.066 ms
t98cgbkg

t98cgbkg2#

我相信planner不使用索引扫描,因为你从表中请求了大量的记录(id > 2756201意味着“大约1/3的表”)。Planner决定索引扫描(以及之后通过索引访问1/3的行)将比阅读所有表慢。要进行确认,请尝试在where子句中为id使用更大的值:id > 3256201id > 3756201等...

相关问题