I've been optimizing some SQL queries against a clone of our production database. Here is an example query; with the two indexes I created, it runs quickly using index-only scans and a hash join.
explain analyse
select activity.id from activity, notification
where notification.user_id = '9a51f675-e1e2-46e5-8bcd-6bc535c7e7cb'
and notification.received = false
and notification.invalid = false
and activity.id = notification.activity_id
and activity.space_id = 'e12b42ac-4e54-476f-a4f5-7d6bdb1e61e2'
order by activity.end_time desc
limit 21;
Limit (cost=985.58..985.58 rows=1 width=24) (actual time=0.017..0.017 rows=0 loops=1)
-> Sort (cost=985.58..985.58 rows=1 width=24) (actual time=0.016..0.016 rows=0 loops=1)
Sort Key: activity.end_time DESC
Sort Method: quicksort Memory: 25kB
-> Hash Join (cost=649.76..985.57 rows=1 width=24) (actual time=0.010..0.010 rows=0 loops=1)
Hash Cond: (notification.activity_id = activity.id)
-> Index Only Scan using unreceived_notifications_index on notification (cost=0.42..334.62 rows=127 width=16) (actual time=0.009..0.009 rows=0 loops=1)
Index Cond: (user_id = '9a51f675-e1e2-46e5-8bcd-6bc535c7e7cb'::uuid)
Heap Fetches: 0
-> Hash (cost=634.00..634.00 rows=1227 width=24) (never executed)
-> Index Only Scan using space_activity_index on activity (cost=0.56..634.00 rows=1227 width=24) (never executed)
Index Cond: (space_id = 'e12b42ac-4e54-476f-a4f5-7d6bdb1e61e2'::uuid)
Heap Fetches: 0
Planning time: 0.299 ms
Execution time: 0.046 ms
Here are the indexes:
create index unreceived_notifications_index on notification using btree (
user_id,
activity_id, -- index-only scan
id -- index-only scan
) where (
invalid = false
and received = false
);
create index space_activity_index on activity using btree (
space_id,
end_time desc,
id -- index-only scan
);
However, I've noticed that these indexes make our development database much slower. Here is the same query for a user in our development database. You'll notice that this time it uses a nested loop join, and the ordering of the loops is really inefficient.
explain analyse
select notification.id from notification, activity
where notification.user_id = '7c74a801-7cb5-4914-bbbe-2b18cd1ced76'
and notification.received = false
and notification.invalid = false
and activity.id = notification.activity_id
and activity.space_id = '415fc269-e68f-4da0-b3e3-b1273b741a7f'
order by activity.end_time desc
limit 20;
Limit (cost=0.69..272.04 rows=20 width=24) (actual time=277.255..277.255 rows=0 loops=1)
-> Nested Loop (cost=0.69..71487.55 rows=5269 width=24) (actual time=277.253..277.253 rows=0 loops=1)
-> Index Only Scan using space_activity_index on activity (cost=0.42..15600.36 rows=155594 width=24) (actual time=0.016..59.433 rows=155666 loops=1)
Index Cond: (space_id = '415fc269-e68f-4da0-b3e3-b1273b741a7f'::uuid)
Heap Fetches: 38361
-> Index Only Scan using unreceived_notifications_index on notification (cost=0.27..0.35 rows=1 width=32) (actual time=0.001..0.001 rows=0 loops=155666)
Index Cond: ((user_id = '7c74a801-7cb5-4914-bbbe-2b18cd1ced76'::uuid) AND (activity_id = activity.id))
Heap Fetches: 0
Planning time: 0.351 ms
Execution time: 277.286 ms
One thing to note here is that there are only two space_ids in our development database. I suspect this is what causes Postgres to try to be clever, but in this case it actually makes performance much worse!
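A quick way to see how differently the planner views the two databases is to compare the statistics it keeps for space_id (a sketch against the standard pg_stats view; the table and column names are the ones from the query above):

-- n_distinct and most_common_freqs drive the row estimates
-- that decide between the hash join and the nested loop.
select n_distinct, most_common_vals, most_common_freqs
from pg_stats
where tablename = 'activity'
and attname = 'space_id';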
My questions are:
1. Is there any way to force Postgres to use a hash join instead of a nested loop join?
2. More generally, is there any way to make the Postgres query planner more deterministic? Ideally, the query performance characteristics would be exactly the same between these environments.
Thanks.
**EDIT:** Note that when I omit the space_id condition while querying my dev database, the result comes back much faster.
explain analyse
select notification.id from notification, activity
where notification.user_id = '7c74a801-7cb5-4914-bbbe-2b18cd1ced76'
and notification.received = false
and notification.invalid = false
and activity.id = notification.activity_id
--and activity.space_id = '415fc269-e68f-4da0-b3e3-b1273b741a7f'
order by activity.end_time desc
limit 20;
Limit (cost=17628.13..17630.43 rows=20 width=24) (actual time=2.730..2.730 rows=0 loops=1)
-> Gather Merge (cost=17628.13..17996.01 rows=3199 width=24) (actual time=2.729..2.729 rows=0 loops=1)
Workers Planned: 1
Workers Launched: 1
-> Sort (cost=16628.12..16636.12 rows=3199 width=24) (actual time=0.126..0.126 rows=0 loops=2)
Sort Key: activity.end_time DESC
Sort Method: quicksort Memory: 25kB
-> Nested Loop (cost=20.59..16441.88 rows=3199 width=24) (actual time=0.093..0.093 rows=0 loops=2)
-> Parallel Bitmap Heap Scan on notification (cost=20.17..2512.17 rows=3199 width=32) (actual time=0.092..0.092 rows=0 loops=2)
Recheck Cond: ((user_id = '7c74a801-7cb5-4914-bbbe-2b18cd1ced76'::uuid) AND (NOT invalid) AND (NOT received))
-> Bitmap Index Scan on unreceived_notifications_index (cost=0.00..18.82 rows=5439 width=0) (actual time=0.006..0.006 rows=0 loops=1)
Index Cond: (user_id = '7c74a801-7cb5-4914-bbbe-2b18cd1ced76'::uuid)
-> Index Scan using activity_pkey on activity (cost=0.42..4.35 rows=1 width=24) (never executed)
Index Cond: (id = notification.activity_id)
Planning time: 0.344 ms
Execution time: 3.433 ms
**EDIT:** After reading about index hinting, I tried using
set enable_nestloop = false;
to turn off nested loop joins, and the query is much faster! Here is the resulting plan:
Limit (cost=20617.76..20620.09 rows=20 width=24) (actual time=2.872..2.872 rows=0 loops=1)
-> Gather Merge (cost=20617.76..21130.20 rows=4392 width=24) (actual time=2.871..2.871 rows=0 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=19617.74..19623.23 rows=2196 width=24) (actual time=0.086..0.086 rows=0 loops=3)
Sort Key: activity.end_time DESC
Sort Method: quicksort Memory: 25kB
-> Hash Join (cost=2609.20..19495.85 rows=2196 width=24) (actual time=0.062..0.062 rows=0 loops=3)
Hash Cond: (activity.id = notification.activity_id)
-> Parallel Seq Scan on activity (cost=0.00..14514.57 rows=64831 width=24) (actual time=0.006..0.006 rows=1 loops=3)
Filter: (space_id = '415fc269-e68f-4da0-b3e3-b1273b741a7f'::uuid)
-> Hash (cost=2541.19..2541.19 rows=5441 width=32) (actual time=0.007..0.007 rows=0 loops=3)
Buckets: 8192 Batches: 1 Memory Usage: 64kB
-> Bitmap Heap Scan on notification (cost=20.18..2541.19 rows=5441 width=32) (actual time=0.006..0.006 rows=0 loops=3)
Recheck Cond: ((user_id = '7c74a801-7cb5-4914-bbbe-2b18cd1ced76'::uuid) AND (NOT invalid) AND (NOT received))
-> Bitmap Index Scan on unreceived_notifications_index (cost=0.00..18.82 rows=5441 width=0) (actual time=0.004..0.004 rows=0 loops=3)
Index Cond: (user_id = '7c74a801-7cb5-4914-bbbe-2b18cd1ced76'::uuid)
Planning time: 0.375 ms
Execution time: 3.630 ms
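Note that enable_nestloop does not have to be disabled for the whole session; it can be scoped to a single transaction with SET LOCAL, so other queries keep the default planner behaviour. A minimal sketch of that approach, using the query from above:

begin;
-- the setting only lasts until commit/rollback
set local enable_nestloop = off;
select notification.id from notification, activity
where notification.user_id = '7c74a801-7cb5-4914-bbbe-2b18cd1ced76'
and notification.received = false
and notification.invalid = false
and activity.id = notification.activity_id
and activity.space_id = '415fc269-e68f-4da0-b3e3-b1273b741a7f'
order by activity.end_time desc
limit 20;
commit;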
1 Answer
It depends on how specialized you want to get. There are plan guides for PostgreSQL that you can use to force a query to use specific indexes. But the query optimizer is strongly influenced in its choices by the number of records. Perhaps you can add the extra indexes in the non-development environment and move on.
https://docs.aws.amazon.com/dms/latest/sql-server-to-aurora-postgresql-migration-playbook/chap-sql-server-aurora-pg.tuning.queryplanning.html
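As one concrete example of that kind of hinting, the pg_hint_plan extension (not part of core PostgreSQL, so this is only a sketch assuming the extension is installed and loaded) lets you request a particular join method with a comment at the start of the query:

/*+ HashJoin(notification activity) */ -- ask for a hash join between the two tables
explain analyse
select notification.id from notification, activity
where notification.user_id = '7c74a801-7cb5-4914-bbbe-2b18cd1ced76'
and notification.received = false
and notification.invalid = false
and activity.id = notification.activity_id
and activity.space_id = '415fc269-e68f-4da0-b3e3-b1273b741a7f'
order by activity.end_time desc
limit 20;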