postgresql Postgres运行较慢的嵌套连接循环，而不是散列连接

x33g5p2x 于 2023-01-30 发布在 PostgreSQL

关注(0)|答案(1)|浏览(215)

我一直在优化一些针对生产数据库克隆的sql查询，下面是一个示例查询，我创建了两个索引，我们可以使用哈希连接快速运行仅索引扫描。

explain analyse
select activity.id from activity, notification
where notification.user_id = '9a51f675-e1e2-46e5-8bcd-6bc535c7e7cb'
and notification.received = false
and notification.invalid = false
and activity.id = notification.activity_id
and activity.space_id = 'e12b42ac-4e54-476f-a4f5-7d6bdb1e61e2'
order by activity.end_time desc
limit 21;

Limit  (cost=985.58..985.58 rows=1 width=24) (actual time=0.017..0.017 rows=0 loops=1)
  ->  Sort  (cost=985.58..985.58 rows=1 width=24) (actual time=0.016..0.016 rows=0 loops=1)
        Sort Key: activity.end_time DESC
        Sort Method: quicksort  Memory: 25kB
        ->  Hash Join  (cost=649.76..985.57 rows=1 width=24) (actual time=0.010..0.010 rows=0 loops=1)
              Hash Cond: (notification.activity_id = activity.id)
              ->  Index Only Scan using unreceived_notifications_index on notification  (cost=0.42..334.62 rows=127 width=16) (actual time=0.009..0.009 rows=0 loops=1)
                    Index Cond: (user_id = '9a51f675-e1e2-46e5-8bcd-6bc535c7e7cb'::uuid)
                    Heap Fetches: 0
              ->  Hash  (cost=634.00..634.00 rows=1227 width=24) (never executed)
                    ->  Index Only Scan using space_activity_index on activity  (cost=0.56..634.00 rows=1227 width=24) (never executed)
                          Index Cond: (space_id = 'e12b42ac-4e54-476f-a4f5-7d6bdb1e61e2'::uuid)
                          Heap Fetches: 0
Planning time: 0.299 ms
Execution time: 0.046 ms

这是索引。

create index unreceived_notifications_index on notification using btree (
    user_id,
    activity_id, -- index-only scan
    id -- index-only scan
) where (
    invalid = false
    and received = false
);

space_activity_index
create index space_activity_index on activity using btree (
    space_id,
    end_time desc,
    id -- index-only scan
);

然而，我注意到这些索引使我们的开发数据库变慢了很多。下面是针对我们的开发数据库中的用户的相同查询，您会注意到它这次使用了嵌套循环连接，循环的顺序确实效率低下。

explain analyse
select notification.id from notification, activity
where notification.user_id = '7c74a801-7cb5-4914-bbbe-2b18cd1ced76'
and notification.received = false
and notification.invalid = false
and activity.id = notification.activity_id
and activity.space_id = '415fc269-e68f-4da0-b3e3-b1273b741a7f'
order by activity.end_time desc
limit 20;

Limit  (cost=0.69..272.04 rows=20 width=24) (actual time=277.255..277.255 rows=0 loops=1)
  ->  Nested Loop  (cost=0.69..71487.55 rows=5269 width=24) (actual time=277.253..277.253 rows=0 loops=1)
        ->  Index Only Scan using space_activity_index on activity  (cost=0.42..15600.36 rows=155594 width=24) (actual time=0.016..59.433 rows=155666 loops=1)
              Index Cond: (space_id = '415fc269-e68f-4da0-b3e3-b1273b741a7f'::uuid)
              Heap Fetches: 38361
        ->  Index Only Scan using unreceived_notifications_index on notification  (cost=0.27..0.35 rows=1 width=32) (actual time=0.001..0.001 rows=0 loops=155666)
              Index Cond: ((user_id = '7c74a801-7cb5-4914-bbbe-2b18cd1ced76'::uuid) AND (activity_id = activity.id))
              Heap Fetches: 0
Planning time: 0.351 ms
Execution time: 277.286 ms

这里需要注意的一点是，在我们的开发数据库中只有两个space_id，我怀疑这是导致Postgres试图变聪明的原因，但实际上这会使性能变差!
我的问题是：
1.有没有什么方法可以强制Postgres运行散列连接而不是嵌套循环连接？
1.总的来说，有没有什么方法可以让Postgres的查询规划器更确定？理想情况下，这些环境之间的查询性能特征应该完全相同。
谢谢。

- 编辑：**注意，当我在查询我的dev数据库时省略space_id条件时，结果会更快。

explain analyse
select notification.id from notification, activity
where notification.user_id = '7c74a801-7cb5-4914-bbbe-2b18cd1ced76'
and notification.received = false
and notification.invalid = false
and activity.id = notification.activity_id
--and activity.space_id = '415fc269-e68f-4da0-b3e3-b1273b741a7f'
order by activity.end_time desc
limit 20;

Limit  (cost=17628.13..17630.43 rows=20 width=24) (actual time=2.730..2.730 rows=0 loops=1)
  ->  Gather Merge  (cost=17628.13..17996.01 rows=3199 width=24) (actual time=2.729..2.729 rows=0 loops=1)
        Workers Planned: 1
        Workers Launched: 1
        ->  Sort  (cost=16628.12..16636.12 rows=3199 width=24) (actual time=0.126..0.126 rows=0 loops=2)
              Sort Key: activity.end_time DESC
              Sort Method: quicksort  Memory: 25kB
              ->  Nested Loop  (cost=20.59..16441.88 rows=3199 width=24) (actual time=0.093..0.093 rows=0 loops=2)
                    ->  Parallel Bitmap Heap Scan on notification  (cost=20.17..2512.17 rows=3199 width=32) (actual time=0.092..0.092 rows=0 loops=2)
                          Recheck Cond: ((user_id = '7c74a801-7cb5-4914-bbbe-2b18cd1ced76'::uuid) AND (NOT invalid) AND (NOT received))
                          ->  Bitmap Index Scan on unreceived_notifications_index  (cost=0.00..18.82 rows=5439 width=0) (actual time=0.006..0.006 rows=0 loops=1)
                                Index Cond: (user_id = '7c74a801-7cb5-4914-bbbe-2b18cd1ced76'::uuid)
                    ->  Index Scan using activity_pkey on activity  (cost=0.42..4.35 rows=1 width=24) (never executed)
                          Index Cond: (id = notification.activity_id)
Planning time: 0.344 ms
Execution time: 3.433 ms

- 编辑：**在reading about index hinting之后，我尝试使用set enable_nestloop=false;关闭nested_loop，查询速度快多了!

Limit  (cost=20617.76..20620.09 rows=20 width=24) (actual time=2.872..2.872 rows=0 loops=1)
  ->  Gather Merge  (cost=20617.76..21130.20 rows=4392 width=24) (actual time=2.871..2.871 rows=0 loops=1)
        Workers Planned: 2
        Workers Launched: 2
        ->  Sort  (cost=19617.74..19623.23 rows=2196 width=24) (actual time=0.086..0.086 rows=0 loops=3)
              Sort Key: activity.end_time DESC
              Sort Method: quicksort  Memory: 25kB
              ->  Hash Join  (cost=2609.20..19495.85 rows=2196 width=24) (actual time=0.062..0.062 rows=0 loops=3)
                    Hash Cond: (activity.id = notification.activity_id)
                    ->  Parallel Seq Scan on activity  (cost=0.00..14514.57 rows=64831 width=24) (actual time=0.006..0.006 rows=1 loops=3)
                          Filter: (space_id = '415fc269-e68f-4da0-b3e3-b1273b741a7f'::uuid)
                    ->  Hash  (cost=2541.19..2541.19 rows=5441 width=32) (actual time=0.007..0.007 rows=0 loops=3)
                          Buckets: 8192  Batches: 1  Memory Usage: 64kB
                          ->  Bitmap Heap Scan on notification  (cost=20.18..2541.19 rows=5441 width=32) (actual time=0.006..0.006 rows=0 loops=3)
                                Recheck Cond: ((user_id = '7c74a801-7cb5-4914-bbbe-2b18cd1ced76'::uuid) AND (NOT invalid) AND (NOT received))
                                ->  Bitmap Index Scan on unreceived_notifications_index  (cost=0.00..18.82 rows=5441 width=0) (actual time=0.004..0.004 rows=0 loops=3)
                                      Index Cond: (user_id = '7c74a801-7cb5-4914-bbbe-2b18cd1ced76'::uuid)
Planning time: 0.375 ms
Execution time: 3.630 ms

postgresql

来源：https://stackoverflow.com/questions/51993758/postgres-running-a-slower-nested-join-loop-instead-of-a-hash-join

1条答案

按热度按时间

nhaq1z211#

这取决于你想要达到的专业化程度。postgresQL中有一些计划指南，你可以用来强制查询使用特定的索引。但是查询优化器在他们所做的选择中会受到记录数的强烈影响。也许你可以在非开发环境中添加额外的索引，然后继续前进。
https://docs.aws.amazon.com/dms/latest/sql-server-to-aurora-postgresql-migration-playbook/chap-sql-server-aurora-pg.tuning.queryplanning.html

赞(0）回复(0）举报 2023-01-30

我来回答

postgresql Postgres运行较慢的嵌套连接循环，而不是散列连接

1条答案

相关问题

热门标签

最新问答