We have an important table "client" in a PostgreSQL 13 database, with about 5 million rows. It is linked to another table, "company", which has 2000 rows. One company accounts for about 2 million client rows, three companies for about 200,000 each, and all the others are well below that.
Retrieving clients is very slow (we are talking 10-20 seconds), and I am trying to optimize it.
The company_id column is of course indexed, but PostgreSQL does not use the index and scans the whole table instead.
The base query (13 seconds):
select distinct c1_0.id,c1_0.date_creation,c1_0.date_modification,c1_0.company_id,c1_0.nom,c1_0.prenom,c1_0.uuid
from client c1_0
join company e1_0 on e1_0.id=c1_0.company_id
where e1_0.comp_name in ('MYCOMPANY');
Here is the explain plan:
Unique (cost=49805.26..50015.42 rows=2627 width=775) (actual time=12272.378..13245.548 rows=3246176 loops=1)
-> Sort (cost=49805.26..49811.82 rows=2627 width=775) (actual time=12272.376..12559.093 rows=3246176 loops=1)
Sort Key: c1_0.id,c1_0.date_creation,c1_0.date_modification,c1_0.company_id,c1_0.nom,c1_0.prenom,c1_0.uuid
Sort Method: external merge Disk: 615776kB
-> Nested Loop (cost=265.28..49656.05 rows=2627 width=775) (actual time=166.906..9478.238 rows=3246176 loops=1)
-> Index Scan using company_nom_key on company e1_0 (cost=0.28..8.30 rows=1 width=8) (actual time=1.102..1.105 rows=1 loops=1)
Index Cond: ((numero_licence)::text = 'MYCOMPANY'::text)
-> Bitmap Heap Scan on client c1_0 (cost=265.00..49476.70 rows=17106 width=775) (actual time=165.798..8588.485 rows=3246176 loops=1)
Recheck Cond: (company_id = e1_0.id)
Rows Removed by Index Recheck: 1180926
Heap Blocks: exact=35248 lossy=99166
-> Bitmap Index Scan on uc_client_company_id_carte_fidelite (cost=0.00..260.73 rows=17106 width=0) (actual time=161.176..161.176 rows=3246177 loops=1)
Index Cond: (company_id = e1_0.id)
Planning Time: 3.078 ms
Execution Time: 13354.253 ms
Now, if I run the same query with the company_id directly, it is much faster (150-300 ms):
select distinct c1_0.id,c1_0.date_creation,c1_0.date_modification,c1_0.company_id,c1_0.nom,c1_0.prenom,c1_0.uuid
from client c1_0
join company e1_0 on e1_0.id=c1_0.company_id
where e1_0.id in (12345);
Here is the explain plan:
Unique (cost=1001.17..1289653.17 rows=3248642 width=775) (actual time=188.930..6268.170 rows=3246176 loops=1)
-> Nested Loop (cost=1001.17..1046005.02 rows=3248642 width=775) (actual time=188.928..4811.113 rows=3246176 loops=1)
-> Gather Merge (cost=1000.89..1005388.70 rows=3248642 width=775) (actual time=188.910..4020.417 rows=3246176 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Incremental Sort (cost=0.86..629414.78 rows=1353601 width=775) (actual time=126.966..3203.508 rows=1082059 loops=3)
Sort Key: c1_0.id,c1_0.date_creation,c1_0.date_modification,c1_0.company_id,c1_0.nom,c1_0.prenom,c1_0.uuid
Presorted Key: c1_0.id
Full-sort Groups: 31567 Sort Method: quicksort Average Memory: 33kB Peak Memory: 33kB
Worker 0: Full-sort Groups: 34935 Sort Method: quicksort Average Memory: 33kB Peak Memory: 33kB
Worker 1: Full-sort Groups: 34943 Sort Method: quicksort Average Memory: 33kB Peak Memory: 33kB
-> Parallel Index Scan using pk_client on client c1_0 (cost=0.43..568502.73 rows=1353601 width=775) (actual time=117.699..2689.785 rows=1082059 loops=3)
Filter: (company_id = 12345)
Rows Removed by Filter: 856619
-> Materialize (cost=0.28..8.30 rows=1 width=8) (actual time=0.000..0.000 rows=1 loops=3246176)
-> Index Only Scan using pk_company on company e1_0 (cost=0.28..8.30 rows=1 width=8) (actual time=0.009..0.010 rows=1 loops=1)
Index Cond: (id = 206852)
Heap Fetches: 1
Planning Time: 0.173 ms
JIT:
Functions: 15
Options: Inlining true, Optimization true, Expressions true, Deforming true
Timing: Generation 5.137 ms, Inlining 59.856 ms, Optimization 167.256 ms, Emission 110.837 ms, Total 343.087 ms
Execution Time: 6349.227 ms
It is not always possible for me to run a first query to retrieve the company id (or ids) and then a second select on client using that company_id (the whole thing lives in a Java application using JPA). Removing the distinct helps, of course, but it is mandatory because of joins in some other queries.
I also tried partitioning the client table, and basically the same problem appears: when filtering by comp_name, PostgreSQL scans all the partitions, whereas when using the company id directly it picks the correct partition.
I created a table client that keeps the id (for global uniqueness + the foreign keys) and another table client_partitioned; the two share the same ids.
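For reference, the setup described above could be sketched like this; the column list is taken from the queries in the question, but the partitioning method, partition bounds, and company id are assumptions for illustration only:

```sql
-- Hypothetical sketch of the described setup: a plain "client" table keeps
-- global uniqueness on id, while a partitioned twin shares the same ids.
CREATE TABLE client_partitioned (
    id                bigint NOT NULL,
    company_id        bigint NOT NULL,
    nom               text,
    prenom            text,
    uuid              uuid,
    date_creation     timestamptz,
    date_modification timestamptz,
    PRIMARY KEY (id, company_id)
) PARTITION BY LIST (company_id);

-- One partition per big company (206852 is a placeholder id), plus a
-- default partition for the long tail of small companies.
CREATE TABLE client_partitioned_mycompany PARTITION OF client_partitioned
    FOR VALUES IN (206852);
CREATE TABLE client_partitioned_default PARTITION OF client_partitioned DEFAULT;
```

With this layout, a literal `WHERE company_id = 206852` lets the planner prune to one partition, but a join on comp_name does not, which matches the plans shown below.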
Limit (cost=197182.76..197197.96 rows=20 width=777) (actual time=7771.630..7831.954 rows=20 loops=1)
-> Unique (cost=197182.76..199179.52 rows=2627 width=777) (actual time=7745.582..7805.904 rows=20 loops=1)
-> Nested Loop (cost=197182.76..198975.93 rows=2627 width=777) (actual time=7745.580..7805.887 rows=20 loops=1)
-> Gather Merge (cost=197182.33..197488.29 rows=2627 width=777) (actual time=7745.531..7805.786 rows=20 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Sort (cost=196182.31..196185.04 rows=1095 width=777) (actual time=7253.342..7253.493 rows=217 loops=3)
Sort Key: c1_0.id,c1_0.date_creation,c1_0.date_modification,c1_0.company_id,c1_0.nom,c1_0.prenom,c1_0.uuid
Sort Method: external merge Disk: 178904kB
Worker 0: Sort Method: external merge Disk: 263768kB
Worker 1: Sort Method: external merge Disk: 173232kB
-> Hash Join (cost=8.31..196127.03 rows=1095 width=777) (actual time=718.867..3399.692 rows=1082059 loops=3)
Hash Cond: (c1_0.company_id = e1_0.id)
-> Parallel Append (cost=0.00..189745.25 rows=2423347 width=777) (actual time=13.854..3123.482 rows=1938678 loops=3)
-> Parallel Seq Scan on client_partitioned_mycompany c1_0_3 (cost=0.00..99961.73 rows=1352573 width=776) (actual time=6.855..2401.764 rows=1082059 loops=3)
-> Parallel Seq Scan on client_partitioned_default c1_0_6 (cost=0.00..33698.71 rows=465571 width=774) (actual time=20.721..783.996 rows=1117370 loops=1)
-> Parallel Seq Scan on client_partitioned_mycompany2 c1_0_4 (cost=0.00..20913.48 rows=296648 width=776) (actual time=0.147..309.226 rows=355978 loops=2)
-> Parallel Seq Scan on client_partitioned_mycompany3 c1_0_2 (cost=0.00..18261.16 rows=248316 width=774) (actual time=0.766..398.383 rows=595958 loops=1)
-> Parallel Seq Scan on client_partitioned_mycompany4 c1_0_5 (cost=0.00..3233.04 rows=59804 width=821) (actual time=0.346..71.407 rows=101666 loops=1)
-> Parallel Seq Scan on client_partitioned_mycompany5 c1_0_1 (cost=0.00..1560.40 rows=25240 width=840) (actual time=0.320..28.769 rows=42908 loops=1)
-> Hash (cost=8.30..8.30 rows=1 width=8) (actual time=0.587..0.588 rows=1 loops=3)
Buckets: 1024 Batches: 1 Memory Usage: 9kB
-> Index Scan using company_comp_name_key on company e1_0 (cost=0.28..8.30 rows=1 width=8) (actual time=0.516..0.517 rows=1 loops=3)
Index Cond: ((numero_licence)::text = 'MYCOMPANY'::text)
-> Index Only Scan using pk_client on client c1_1 (cost=0.43..0.57 rows=1 width=8) (actual time=0.003..0.003 rows=1 loops=20)
Index Cond: (id = c1_0.id)
Heap Fetches: 20
Planning Time: 0.892 ms
JIT:
Functions: 69
Options: Inlining false, Optimization false, Expressions true, Deforming true
Timing: Generation 10.378 ms, Inlining 0.000 ms, Optimization 5.020 ms, Emission 59.649 ms, Total 75.048 ms
Execution Time: 7851.928 ms
vs.
Limit (cost=7122.35..7246.57 rows=20 width=776) (actual time=29.735..37.386 rows=20 loops=1)
-> Unique (cost=7122.35..4039695.63 rows=649236 width=776) (actual time=29.734..37.384 rows=20 loops=1)
-> Nested Loop (cost=7122.35..3796232.43 rows=3246176 width=776) (actual time=29.732..37.368 rows=20 loops=1)
-> Nested Loop (cost=7122.07..3755646.93 rows=3246176 width=776) (actual time=29.713..37.341 rows=20 loops=1)
-> Gather Merge (cost=7121.64..1618527.96 rows=3246176 width=776) (actual time=29.677..37.248 rows=20 loops=1)
Workers Planned: 2
Workers Launched: 2
-> Incremental Sort (cost=6121.61..1242838.68 rows=1352573 width=776) (actual time=0.232..2.783 rows=72 loops=3)
Sort Key: c1_0.id,c1_0.date_creation,c1_0.date_modification,c1_0.company_id,c1_0.nom,c1_0.prenom,c1_0.uuid
Presorted Key: c1_0.nom
Full-sort Groups: 1 Sort Method: quicksort Average Memory: 38kB Peak Memory: 38kB
Worker 0: Full-sort Groups: 4 Sort Method: quicksort Average Memory: 34kB Peak Memory: 34kB
Worker 1: Full-sort Groups: 4 Sort Method: quicksort Average Memory: 33kB Peak Memory: 33kB
-> Parallel Index Scan using client_partitioned_mycompany_nom_idx1 on client_partitioned_mycompany c1_0 (cost=0.43..395532.62 rows=1352573 width=776) (actual time=0.019..2.580 rows=104 loops=3)
Filter: (company_id = 206852)
-> Index Only Scan using pk_client on client c1_1 (cost=0.43..0.66 rows=1 width=8) (actual time=0.004..0.004 rows=1 loops=20)
Index Cond: (id = c1_0.id)
Heap Fetches: 20
-> Materialize (cost=0.28..8.30 rows=1 width=8) (actual time=0.001..0.001 rows=1 loops=20)
-> Index Only Scan using pk_company on company e1_0 (cost=0.28..8.30 rows=1 width=8) (actual time=0.015..0.016 rows=1 loops=1)
Index Cond: (id = 206852)
Heap Fetches: 1
Planning Time: 0.503 ms
Execution Time: 37.507 ms
Sometimes we also add filters (on name, lastname, ...; I removed some columns here), but that does not change the results much.
Thanks for your help.
1 Answer
There are two simple things you can do:

- get rid of the DISTINCT, which will save you the roughly three seconds spent sorting
- increase work_mem until there are no more lossy blocks in the bitmap heap scan

Another thing you could consider, if you want to keep the query as it is, is to cluster the big table:
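The statement itself was lost in this copy of the answer; presumably it was something along these lines, using the work_mem setting and the company_id index name that appear in the first plan (the 256MB value is a guess to tune against the plan output, not a recommendation from the original answer):

```sql
-- Raise work_mem for the session until the bitmap heap scan stops
-- reporting "lossy" heap blocks in EXPLAIN (ANALYZE, BUFFERS).
SET work_mem = '256MB';

-- Rewrite the table in company_id order so each company's rows are
-- physically contiguous. CLUSTER takes an ACCESS EXCLUSIVE lock and is a
-- one-off: the ordering is not maintained on subsequent writes.
CLUSTER client USING uc_client_company_id_carte_fidelite;
ANALYZE client;
```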
But perhaps the best and simplest solution is to split the query into two different queries: first fetch the company's id from company, then query client for that company_id. That way, the second query knows exactly what data to expect and can choose the best plan. The problem with your current query is that at planning time PostgreSQL does not know whether or not it will be the big company_id.
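The suggested split could look like the following sketch, reusing the queries from the question; in the JPA application these would simply be two separate statements, with 12345 standing in for the id returned by the first one:

```sql
-- Step 1: resolve the company name to an id (a tiny indexed lookup).
SELECT id FROM company WHERE comp_name = 'MYCOMPANY';

-- Step 2: query client with the literal id. The planner now sees the real
-- selectivity of this particular company_id instead of the generic
-- per-company average, so it can pick a plan suited to the big company.
SELECT DISTINCT c1_0.id, c1_0.date_creation, c1_0.date_modification,
       c1_0.company_id, c1_0.nom, c1_0.prenom, c1_0.uuid
FROM client c1_0
WHERE c1_0.company_id = 12345;
```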