目前,我们有以下数据库设计。我们需要支持多个过滤器,但由于规模太大,查询运行时间太长,因此无法支持新的过滤选项。
Table insights
string status
string id (pkey)
string title
string description
string owner
...
Table observation
string id
string source
string resource
string tags
string title
string description
...
Table insight_observation
string insight_id (fkey to id in insight table)
string observation_id (fkey to id in event table)
洞察表与观察表具有m:n关系,并且所有连接都被捕获在洞察_观察表中。
样品过滤器:
- 返回见解状态为
ACTIVE
的所有见解 - 返回观察源位于['AWS','GCP']的所有洞察
- 返回insight_status = 'ACTIVE'且insight_title包含'i-123'且insight_owner =
xyz@testme.com
且observation_source位于['AWS','AZURE']且observation_title包含test alert
的所有洞察。
该查询包括两个内部连接和许多where条件。由于事件表具有超过600万行,查询需要花费大量时间,并且API变得无响应。
任何其他方法或建议都将受到高度赞赏。谢谢!
以下是查询执行计划:
查询执行计划(& E):
db=# explain (ANALYZE, BUFFERS, SETTINGS) select distinct insightent0_.id as id1_3_, insightent0_.end_time as end_time4_3_, insightent0_.last_update_time as last_upd9_3_, insightent0_.org_id as org_id10_3_, insightent0_.severity as severit13_3_, insightent0_.start_time as start_t14_3_ from insight insightent0_
inner join insight_observations observatio1_ on insightent0_.id=observatio1_.insight_id
inner join observation observatio2_ on observatio1_.observation_id=observatio2_.observation_id
where insightent0_.org_id='ff24bwet-a665-9f20-x1m2-d8b083180153' and insightent0_.last_update_time>='2023-03-21T18:04:36.670Z' and insightent0_.last_update_time<='2023-03-01T18:04:36.670Z' and insightent0_.severity=3 and (observatio2_.source in ('AWS' , 'GCP')) order by insightent0_.last_update_time desc;
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Unique (cost=4520.20..4520.25 rows=4 width=204) (actual time=102.001..102.828 rows=0 loops=1)
Buffers: shared hit=1063
-> Sort (cost=4520.20..4520.21 rows=4 width=204) (actual time=101.608..102.435 rows=0 loops=1)
Sort Key: insightent0_.last_update_time DESC, insightent0_.id, insightent0_.end_time, insightent0_.start_time
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=1063
-> Gather (cost=1001.23..4520.16 rows=4 width=204) (actual time=101.591..102.417 rows=0 loops=1)
Workers Planned: 1
Workers Launched: 1
Buffers: shared hit=1063
-> Nested Loop (cost=1.23..3519.76 rows=2 width=204) (actual time=11.017..11.019 rows=0 loops=2)
Buffers: shared hit=1063
-> Nested Loop (cost=0.54..1575.39 rows=268 width=317) (actual time=11.016..11.017 rows=0 loops=2)
Buffers: shared hit=1063
-> Parallel Seq Scan on insight insightent0_ (cost=0.00..1307.96 rows=56 width=204) (actual time=11.015..11.015 rows=0 loops=2)
Filter: ((last_update_time >= '2023-03-21 23:34:36.67+05:30'::timestamp with time zone) AND (last_update_time <= '2023-03-01 23:34:36.67+05:30'::timestamp with time zone) AND ((org_id)::text = 'ee04bfae-a665-4f20-a5b9-d8b043180252'::text) AND (severity = 3))
Rows Removed by Filter: 10411
Buffers: shared hit=1063
-> Index Only Scan using insight_observations_pkey on insight_observations observatio1_ (cost=0.54..4.68 rows=10 width=231) (never executed)
Index Cond: (insight_id = (insightent0_.id)::text)
Heap Fetches: 0
-> Index Scan using observation_pkey on observation observatio2_ (cost=0.68..7.26 rows=1 width=113) (never executed)
Index Cond: ((observation_id)::text = (observatio1_.observation_id)::text)
Filter: ((source)::text = ANY ('{AWS,GCP}'::text[]))
Planning:
Buffers: shared hit=52
Planning Time: 1.370 ms
Execution Time: 102.926 ms
(28 rows)
1条答案
按热度按时间f0brbegy1#
您应该坚持使用规范化设计。问题不在于联接,而在于缺少支持查询的索引。这也会发生在非规范化的数据模型中。
试试这个索引: