I have a database query that takes two minutes to complete once I add the last condition on the channel (channel.code). The main conceptual problem is that I only need 10 orders per page, and although I pre-filter the order table in a subquery, I still have to perform several left joins to reach that final condition, at which point I am joining about 5 million matching orders against tables that each hold millions of records.
I would like to know whether it is possible to make Postgres loop somehow: for example, take 100 orders (pre-filtered by the conditions on the order table), only then join them with the remaining tables and evaluate the final conditions, and if any of those 100 orders qualify, add them to the final result. Can this be done?
All indexes are already in place. Here is an example of the query I want to optimize:
SELECT ord.billing_number,
ord.amount_total_sum,
ord.fare_sum,
ord.service_fee_sum,
ord.service_time_limit,
ord.tax_sum,
ord.currency,
ord.pos,
ord.updated,
ord.created,
ord.user_id,
ord.hidden,
ord.id,
ord.marker,
status.code as status,
status.title as status_title,
partner.code as code,
partner.fee as fee,
partner.fee_refund as fee_refund,
partner.config as config,
partner.project,
COALESCE(contract_parent_partner.code, parent_partner.code) as parent_code,
u.email as user_email
FROM (SELECT * FROM tol.order WHERE billing_number::char <> '4') ord
LEFT JOIN tol.service_cart service_cart ON ord.id = service_cart.order_id
INNER JOIN tol.ticket_avia_v2 ticket
ON service_cart.ticket_uid = ticket.id AND service_cart.service_table_name = 'vw_ticket_avia'
LEFT JOIN tol.user u ON u.id = ord.user_id
LEFT JOIN tol.partner partner ON ord.partner_id = partner.id
LEFT JOIN tol.status status ON ord.status_id = status.id
LEFT JOIN tol.order_channel order_channel ON ord.id = order_channel.order_id
LEFT JOIN tol.partner_contract partner_contract ON ord.partner_contract_id = partner_contract.id
LEFT JOIN tol.partner parent_partner ON parent_partner.id = partner.parent_id
LEFT JOIN tol.channel channel ON order_channel.channel_id = channel.id
LEFT JOIN tol.partner contract_parent_partner
ON contract_parent_partner.id = partner_contract.parent_partner_id
WHERE (EXISTS (SELECT 1
FROM tol.flight
WHERE tol.flight.ticket_uid = ticket.id
AND tol.flight.carrier_code = 'carrier_code'
ORDER BY tol.flight.id))
AND (channel.code = 'channel_code')
ORDER BY ord.id DESC
LIMIT 10;
I tried using subqueries inside the joins, but since I use left joins, that returns different results than the current query.
Here is the EXPLAIN (ANALYZE, BUFFERS, SETTINGS) output:
Limit (cost=3226993.73..3226993.74 rows=1 width=515) (actual time=1051.938..1055.143 rows=0 loops=1)
Buffers: shared hit=65 read=2
-> Sort (cost=3226993.73..3226993.74 rows=1 width=515) (actual time=32.786..35.991 rows=0 loops=1)
" Sort Key: ""order"".id DESC"
Sort Method: quicksort Memory: 25kB
Buffers: shared hit=65 read=2
-> Nested Loop Left Join (cost=2038882.89..3226993.72 rows=1 width=515) (actual time=32.724..35.926 rows=0 loops=1)
Join Filter: (contract_parent_partner.id = partner_contract.parent_partner_id)
Buffers: shared hit=62 read=2
-> Nested Loop Left Join (cost=2038882.89..3225588.32 rows=1 width=492) (actual time=32.723..35.924 rows=0 loops=1)
Join Filter: (parent_partner.id = partner.parent_id)
Buffers: shared hit=62 read=2
-> Gather (cost=2038882.89..3224182.92 rows=1 width=491) (actual time=32.722..35.922 rows=0 loops=1)
Workers Planned: 2
Workers Launched: 2
Buffers: shared hit=62 read=2
-> Nested Loop Semi Join (cost=2037882.89..3223182.82 rows=1 width=491) (actual time=0.723..0.728 rows=0 loops=3)
Buffers: shared hit=62 read=2
-> Hash Join (cost=2037812.20..3215337.24 rows=105 width=523) (actual time=0.723..0.726 rows=0 loops=3)
Hash Cond: (order_channel.channel_id = channel.id)
Buffers: shared hit=62 read=2
-> Hash Left Join (cost=2037803.88..3206741.25 rows=3271051 width=527) (never executed)
" Hash Cond: (""order"".partner_contract_id = partner_contract.id)"
-> Hash Left Join (cost=2037462.25..3197812.02 rows=3271051 width=527) (never executed)
" Hash Cond: (""order"".status_id = status.id)"
-> Parallel Hash Left Join (cost=2037460.83..3187308.80 rows=3271051 width=475) (never executed)
" Hash Cond: (""order"".partner_id = partner.id)"
-> Parallel Hash Left Join (cost=1941872.95..3083133.07 rows=3271051 width=166) (never executed)
" Hash Cond: (""order"".user_id = u.id)"
-> Parallel Hash Join (cost=1646731.87..2615342.48 rows=3271051 width=152) (never executed)
" Hash Cond: (service_cart.order_id = ""order"".id)"
-> Parallel Hash Join (cost=582600.12..1337818.87 rows=5341282 width=44) (never executed)
Hash Cond: (service_cart.order_id = order_channel.order_id)
-> Parallel Hash Join (cost=402211.21..1040928.08 rows=5341282 width=36) (never executed)
Hash Cond: (ticket.id = service_cart.ticket_uid)
-> Parallel Index Only Scan using ticket_avia_v2_id on ticket_avia_v2 ticket (cost=0.56..521205.61 rows=6446412 width=16) (never executed)
Heap Fetches: 0
-> Parallel Hash (cost=284286.30..284286.30 rows=6423068 width=20) (never executed)
-> Parallel Seq Scan on service_cart (cost=0.00..284286.30 rows=6423068 width=20) (never executed)
Filter: (service_table_name = 'vw_ticket_avia'::text)
-> Parallel Hash (cost=100494.85..100494.85 rows=4869685 width=8) (never executed)
-> Parallel Seq Scan on order_channel (cost=0.00..100494.85 rows=4869685 width=8) (never executed)
-> Parallel Hash (cost=883642.55..883642.55 rows=6000656 width=116) (never executed)
" -> Parallel Seq Scan on ""order"" (cost=0.00..883642.55 rows=6000656 width=116) (never executed)"
Filter: ((billing_number)::character(1) <> '4'::bpchar)
-> Parallel Hash (cost=221481.48..221481.48 rows=4012048 width=18) (never executed)
" -> Parallel Seq Scan on ""user"" u (cost=0.00..221481.48 rows=4012048 width=18) (never executed)"
-> Parallel Hash (cost=95389.05..95389.05 rows=15906 width=321) (never executed)
-> Parallel Index Scan using partner_parent_index on partner (cost=0.29..95389.05 rows=15906 width=321) (never executed)
-> Hash (cost=1.19..1.19 rows=19 width=60) (never executed)
-> Seq Scan on status (cost=0.00..1.19 rows=19 width=60) (never executed)
-> Hash (cost=243.50..243.50 rows=7850 width=8) (never executed)
-> Seq Scan on partner_contract (cost=0.00..243.50 rows=7850 width=8) (never executed)
-> Hash (cost=8.30..8.30 rows=1 width=4) (actual time=0.484..0.484 rows=0 loops=3)
Buckets: 1024 Batches: 1 Memory Usage: 8kB
Buffers: shared hit=6 read=2
-> Index Scan using uniq_6cae9c8d77153098 on channel (cost=0.29..8.30 rows=1 width=4) (actual time=0.483..0.483 rows=0 loops=3)
Index Cond: (code = 'channel_code'::text)
Buffers: shared hit=6 read=2
-> Bitmap Heap Scan on flight (cost=70.69..74.71 rows=1 width=16) (never executed)
Recheck Cond: ((ticket_uid = service_cart.ticket_uid) AND (carrier_code = 'carrier_code'::text))
-> BitmapAnd (cost=70.69..70.69 rows=1 width=0) (never executed)
-> Bitmap Index Scan on flight_ticket_idx (cost=0.00..4.35 rows=36 width=0) (never executed)
Index Cond: (ticket_uid = service_cart.ticket_uid)
-> Bitmap Index Scan on idx_flight_carrier_code (cost=0.00..65.28 rows=3295 width=0) (never executed)
Index Cond: (carrier_code = 'carrier_code'::text)
-> Seq Scan on partner parent_partner (cost=0.00..1067.40 rows=27040 width=9) (never executed)
-> Seq Scan on partner contract_parent_partner (cost=0.00..1067.40 rows=27040 width=9) (never executed)
Planning Time: 48.168 ms
JIT:
Functions: 228
" Options: Inlining true, Optimization true, Expressions true, Deforming true"
" Timing: Generation 23.199 ms, Inlining 51.477 ms, Optimization 580.442 ms, Emission 386.265 ms, Total 1041.384 ms"
Execution Time: 1078.301 ms
1 Answer
What you ask for
It is certainly possible to build a function that loops over batches of sorted rows, or over ranges of values, to enforce a certain "query plan" where Postgres has a blind spot. A sophisticated example:
Typically, though, the better solution is to get the server configuration, the data model, and the column statistics right instead. If a particular index-driven query is the fastest approach, Postgres *should* arrive at that plan on its own.
... but that is moot for the given case. The query plan you added shows rows=0. No qualifying rows were found. For a query with ORDER BY and a small LIMIT, that is typically the worst-case scenario, if the plan ends up traversing along the sort order until enough rows are found. Your idea of forcing Postgres to do this in small batches of 100 rows would only make it worse.
Server configuration
Your query plan does not reveal any non-standard settings (those are displayed when EXPLAIN is called with the SETTINGS option). If your input is reliable, that means you are running Postgres with default settings, which are only fit for very basic needs. Start with the server configuration. Here are some starters:
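As a hedged illustration (not from the original answer): how to see which settings deviate from the defaults, and the kind of parameters one typically starts with. The example values are placeholders, not recommendations for this particular machine.

SELECT name, setting, unit, source
FROM   pg_settings
WHERE  source NOT IN ('default', 'override');  -- settings changed from their defaults

EXPLAIN (ANALYZE, BUFFERS, SETTINGS) SELECT 1;  -- SETTINGS reports non-default planner settings

-- Typical starting points; size them for your hardware (placeholder values):
-- ALTER SYSTEM SET shared_buffers = '4GB';          -- needs a restart
-- ALTER SYSTEM SET effective_cache_size = '12GB';
-- ALTER SYSTEM SET work_mem = '64MB';
-- ALTER SYSTEM SET random_page_cost = 1.1;          -- for SSD storage
-- SELECT pg_reload_conf();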
General improvements
You are joining 11 tables (it is unclear whether all of them need to be involved). That exceeds the default setting of join_collapse_limit, so the join order as written becomes relevant. See:
Move the joins that actually filter rows to the front, and the joins that multiply rows to the end. That alone may improve performance substantially:
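One hedged way to act on this, besides rewriting the FROM clause so that the filtering joins (order_channel / channel, and the flight condition) come first: raise the collapse limits for this one query so the planner considers all join orders itself, at the cost of extra planning time. The value 12 is just an example above the 11 tables involved.

BEGIN;
SET LOCAL join_collapse_limit = 12;   -- default is 8
SET LOCAL from_collapse_limit = 12;
-- ... run the SELECT from the question here ...
COMMIT;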
Also, some of your LEFT JOINs are really [INNER] JOINs. See:
Typically that does not matter much; Postgres gets it right anyway. But it can matter when optimizing the join order manually. See:
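For illustration (my reading of the posted query, not part of the original answer): the filter channel.code = 'channel_code' in the WHERE clause discards NULL-extended rows, so these two left joins behave like inner joins and can be written as such. Likewise, the inner join to ticket_avia_v2 already requires a matching service_cart row, so the left join to service_cart is effectively an inner join, too.

JOIN tol.order_channel order_channel ON order_channel.order_id = ord.id
JOIN tol.channel channel ON channel.id = order_channel.channel_id
                        AND channel.code = 'channel_code'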
Assuming order.billing_number is a string type, replace billing_number::char <> '4' with NOT (ord.billing_number ^@ '4'). The starts-with operator is typically much cheaper and allows the use of a plain index with COLLATE "C". See:
If it is a different data type, optimize it differently.
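A sketch of that rewrite, assuming billing_number is text. The index name is made up for illustration; the index only helps positive prefix searches, while the rewritten predicate mainly avoids the per-row cast to char.

-- Rewritten derived table: keep orders whose billing_number does NOT start with '4'
FROM (SELECT * FROM tol."order" WHERE NOT (billing_number ^@ '4')) ord

-- Optional plain index with COLLATE "C" for prefix searches on the column:
CREATE INDEX order_billing_number_c_idx
    ON tol."order" (billing_number COLLATE "C");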
ORDER BY is meaningless inside an EXISTS subquery expression, so I dropped it.
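With the ORDER BY removed, the EXISTS condition from the posted query reads:

WHERE EXISTS (SELECT 1
              FROM  tol.flight
              WHERE tol.flight.ticket_uid = ticket.id
              AND   tol.flight.carrier_code = 'carrier_code')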
(At least) the join to contract_parent_partner looks like it could multiply rows. Is that the case? Do you want multiple result rows per qualifying order then?
If all of that still does not get you acceptable performance, then yes, some kind of recursion or loop may help (a lot) to fill your small LIMIT 10 quickly, for queries that actually find rows. Alternatively, you could tune some indexes and update or improve the statistics to get a different query plan. But that is beyond the scope of one question on SO.
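To make that last point concrete, here is a minimal sketch of such a loop, under assumptions that are not in the question or the answer: the function name and parameter defaults are mine, the qualifying conditions per order are taken to be the channel code plus the carrier code, order.id is the primary key, and an index on tol.order (id) lets the scan walk newest orders first. It stops as soon as the LIMIT is filled, so it touches few orders when matches exist, but, as noted above, it still scans everything when there are no matches at all.

CREATE OR REPLACE FUNCTION tol.latest_matching_orders(_channel_code text,
                                                      _carrier_code text,
                                                      _limit        int DEFAULT 10)
  RETURNS SETOF tol."order"
  LANGUAGE plpgsql STABLE AS
$func$
DECLARE
   _ord   tol."order";
   _found int := 0;
BEGIN
   FOR _ord IN
      SELECT o.*
      FROM   tol."order" o
      WHERE  o.billing_number::char <> '4'   -- cheap pre-filter only
      ORDER  BY o.id DESC                    -- walk newest orders first
   LOOP
      -- evaluate the expensive conditions per candidate order
      IF EXISTS (SELECT FROM tol.order_channel oc
                 JOIN   tol.channel c ON c.id = oc.channel_id
                 WHERE  oc.order_id = _ord.id
                 AND    c.code = _channel_code)
         AND EXISTS (SELECT FROM tol.service_cart sc
                     JOIN   tol.ticket_avia_v2 t ON t.id = sc.ticket_uid
                     JOIN   tol.flight f         ON f.ticket_uid = t.id
                     WHERE  sc.order_id = _ord.id
                     AND    sc.service_table_name = 'vw_ticket_avia'
                     AND    f.carrier_code = _carrier_code)
      THEN
         RETURN NEXT _ord;                   -- keep this order
         _found := _found + 1;
         EXIT WHEN _found >= _limit;         -- LIMIT filled: stop scanning
      END IF;
   END LOOP;
END
$func$;

-- The cheap display-only joins are then applied to the few surviving rows:
SELECT o.*, status.code AS status, partner.code AS partner_code
FROM   tol.latest_matching_orders('channel_code', 'carrier_code') o
LEFT   JOIN tol.status  status  ON status.id  = o.status_id
LEFT   JOIN tol.partner partner ON partner.id = o.partner_id;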