postgresql 是否可以使用postgre递归减少左连接中的行数?

tjjdgumg  于 2023-05-17  发布在  PostgreSQL
关注(0)|答案(1)|浏览(230)

我有一个数据库查询,当我添加最后一个通道条件(channel.code)时需要两分钟才能完成,主要的概念问题是我只需要获得每页10个订单,而我在子查询中过滤订单表,但然后我仍然需要执行一些左连接以达到最终条件,此时我将500万个匹配订单与包含数百万条记录的所有表连接。
我想知道是否有可能以某种方式使一个循环使用Postgres,其中,例如,它会采取,例如,100订单(初步筛选的条件表),然后才连接它们与其余的表,并履行条件在年底,然后如果在选定的100订单是合适的,然后他把它们添加到最终结果。这可以做到吗?
所有索引都已存在。下面是一个我想优化的示例查询:

SELECT ord.billing_number,
       ord.amount_total_sum,
       ord.fare_sum,
       ord.service_fee_sum,
       ord.service_time_limit,
       ord.tax_sum,
       ord.currency,
       ord.pos,
       ord.updated,
       ord.created,
       ord.user_id,
       ord.hidden,
       ord.id,
       ord.marker,
       status.code                                                 as status,
       status.title                                                as status_title,
       partner.code                                                as code,
       partner.fee                                                 as fee,
       partner.fee_refund                                          as fee_refund,
       partner.config                                              as config,
       partner.project,
       COALESCE(contract_parent_partner.code, parent_partner.code) as parent_code,
       u.email                                                     as user_email
FROM (SELECT * FROM tol.order WHERE billing_number::char <> '4') ord
 LEFT JOIN tol.service_cart service_cart ON ord.id = service_cart.order_id
 INNER JOIN tol.ticket_avia_v2 ticket
            ON service_cart.ticket_uid = ticket.id AND service_cart.service_table_name = 'vw_ticket_avia'
 LEFT JOIN tol.user u ON u.id = ord.user_id
 LEFT JOIN tol.partner partner ON ord.partner_id = partner.id
 LEFT JOIN tol.status status ON ord.status_id = status.id
 LEFT JOIN tol.order_channel order_channel ON ord.id = order_channel.order_id
 LEFT JOIN tol.partner_contract partner_contract ON ord.partner_contract_id = partner_contract.id
 LEFT JOIN tol.partner parent_partner ON parent_partner.id = partner.parent_id
 LEFT JOIN tol.channel channel ON order_channel.channel_id = channel.id
 LEFT JOIN tol.partner contract_parent_partner
           ON contract_parent_partner.id = partner_contract.parent_partner_id
WHERE (EXISTS (SELECT 1
               FROM tol.flight
               WHERE tol.flight.ticket_uid = ticket.id
                 AND tol.flight.carrier_code = 'carrier_code'
               ORDER BY tol.flight.id))
  AND (channel.code = 'channel_code')
ORDER BY ord.id DESC
LIMIT 10;

尝试在连接中使用子查询,但由于我使用左连接,因此它给出的结果与当前查询不同。
这里是解释(分析,缓冲区,设置)

Limit  (cost=3226993.73..3226993.74 rows=1 width=515) (actual time=1051.938..1055.143 rows=0 loops=1)
  Buffers: shared hit=65 read=2
  ->  Sort  (cost=3226993.73..3226993.74 rows=1 width=515) (actual time=32.786..35.991 rows=0 loops=1)
"        Sort Key: ""order"".id DESC"
        Sort Method: quicksort  Memory: 25kB
        Buffers: shared hit=65 read=2
        ->  Nested Loop Left Join  (cost=2038882.89..3226993.72 rows=1 width=515) (actual time=32.724..35.926 rows=0 loops=1)
              Join Filter: (contract_parent_partner.id = partner_contract.parent_partner_id)
              Buffers: shared hit=62 read=2
              ->  Nested Loop Left Join  (cost=2038882.89..3225588.32 rows=1 width=492) (actual time=32.723..35.924 rows=0 loops=1)
                    Join Filter: (parent_partner.id = partner.parent_id)
                    Buffers: shared hit=62 read=2
                    ->  Gather  (cost=2038882.89..3224182.92 rows=1 width=491) (actual time=32.722..35.922 rows=0 loops=1)
                          Workers Planned: 2
                          Workers Launched: 2
                          Buffers: shared hit=62 read=2
                          ->  Nested Loop Semi Join  (cost=2037882.89..3223182.82 rows=1 width=491) (actual time=0.723..0.728 rows=0 loops=3)
                                Buffers: shared hit=62 read=2
                                ->  Hash Join  (cost=2037812.20..3215337.24 rows=105 width=523) (actual time=0.723..0.726 rows=0 loops=3)
                                      Hash Cond: (order_channel.channel_id = channel.id)
                                      Buffers: shared hit=62 read=2
                                      ->  Hash Left Join  (cost=2037803.88..3206741.25 rows=3271051 width=527) (never executed)
"                                            Hash Cond: (""order"".partner_contract_id = partner_contract.id)"
                                            ->  Hash Left Join  (cost=2037462.25..3197812.02 rows=3271051 width=527) (never executed)
"                                                  Hash Cond: (""order"".status_id = status.id)"
                                                  ->  Parallel Hash Left Join  (cost=2037460.83..3187308.80 rows=3271051 width=475) (never executed)
"                                                        Hash Cond: (""order"".partner_id = partner.id)"
                                                        ->  Parallel Hash Left Join  (cost=1941872.95..3083133.07 rows=3271051 width=166) (never executed)
"                                                              Hash Cond: (""order"".user_id = u.id)"
                                                              ->  Parallel Hash Join  (cost=1646731.87..2615342.48 rows=3271051 width=152) (never executed)
"                                                                    Hash Cond: (service_cart.order_id = ""order"".id)"
                                                                    ->  Parallel Hash Join  (cost=582600.12..1337818.87 rows=5341282 width=44) (never executed)
                                                                          Hash Cond: (service_cart.order_id = order_channel.order_id)
                                                                          ->  Parallel Hash Join  (cost=402211.21..1040928.08 rows=5341282 width=36) (never executed)
                                                                                Hash Cond: (ticket.id = service_cart.ticket_uid)
                                                                                ->  Parallel Index Only Scan using ticket_avia_v2_id on ticket_avia_v2 ticket  (cost=0.56..521205.61 rows=6446412 width=16) (never executed)
                                                                                      Heap Fetches: 0
                                                                                ->  Parallel Hash  (cost=284286.30..284286.30 rows=6423068 width=20) (never executed)
                                                                                      ->  Parallel Seq Scan on service_cart  (cost=0.00..284286.30 rows=6423068 width=20) (never executed)
                                                                                            Filter: (service_table_name = 'vw_ticket_avia'::text)
                                                                          ->  Parallel Hash  (cost=100494.85..100494.85 rows=4869685 width=8) (never executed)
                                                                                ->  Parallel Seq Scan on order_channel  (cost=0.00..100494.85 rows=4869685 width=8) (never executed)
                                                                    ->  Parallel Hash  (cost=883642.55..883642.55 rows=6000656 width=116) (never executed)
"                                                                          ->  Parallel Seq Scan on ""order""  (cost=0.00..883642.55 rows=6000656 width=116) (never executed)"
                                                                                Filter: ((billing_number)::character(1) <> '4'::bpchar)
                                                              ->  Parallel Hash  (cost=221481.48..221481.48 rows=4012048 width=18) (never executed)
"                                                                    ->  Parallel Seq Scan on ""user"" u  (cost=0.00..221481.48 rows=4012048 width=18) (never executed)"
                                                        ->  Parallel Hash  (cost=95389.05..95389.05 rows=15906 width=321) (never executed)
                                                              ->  Parallel Index Scan using partner_parent_index on partner  (cost=0.29..95389.05 rows=15906 width=321) (never executed)
                                                  ->  Hash  (cost=1.19..1.19 rows=19 width=60) (never executed)
                                                        ->  Seq Scan on status  (cost=0.00..1.19 rows=19 width=60) (never executed)
                                            ->  Hash  (cost=243.50..243.50 rows=7850 width=8) (never executed)
                                                  ->  Seq Scan on partner_contract  (cost=0.00..243.50 rows=7850 width=8) (never executed)
                                      ->  Hash  (cost=8.30..8.30 rows=1 width=4) (actual time=0.484..0.484 rows=0 loops=3)
                                            Buckets: 1024  Batches: 1  Memory Usage: 8kB
                                            Buffers: shared hit=6 read=2
                                            ->  Index Scan using uniq_6cae9c8d77153098 on channel  (cost=0.29..8.30 rows=1 width=4) (actual time=0.483..0.483 rows=0 loops=3)
                                                  Index Cond: (code = 'channel_code'::text)
                                                  Buffers: shared hit=6 read=2
                                ->  Bitmap Heap Scan on flight  (cost=70.69..74.71 rows=1 width=16) (never executed)
                                      Recheck Cond: ((ticket_uid = service_cart.ticket_uid) AND (carrier_code = 'carrier_code'::text))
                                      ->  BitmapAnd  (cost=70.69..70.69 rows=1 width=0) (never executed)
                                            ->  Bitmap Index Scan on flight_ticket_idx  (cost=0.00..4.35 rows=36 width=0) (never executed)
                                                  Index Cond: (ticket_uid = service_cart.ticket_uid)
                                            ->  Bitmap Index Scan on idx_flight_carrier_code  (cost=0.00..65.28 rows=3295 width=0) (never executed)
                                                  Index Cond: (carrier_code = 'carrier_code'::text)
                    ->  Seq Scan on partner parent_partner  (cost=0.00..1067.40 rows=27040 width=9) (never executed)
              ->  Seq Scan on partner contract_parent_partner  (cost=0.00..1067.40 rows=27040 width=9) (never executed)
Planning Time: 48.168 ms
JIT:
  Functions: 228
"  Options: Inlining true, Optimization true, Expressions true, Deforming true"
"  Timing: Generation 23.199 ms, Inlining 51.477 ms, Optimization 580.442 ms, Emission 386.265 ms, Total 1041.384 ms"
Execution Time: 1078.301 ms
ryevplcw

ryevplcw1#

What you ask for

当然是有可能的构建一个函数来循环排序行的补丁,或值的范围,以强制执行某个“查询计划”,其中Postgres有一个盲点。一个复杂的例子:

通常情况下,更好的解决方案是直接了解服务器配置、数据模型和列统计信息。如果一个特定的索引驱动查询是最快的方法,那么Postgres * 应该 * 自行到达该查询计划。
……但这对于给定的案例来说毫无意义。您添加的查询计划显示rows=0。未找到符合条件的行。对于ORDER BY和小LIMIT的查询,这通常是最坏的情况-如果查询计划最终沿着排序顺序遍历,直到找到足够的行。你强迫Postgres在100行的小补丁中这样做的想法只会让情况变得更糟。

服务器配置

您的查询计划不会显示任何非标准设置(使用SETTINGS选项调用EXPLAIN时会显示非标准设置)。如果你的输入是可靠的,这意味着你正在使用默认设置运行Postgres,这只适用于非常基本的需求。从服务器配置开始。以下是一些入门:

常规改进

您要连接11个表(不清楚是否需要涉及所有这些表)。这超出了join_collapse_limit的默认设置。因此,所写的连接顺序变得相关。参见:

  • 为什么搜索词的微小变化会使查询速度变慢这么多?

将实际筛选行的联接移到前面,将使行相乘的联接移到末尾。这可能已经大大提高了性能:

SELECT ord.billing_number
  --  , ... (a lot of fields)
FROM   tol.order              ord
JOIN   tol.order_channel               ON order_channel.order_id = ord.id
JOIN   tol.channel                     ON channel.id = order_channel.channel_id  -- move to the top !!
JOIN   tol.service_cart                ON service_cart.order_id = ord.id
JOIN   tol.ticket_avia_v2     ticket   ON ticket.id = service_cart.ticket_uid
                                      AND service_cart.service_table_name = 'vw_ticket_avia'
LEFT   JOIN tol.user          u        ON u.id = ord.user_id
LEFT   JOIN tol.status                 ON status.id = ord.status_id
LEFT   JOIN tol.partner                ON partner.id = ord.partner_id
LEFT   JOIN tol.partner       parent_partner ON parent_partner.id = partner.parent_id
LEFT   JOIN tol.partner_contract       ON partner_contract.id = ord.partner_contract_id
LEFT   JOIN tol.partner       contract_parent_partner ON contract_parent_partner.id = partner_contract.parent_partner_id
WHERE  ord.billing_number ^@ '4'  -- "starts with" operator - more efficient
AND    EXISTS (
   SELECT FROM tol.flight f
   WHERE  f.ticket_uid = ticket.id
   AND    f.carrier_code = 'carrier_code'
   )
AND    channel.code = 'channel_code'
ORDER  BY ord.id DESC
LIMIT  10;

此外,LEFT JOIN的一些示例实际上是[INNER] JOIN。参见:

  • 带WHERE条件的Postgres LEFT JOIN

这通常并不重要,Postgres无论如何都是正确的。但是对于手动优化连接的顺序来说,这可能很重要。参见:
假设order.billing_number是字符串类型,用ord.billing_number ^@ '4'替换billing_number::char <> '4') ord,这通常要便宜得多,并且允许使用COLLATE "C"的普通索引。

CREATE INDEX text_b ON tol.order (billing_number COLLATE "C");

参见:

  • PostgreSQL LIKE查询性能变化

如果是不同的数据类型,用不同的方式优化。。
ORDER BYEXISTS子查询表达式中没有意义。我掉了。
(At最少)到contract_parent_partner的连接看起来可能会增加行。是这样吗?在这种情况下,是否要为每个符合条件的订单返回多行?
如果所有这些仍然不能让您获得可接受的性能,那么,是的,某种递归或循环可能有助于(很大程度上)填充您的小LIMIT 10-用于实际查找行的查询。或者,您可以优化一些索引并更新/改进统计信息,以获得不同的查询计划。这超出了SO上的一个问题的范围。

相关问题