PostgreSQL partitionwise join does not work with a range filter

Posted by dxxyhpgq on 2023-11-18 in PostgreSQL

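I have two tables with identical partitioning and sub-partitioning. Both are list-partitioned by group_id (text) and then range-partitioned by year on created_at (timestamptz). To make the question easier to follow, here are heavily simplified versions of the tables: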

create table if not exists items (
    group_id text,
    item_id integer,
    created_at timestamp with time zone,
    primary key (group_id, created_at, item_id)
) partition by list (group_id);

-- Partition by group ID.
create table items_a partition of items for values in ('a') partition by range (created_at);
create table items_b partition of items for values in ('b') partition by range (created_at);

-- Partition by year.
create table items_a2022 partition of items_a for values from ('2022-01-01') to ('2023-01-01');
create table items_a2023 partition of items_a for values from ('2023-01-01') to ('2024-01-01');
create table items_b2022 partition of items_b for values from ('2022-01-01') to ('2023-01-01');
create table items_b2023 partition of items_b for values from ('2023-01-01') to ('2024-01-01');

create table if not exists things (
    group_id text,
    item_id integer,
    item_created_at timestamp with time zone,
    FOREIGN KEY (group_id, item_created_at, item_id) REFERENCES items (group_id, created_at, item_id)
) partition by list(group_id);

-- Partition by group ID.
create table things_a partition of things for values in ('a') partition by range (item_created_at);
create table things_b partition of things for values in ('b') partition by range (item_created_at);

-- Partition by year.
create table things_a2022 partition of things_a for values from ('2022-01-01') to ('2023-01-01');
create table things_a2023 partition of things_a for values from ('2023-01-01') to ('2024-01-01');
create table things_b2022 partition of things_b for values from ('2022-01-01') to ('2023-01-01');
create table things_b2023 partition of things_b for values from ('2023-01-01') to ('2024-01-01');
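
As a quick sanity check that both hierarchies really have the same two-level layout, the partition trees can be listed. This is a minimal sketch that assumes the tables above have been created; it relies on pg_partition_tree, which is available in PostgreSQL 12 and later.

```sql
-- List every partition under items and things with its parent and depth.
-- level 0 is the root table, level 1 the group_id partitions,
-- level 2 the per-year leaf partitions.
select relid, parentrelid, isleaf, level
from pg_partition_tree('items')
union all
select relid, parentrelid, isleaf, level
from pg_partition_tree('things')
order by level, relid::text;
```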

A few rows to help with testing:

insert into items (group_id, item_id, created_at) values ('a', 1, '2022-01-01');
insert into items (group_id, item_id, created_at) values ('b', 2, '2023-06-10');
insert into things (group_id, item_id, item_created_at) values ('a', 1, '2022-01-01');


Enable partitionwise join:

set enable_partitionwise_join = on;


Querying items for a single year and group ID prunes as expected: only one of the four possible items tables (items_a2022) is scanned:

explain
select count(*)
from items
where items.created_at >= '2022-05-01'::timestamptz
    and items.created_at <= '2022-06-01'::timestamptz
    and items.group_id = 'a';
Aggregate  (cost=8.18..8.19 rows=1 width=8) (actual time=0.039..0.039 rows=1 loops=1)
  ->  Index Only Scan using items_a2022_pkey on items_a2022 items  (cost=0.15..8.17 rows=1 width=0) (actual time=0.036..0.036 rows=0 loops=1)
        Index Cond: ((group_id = 'a'::text) AND (created_at >= '2022-05-01 00:00:00+00'::timestamp with time zone) AND (created_at <= '2022-06-01 00:00:00+00'::timestamp with time zone))
        Heap Fetches: 0

When I join things:

explain
select count(*)
from items
join things on things.item_id = items.item_id
    and things.item_created_at = items.created_at
    and things.group_id = items.group_id
where items.created_at >= '2022-05-01'::timestamptz
    and items.created_at <= '2022-06-01'::timestamptz
    and items.group_id = 'a';
Aggregate  (cost=56.67..56.68 rows=1 width=8)
  ->  Nested Loop  (cost=0.15..56.67 rows=1 width=0)
        Join Filter: ((items.item_id = things.item_id) AND (items.created_at = things.item_created_at))
        ->  Index Only Scan using items_a2022_pkey on items_a2022 items  (cost=0.15..8.17 rows=1 width=44)
              Index Cond: ((group_id = 'a'::text) AND (created_at >= '2022-05-01 00:00:00+00'::timestamp with time zone) AND (created_at <= '2022-06-01 00:00:00+00'::timestamp with time zone))
        ->  Append  (cost=0.00..48.31 rows=12 width=44)
              ->  Seq Scan on things_a2022 things_1  (cost=0.00..24.12 rows=6 width=44)
                    Filter: (group_id = 'a'::text)
              ->  Seq Scan on things_a2023 things_2  (cost=0.00..24.12 rows=6 width=44)
                    Filter: (group_id = 'a'::text)

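I expected enable_partitionwise_join = true to cause things to be pruned as well, but it is pruned only by group ID, not by year, so both things_a2022 and things_a2023 are scanned. I wondered whether partitionwise join simply does not support sub-partitions, but that does not seem to be the case, because if I change the filter to = instead of a >= / <= range, it works as expected: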

explain
select count(*)
from items
join things on things.item_id = items.item_id
    and things.item_created_at = items.created_at
    and things.group_id = items.group_id
where items.created_at = '2022-05-01'::timestamptz
    and items.group_id = 'a';
Aggregate  (cost=35.14..35.15 rows=1 width=8)
  ->  Nested Loop  (cost=0.15..35.14 rows=1 width=0)
        Join Filter: (items.item_id = things.item_id)
        ->  Index Only Scan using items_a2022_pkey on items_a2022 items  (cost=0.15..8.17 rows=1 width=44)
              Index Cond: ((group_id = 'a'::text) AND (created_at = '2022-05-01 00:00:00+00'::timestamp with time zone))
        ->  Seq Scan on things_a2022 things  (cost=0.00..26.95 rows=1 width=44)
              Filter: ((item_created_at = '2022-05-01 00:00:00+00'::timestamp with time zone) AND (group_id = 'a'::text))

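We have tested this on Postgres 13 (our production version) and on version 15. Does anyone know why partitionwise join is only partially applied in this case? Thanks.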

t3psigkw #1

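First of all, these are too few rows for a useful test, but even with enough rows you won't get partition pruning on things, because PostgreSQL is not smart enough to make that inference.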
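
The equality case in the question works because the planner derives things.item_created_at = constant from the join clause, which is visible in the Filter line of that plan; it does not derive the corresponding range condition. One possible workaround, offered here as a sketch and not as part of the original answer, is to repeat the range filter on things.item_created_at explicitly. The join condition already forces things.item_created_at = items.created_at, so the extra predicates should not change the result.

```sql
-- Assumes the schema from the question.
explain
select count(*)
from items
join things on things.item_id = items.item_id
    and things.item_created_at = items.created_at
    and things.group_id = items.group_id
where items.created_at >= '2022-05-01'::timestamptz
    and items.created_at <= '2022-06-01'::timestamptz
    and items.group_id = 'a'
    -- Logically redundant given the join condition; added only so the
    -- planner has a constant range to prune the things partitions by year.
    and things.item_created_at >= '2022-05-01'::timestamptz
    and things.item_created_at <= '2022-06-01'::timestamptz;
```

With these predicates the plan should reference only things_a2022. Redundant conditions can make the planner's row estimates less accurate, so it is worth comparing the plans on realistic data volumes.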
PostgreSQL can perform partitionwise joins on sub-partitions. You can verify this if you omit the WHERE condition and insert enough rows.
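
A rough way to try this, assuming the schema from the question: the row count, the item_id offset of 1000 (chosen to avoid the rows inserted earlier) and the date spread are arbitrary illustrative choices, not values from the answer.

```sql
-- Fill items first so the foreign key on things is satisfied.
-- Dates run from 2022-01-02 to 2023-12-02, inside the existing partitions.
insert into items (group_id, item_id, created_at)
select g, i, timestamptz '2022-01-02' + (i % 700) * interval '1 day'
from unnest(array['a', 'b']) as g,
     generate_series(1000, 100999) as i;

-- One matching things row per generated items row.
insert into things (group_id, item_id, item_created_at)
select group_id, item_id, created_at
from items
where item_id >= 1000;

-- Refresh statistics so the planner sees the new row counts.
analyze items;
analyze things;

set enable_partitionwise_join = on;

-- No WHERE clause: the join covers all partition keys at both levels.
explain
select count(*)
from items
join things on things.item_id = items.item_id
    and things.item_created_at = items.created_at
    and things.group_id = items.group_id;
```

If partitionwise join is chosen, the plan should show an Append over separate joins between matching leaf partitions (for example items_a2022 with things_a2022) instead of a single join above two Append nodes. Whether the planner actually picks it depends on its cost estimates.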
