PostgreSQL partitionwise join not working for range filter

Posted by dxxyhpgq on 2023-11-18 in PostgreSQL


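I have two tables with identical partitioning and sub-partitioning. Both are list-partitioned by group_id (text) and then range-partitioned by year on created_at (timestamptz). To make the question easier to follow, I have simplified the tables here: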

create table if not exists items (
    group_id text,
    item_id integer,
    created_at timestamp with time zone,
    primary key (group_id, created_at, item_id)
) partition by list (group_id);

-- Partition by group ID.
create table items_a partition of items for values in ('a') partition by range (created_at);
create table items_b partition of items for values in ('b') partition by range (created_at);

-- Partition by year.
create table items_a2022 partition of items_a for values from ('2022-01-01') to ('2023-01-01');
create table items_a2023 partition of items_a for values from ('2023-01-01') to ('2024-01-01');
create table items_b2022 partition of items_b for values from ('2022-01-01') to ('2023-01-01');
create table items_b2023 partition of items_b for values from ('2023-01-01') to ('2024-01-01');

create table if not exists things (
    group_id text,
    item_id integer,
    item_created_at timestamp with time zone,
    FOREIGN KEY (group_id, item_created_at, item_id) REFERENCES items (group_id, created_at, item_id)
) partition by list(group_id);

-- Partition by group ID.
create table things_a partition of things for values in ('a') partition by range (item_created_at);
create table things_b partition of things for values in ('b') partition by range (item_created_at);

-- Partition by year.
create table things_a2022 partition of things_a for values from ('2022-01-01') to ('2023-01-01');
create table things_a2023 partition of things_a for values from ('2023-01-01') to ('2024-01-01');
create table things_b2022 partition of things_b for values from ('2022-01-01') to ('2023-01-01');
create table things_b2023 partition of things_b for values from ('2023-01-01') to ('2024-01-01');
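
Since partitionwise join depends on both tables having matching partition bounds, it can help to confirm that the two hierarchies really line up. The query below is a minimal sketch using pg_partition_tree (available in PostgreSQL 12 and later) together with pg_get_expr to print each partition's bound; running it once for items and once for things should give directly comparable output.

```sql
-- List every table in the items hierarchy with its depth and partition bound.
-- Replace 'items' with 'things' to compare the second hierarchy.
select t.relid,
       t.parentrelid,
       t.level,
       t.isleaf,
       pg_get_expr(c.relpartbound, c.oid) as bounds  -- null for the root table
from pg_partition_tree('items') as t
join pg_class as c on c.oid = t.relid
order by t.level, t.relid::text;
```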

A few rows to help with testing:

insert into items (group_id, item_id, created_at) values ('a', 1, '2022-01-01');
insert into items (group_id, item_id, created_at) values ('b', 2, '2023-06-10');
insert into things (group_id, item_id, item_created_at) values ('a', 1, '2022-01-01');
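
To check that the test rows were routed to the intended leaf partitions, the tableoid system column can be cast to regclass. This is only a verification sketch; the alias leaf_partition is a name chosen here for illustration. Assuming the session time zone did not change between creating the partitions and inserting the rows, the two items rows should land in items_a2022 and items_b2023.

```sql
-- Show which leaf partition physically stores each row.
select tableoid::regclass as leaf_partition, group_id, item_id, created_at
from items
order by group_id, created_at;

select tableoid::regclass as leaf_partition, group_id, item_id, item_created_at
from things
order by group_id, item_created_at;
```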

Enable partitionwise join:

set enable_partitionwise_join = on;
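
Note that PostgreSQL has two separate settings here: enable_partitionwise_join controls whether a join between similarly partitioned tables may be executed partition by partition, while enable_partition_pruning (on by default) controls whether partitions are eliminated from a plan. A quick way to confirm both values in the current session:

```sql
-- Confirm the session settings that affect the plans in this question.
show enable_partitionwise_join;   -- off by default; set to on above
show enable_partition_pruning;    -- on by default
```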

Querying items for one year and one group ID prunes as expected: only one of the four possible items tables (items_a2022) is scanned:

explain
select count(*)
from items
where items.created_at >= '2022-05-01'::timestamptz
    and items.created_at <= '2022-06-01'::timestamptz
    and items.group_id = 'a';

Aggregate  (cost=8.18..8.19 rows=1 width=8) (actual time=0.039..0.039 rows=1 loops=1)
  ->  Index Only Scan using items_a2022_pkey on items_a2022 items  (cost=0.15..8.17 rows=1 width=0) (actual time=0.036..0.036 rows=0 loops=1)
        Index Cond: ((group_id = 'a'::text) AND (created_at >= '2022-05-01 00:00:00+00'::timestamp with time zone) AND (created_at <= '2022-06-01 00:00:00+00'::timestamp with time zone))
        Heap Fetches: 0

When I join things:

explain
select count(*)
from items
join things on things.item_id = items.item_id
    and things.item_created_at = items.created_at
    and things.group_id = items.group_id
where items.created_at >= '2022-05-01'::timestamptz
    and items.created_at <= '2022-06-01'::timestamptz
    and items.group_id = 'a';

Aggregate  (cost=56.67..56.68 rows=1 width=8)
  ->  Nested Loop  (cost=0.15..56.67 rows=1 width=0)
        Join Filter: ((items.item_id = things.item_id) AND (items.created_at = things.item_created_at))
        ->  Index Only Scan using items_a2022_pkey on items_a2022 items  (cost=0.15..8.17 rows=1 width=44)
              Index Cond: ((group_id = 'a'::text) AND (created_at >= '2022-05-01 00:00:00+00'::timestamp with time zone) AND (created_at <= '2022-06-01 00:00:00+00'::timestamp with time zone))
        ->  Append  (cost=0.00..48.31 rows=12 width=44)
              ->  Seq Scan on things_a2022 things_1  (cost=0.00..24.12 rows=6 width=44)
                    Filter: (group_id = 'a'::text)
              ->  Seq Scan on things_a2023 things_2  (cost=0.00..24.12 rows=6 width=44)
                    Filter: (group_id = 'a'::text)

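I expected that setting enable_partitionwise_join to on would cause things to be pruned as well, but it is pruned only by group ID, not by year, so both things_a2022 and things_a2023 are scanned. I wondered whether partitionwise join does not support sub-partitions, but that does not seem to be the case, because it works as expected if I change the filter to = instead of <=: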

explain
select count(*)
from items
join things on things.item_id = items.item_id
    and things.item_created_at = items.created_at
    and things.group_id = items.group_id
where items.created_at = '2022-05-01'::timestamptz
    and items.group_id = 'a';

Aggregate  (cost=35.14..35.15 rows=1 width=8)
  ->  Nested Loop  (cost=0.15..35.14 rows=1 width=0)
        Join Filter: (items.item_id = things.item_id)
        ->  Index Only Scan using items_a2022_pkey on items_a2022 items  (cost=0.15..8.17 rows=1 width=44)
              Index Cond: ((group_id = 'a'::text) AND (created_at = '2022-05-01 00:00:00+00'::timestamp with time zone))
        ->  Seq Scan on things_a2022 things  (cost=0.00..26.95 rows=1 width=44)
              Filter: ((item_created_at = '2022-05-01 00:00:00+00'::timestamp with time zone) AND (group_id = 'a'::text))

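We have tested this on Postgres 13 (our production version) and on version 15. Does anyone know why partitionwise join is only partially applied in this case? Thanks.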
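
One explanation consistent with the plans above, offered as an assumption rather than something confirmed in this post, is that the planner carries an equality constant across the join condition (which is why item_created_at = '2022-05-01' appears in the things filter of the last plan) but does not derive range conditions the same way. Under that assumption, a possible workaround is to repeat the range filter on things.item_created_at so that things can be pruned by year independently of the join:

```sql
explain
select count(*)
from items
join things on things.item_id = items.item_id
    and things.item_created_at = items.created_at
    and things.group_id = items.group_id
where items.created_at >= '2022-05-01'::timestamptz
    and items.created_at <= '2022-06-01'::timestamptz
    -- Assumption: restating the range on the joined column lets the planner
    -- prune the things partitions by year.
    and things.item_created_at >= '2022-05-01'::timestamptz
    and things.item_created_at <= '2022-06-01'::timestamptz
    and items.group_id = 'a';
```

The extra predicates are logically redundant given the join condition, so they should not change the result; whether they change the plan is best verified with explain on the actual data.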


Source: https://stackoverflow.com/questions/77447509/partitionwise-join-not-working-for-range-filter