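I have two tables with identical partitioning and sub-partitioning. Both are list-partitioned by group_id (text) and then range-partitioned by year on a timestamptz column (created_at in items, item_created_at in things). To make the problem easier to follow, here are simplified versions of the tables: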
create table if not exists items (
group_id text,
item_id integer,
created_at timestamp with time zone,
primary key (group_id, created_at, item_id)
) partition by list (group_id);
-- Partition by group ID.
create table items_a partition of items for values in ('a') partition by range (created_at);
create table items_b partition of items for values in ('b') partition by range (created_at);
-- Partition by year.
create table items_a2022 partition of items_a for values from ('2022-01-01') to ('2023-01-01');
create table items_a2023 partition of items_a for values from ('2023-01-01') to ('2024-01-01');
create table items_b2022 partition of items_b for values from ('2022-01-01') to ('2023-01-01');
create table items_b2023 partition of items_b for values from ('2023-01-01') to ('2024-01-01');
create table if not exists things (
group_id text,
item_id integer,
item_created_at timestamp with time zone,
FOREIGN KEY (group_id, item_created_at, item_id) REFERENCES items (group_id, created_at, item_id)
) partition by list(group_id);
-- Partition by group ID.
create table things_a partition of things for values in ('a') partition by range (item_created_at);
create table things_b partition of things for values in ('b') partition by range (item_created_at);
-- Partition by year.
create table things_a2022 partition of things_a for values from ('2022-01-01') to ('2023-01-01');
create table things_a2023 partition of things_a for values from ('2023-01-01') to ('2024-01-01');
create table things_b2022 partition of things_b for values from ('2022-01-01') to ('2023-01-01');
create table things_b2023 partition of things_b for values from ('2023-01-01') to ('2024-01-01');
A few rows to help with testing:
insert into items (group_id, item_id, created_at) values ('a', 1, '2022-01-01');
insert into items (group_id, item_id, created_at) values ('b', 2, '2023-06-10');
insert into things (group_id, item_id, item_created_at) values ('a', 1, '2022-01-01');
Enable partitionwise join:
set enable_partitionwise_join = on;
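Querying items for a date range within one year and a single group ID prunes as expected: only one of the four possible items partitions (items_a2022) is scanned: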
explain analyze
select count(*)
from items
where items.created_at >= '2022-05-01'::timestamptz
and items.created_at <= '2022-06-01'::timestamptz
and items.group_id = 'a';
Aggregate (cost=8.18..8.19 rows=1 width=8) (actual time=0.039..0.039 rows=1 loops=1)
-> Index Only Scan using items_a2022_pkey on items_a2022 items (cost=0.15..8.17 rows=1 width=0) (actual time=0.036..0.036 rows=0 loops=1)
Index Cond: ((group_id = 'a'::text) AND (created_at >= '2022-05-01 00:00:00+00'::timestamp with time zone) AND (created_at <= '2022-06-01 00:00:00+00'::timestamp with time zone))
Heap Fetches: 0
When I join things:
explain
select count(*)
from items
join things on things.item_id = items.item_id
and things.item_created_at = items.created_at
and things.group_id = items.group_id
where items.created_at >= '2022-05-01'::timestamptz
and items.created_at <= '2022-06-01'::timestamptz
and items.group_id = 'a';
Aggregate (cost=56.67..56.68 rows=1 width=8)
-> Nested Loop (cost=0.15..56.67 rows=1 width=0)
Join Filter: ((items.item_id = things.item_id) AND (items.created_at = things.item_created_at))
-> Index Only Scan using items_a2022_pkey on items_a2022 items (cost=0.15..8.17 rows=1 width=44)
Index Cond: ((group_id = 'a'::text) AND (created_at >= '2022-05-01 00:00:00+00'::timestamp with time zone) AND (created_at <= '2022-06-01 00:00:00+00'::timestamp with time zone))
-> Append (cost=0.00..48.31 rows=12 width=44)
-> Seq Scan on things_a2022 things_1 (cost=0.00..24.12 rows=6 width=44)
Filter: (group_id = 'a'::text)
-> Seq Scan on things_a2023 things_2 (cost=0.00..24.12 rows=6 width=44)
Filter: (group_id = 'a'::text)
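I expected that setting enable_partitionwise_join to on would cause things to be pruned as well, but it is pruned only by group ID, not by year, so both things_a2022 and things_a2023 are scanned. I wondered whether partitionwise join does not support sub-partitions, but that does not seem to be the case, because if I use = instead of <=, it works as expected: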
explain
select count(*)
from items
join things on things.item_id = items.item_id
and things.item_created_at = items.created_at
and things.group_id = items.group_id
where items.created_at = '2022-05-01'::timestamptz
and items.group_id = 'a';
Aggregate (cost=35.14..35.15 rows=1 width=8)
-> Nested Loop (cost=0.15..35.14 rows=1 width=0)
Join Filter: (items.item_id = things.item_id)
-> Index Only Scan using items_a2022_pkey on items_a2022 items (cost=0.15..8.17 rows=1 width=44)
Index Cond: ((group_id = 'a'::text) AND (created_at = '2022-05-01 00:00:00+00'::timestamp with time zone))
-> Seq Scan on things_a2022 things (cost=0.00..26.95 rows=1 width=44)
Filter: ((item_created_at = '2022-05-01 00:00:00+00'::timestamp with time zone) AND (group_id = 'a'::text))
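We have tested this on Postgres 13 (our production version) and on version 15. Does anyone know why partitionwise join is only partially applied in this case? Thanks.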
1 Answer
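First, these are too few rows for a useful test, but even with enough rows you will not get partition pruning on things, because PostgreSQL is not smart enough to make that inference. PostgreSQL can perform partitionwise joins on sub-partitions. You can verify this by omitting the WHERE condition and inserting enough rows.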
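The two points above can be sketched as follows, reusing the tables from the question. The planner propagates a constant across a join condition only for equality, which would explain why the = query prunes things while the range query does not. The row counts, the item_id offset of 1000, and the date arithmetic are assumptions chosen purely for illustration. The last query shows one possible workaround: repeating the range condition on things.item_created_at by hand so that things can be pruned by year as well.

```sql
-- Illustrative test data: 100,000 rows per group, spread over 2022 and 2023.
-- item_id starts at 1000 only to avoid colliding with the rows inserted earlier.
insert into items (group_id, item_id, created_at)
select g.group_id,
       n,
       '2022-01-01'::timestamptz + (n % 730) * interval '1 day'
from (values ('a'), ('b')) as g(group_id)
cross join generate_series(1000, 100999) as n;

-- One matching row in things for every generated item (satisfies the foreign key).
insert into things (group_id, item_id, item_created_at)
select group_id, item_id, created_at
from items
where item_id >= 1000;

-- Refresh planner statistics for both partition trees.
analyze items;
analyze things;

set enable_partitionwise_join = on;

-- Without a WHERE clause, the plan should join matching leaf partitions
-- pairwise (items_a2022 with things_a2022, and so on) beneath an Append node.
explain
select count(*)
from items
join things on things.item_id = items.item_id
           and things.item_created_at = items.created_at
           and things.group_id = items.group_id;

-- Workaround sketch: state the range on things explicitly as well,
-- so the planner can prune things_a2023 at plan time.
explain
select count(*)
from items
join things on things.item_id = items.item_id
           and things.item_created_at = items.created_at
           and things.group_id = items.group_id
where items.created_at >= '2022-05-01'::timestamptz
  and items.created_at <= '2022-06-01'::timestamptz
  and items.group_id = 'a'
  and things.item_created_at >= '2022-05-01'::timestamptz
  and things.item_created_at <= '2022-06-01'::timestamptz;
```

The extra predicates are logically redundant given the join condition, so they should not change the result; they only give the planner a constant range on the things partition key.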