如何通过窗口函数barier为PostgreSQL数据库?

lf3rwulv  于 2024-01-07  发布在  PostgreSQL
关注(0)|答案(1)|浏览(150)

我使用PostgreSQL v15.4。我有下一个查询:

explain (analyze, costs, verbose, buffers) 
with vimages as (
    SELECT
        vim.view_id,
        array_to_json(array_agg(row_to_json( row(            
            vim.created_at,
            igim.id, 
            igim.name, 
            vim.sequence_no
        )) order by vim.sequence_no desc)) as json
    FROM view_images vim
    JOIN xx_images igim ON vim.xx_image_id = igim.id
    GROUP BY vim.view_id
)

select vimages.*
from xxs ig
left join vimages on vimages.view_id = ig.id
WHERE ig.alias = '257_belmont_cir_brunswick_ga'

字符串
这是此查询的计划。执行此查询需要27秒。因此只有一行。
如果我硬编码结果行的id,例如and ig.id = 682430783638437250,那么查询需要only 6ms
有没有可能以这种方式重新格式化我的查询,以便当我通过alias列查询ig表时,vim表立即被过滤,然后聚合?就像我硬编码ID值时发生的那样。

w6mmgewl

w6mmgewl1#

示例数据集(小10倍):

CREATE UNLOGGED TABLE views( view_id INTEGER NOT NULL, alias INTEGER NOT NULL );
INSERT INTO views SELECT n,n/3 FROM generate_series(1,10000) n;
ALTER TABLE views ADD PRIMARY KEY (view_id);
CREATE INDEX ON views(alias);

CREATE UNLOGGED TABLE images( image_id INTEGER NOT NULL, image_foo INT NOT NULL );
INSERT INTO images SELECT n,n FROM generate_series(1,400000) n;
ALTER TABLE images ADD PRIMARY KEY (image_id);

CREATE UNLOGGED TABLE view_images( view_id INTEGER NOT NULL, image_id INTEGER NOT NULL, vim_foo INT NOT NULL );
INSERT INTO view_images SELECT (1+random()*10000)::INTEGER view_id, (1+random()*400000)::INTEGER image_id, n FROM generate_series(1,1500000) n;
CREATE INDEX ON view_images( view_id );
CREATE INDEX ON view_images( image_id );

VACUUM ANALYZE;

EXPLAIN ANALYZE 
with vimages as (
    SELECT
        vim.view_id,
        array_to_json(array_agg(row_to_json( row(            
            vim.vim_foo,
            im.image_id, 
            im.image_foo
        )) order by vim.vim_foo desc)) as json
    FROM view_images vim
    JOIN images im USING (image_id)
    GROUP BY vim.view_id
)

select vimages.*
from views 
left join vimages USING (view_id)
WHERE views.alias = 1234;

字符串
它精确地再现了您的慢速查询计划。
将CTE移至查询:无变更。
常见问题(下):计划不再涉及排序,因为我在聚合中删除了ORDER BY。我没有粘贴计划,问题仍然存在,它仍然在images和images_views上进行seq扫描和散列连接:

EXPLAIN SELECT * FROM views LEFT JOIN (
    SELECT vim.view_id, array_agg( im.image_id ) as agg
    FROM view_images vim
    JOIN images im USING (image_id)
    GROUP BY vim.view_id
) vimages USING (view_id)
WHERE views.alias = 1234;


在我的示例数据中,alias=1234对应于WHERE view_id IN(3702,3703,3704)。如果我把它放在查询的末尾,没有变化。如果我把它放在子查询中,我得到快速计划。
因此,问题似乎是它没有在子查询内部传播view_id上的连接条件。
解决方案#1:移动GROUP BY

EXPLAIN SELECT views.view_id, array_agg( im.image_id ) as agg
FROM views 
LEFT JOIN view_images vim USING (view_id)
LEFT JOIN images im USING (image_id)
WHERE views.alias = 1234
GROUP BY views.view_id;


解决方案#2:使用LATERAL来显式移动有问题的连接 predicate

EXPLAIN SELECT * FROM views LEFT JOIN LATERAL (
    SELECT vim.view_id, array_agg( im.image_id ) as agg
    FROM view_images vim
    JOIN images im USING (image_id)
    WHERE vim.view_id=views.view_id
    GROUP BY vim.view_id
) vimages USING (view_id)
WHERE views.alias = 1234;


两者都可以通过对图像的索引扫描和view_images只命中它们应该命中的行来获得快速计划。
如果views.alias的条件被删除,整个表实际上需要被扫描,#1将恢复到seq扫描,这在这种情况下更好,但是#2将继续运行嵌套循环,所以它会更慢。所以#1将是我的首选选项,除非在粘贴到问题中之前有一些其他的查询被删除,并且会妨碍GROUP BY。
解决方案3:
如果你想要多个CTE,比如“vimages”,但是在其他表上,那么你可以先用你想要的view_id创建一个实体化的CTE。然后手动将它作为一个联接放入有问题的CTE中。这会得到一个快速的计划:

with vids AS MATERIALIZED (SELECT view_id FROM views v WHERE v.alias = 1234),
vimages as (
    SELECT
        vim.view_id,
        array_to_json(array_agg(row_to_json( row(            
            vim.vim_foo,
            im.image_id, 
            im.image_foo
        )) order by vim.vim_foo desc)) as json
    FROM 
    vids JOIN view_images vim USING (view_id)
    JOIN images im USING (image_id)
    GROUP BY vim.view_id
)
select vimages.*
from vids v
left join vimages USING (view_id);

相关问题