postgresql WHERE子句使用CTE中的值比使用常量慢？

whhtz7ly 于 2023-08-04 发布在 PostgreSQL

关注(0)|答案(5)|浏览(147)

我希望在Postgres 12上执行查询期间缓存一个变量。我遵循了如下CTE的方法：

-- BEGIN PART 1
with cached_vars as (
    select max(datetime) as datetime_threshold
    from locations
    where distance > 70
      and user_id = 9087
)
-- END PART 1
-- BEGIN PART 2
select *
from locations
where user_id = 9087
  and datetime > (select datetime_threshold from cached_vars)
-- END PART 2

字符串
运行上述查询将导致性能问题。我预计总运行时间大约等于(part1 runtime + part2 runtime)，但它需要更长的时间。
值得注意的是，当我使用手动datetime_threshold只运行第二部分时，没有性能问题。
locations表定义为：

id | user_id | datetime | location | distance | ...
-----------------------------------------------------

型
有没有什么方法可以将总运行时间减少到(part1 runtime + part2 runtime)？

postgresql

来源：https://stackoverflow.com/questions/66625721/where-clause-is-slower-with-value-from-cte-than-with-constant

5条答案

按热度按时间

utugiqy61#

你观察到的差异背后的解释是这样的：
Postgres有列统计信息，可以根据datetime_threshold提供的常量的值调整查询计划。使用有利的过滤器值，这可以导致更高效的查询计划。
在另一种情况下，当datetime_threshold必须首先在另一个SELECT中计算时，Postgres必须默认为通用计划。datetime_threshold可以是任何东西。
在EXPLAIN输出中，差异将变得明显。
为了确保Postgres针对实际的datetime_threshold值优化第二部分，您可以运行两个单独的查询（将查询1的结果作为常量提供给查询2），或者使用动态SQL强制每次在PL/pgSQL函数中重新规划查询2。

例如

CREATE OR REPLACE FUNCTION foo(_user_id int, _distance int = 70)
  RETURNS SETOF locations
  LANGUAGE plpgsql AS
$func$
BEGIN
   RETURN QUERY EXECUTE 
     'SELECT *
      FROM   locations
      WHERE  user_id = $1
      AND    datetime > $2'
   USING _user_id
      , (SELECT max(datetime)
         FROM   locations
         WHERE  distance > _distance
         AND    user_id = _user_id);
END
$func$;

字符串
电话：

SELECT * FROM foo(9087);

型
相关信息：

plpgsql函数中的动态ORDER BY和ASC / DESC
PL/pgSQL函数中的可选参数

在极端情况下，您甚至可以使用另一个动态查询来计算datetime_threshold。但我觉得没这个必要。
至于“文档中有用的东西”：
[...]重要的区别是EXECUTE将在每次执行时重新计划命令，生成特定于当前参数值的计划;而PL/pgSQL可以以其他方式创建通用计划并将其高速缓存以供重用。在最佳计划强烈依赖于参数值的情况下，使用EXECUTE可以帮助确保不选择通用计划。
大胆强调我的。

索引

完美的索引应该是：

CREATE INDEX ON locations (user_id, distance DESC NULL LAST, date_time DESC NULLS LAST); -- for query 1
CREATE INDEX ON locations (user_id, date_time);           -- for query 2

型
微调取决于未公开的细节。部分索引可能是一个选项。
可能还有许多其他原因导致查询速度缓慢。细节不够。

赞(0）回复(0）举报 2023-08-04

tzxcd3kk2#

如果你想让你的查询表现良好，我建议添加索引locations(user_id, distance)和locations(user_id, datetime)。
我也会使用窗口函数来描述查询：

select l.*
from (select l.*,
             max(datetime) filter (where distance > 70) over (partition by userid) as datetime_threshold
      from location l
      where userid = 9087
     ) l
where datetime > datetime_threshold;

字符串
窗口功能通常可以提高性能。然而，如果有了正确的索引，我不知道这两个版本是否会有实质性的不同。

赞(0）回复(0）举报 2023-08-04

m0rkklqb3#

请将查询分为两部分，并将第一部分存储在临时表中（PostgreSQL中的临时表只能在当前数据库会话中访问）。然后将临时表与第二部分连接起来。希望能加快处理时间。

CREATE TEMPORARY TABLE temp_table_cached_vars (
       datetime_threshold timestamp
    );
    
    -- BEGIN PART 1
    with cached_vars as (
        select max(datetime) as datetime_threshold
        from locations
        where distance > 70
          and user_id = 9087
    )insert into temp_table_name select datetime_threshold from cached_vars 
    -- END PART 1
    -- BEGIN PART 2
    select *
    from locations
    where user_id = 9087
      and datetime > (select datetime_threshold from temp_table_cached_vars Limit 1)

-- END PART 2

字符串

赞(0）回复(0）举报 2023-08-04

omjgkv6w4#

只需在子查询中添加Limi1，就像我在下面的例子中使用的那样。

-- BEGIN PART 1
with cached_vars as (
    select max(datetime) as datetime_threshold
    from locations
    where distance > 70
      and user_id = 9087
)
-- END PART 1
-- BEGIN PART 2
select *
from locations
where user_id = 9087
  and datetime > (select datetime_threshold from cached_vars Limit 1)
-- END PART 2

字符串

赞(0）回复(0）举报 2023-08-04

8ehkhllq5#

试着把它改成