PostgreSQL: top-ranked item for each group in a single query

ut6juiuv  posted on 2024-01-07  in PostgreSQL

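How do I tell PostgreSQL this?
"Get me the top-ranked result for each element in the group, ranked by these attributes that you will already find in an index, so you don't have to rank the entire table."
Below is a real example of SQL that works, but it performs poorly because it ranks the entire table before giving me the answer.
db<>fiddle: https://dbfiddle.uk/qCp6nt1q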

CREATE TABLE job_queue (
  id SERIAL PRIMARY KEY,
  job_type VARCHAR,
  priority INT,
  created_at TIMESTAMP WITHOUT TIME ZONE
  -- Assume there are other columns here as well, specific
  -- to the job object, but I'm omitting them since they
  -- are irrelevant to the problem at hand.
);

CREATE INDEX job_idx ON job_queue (job_type, priority, created_at);

INSERT
  INTO job_queue (id, job_type, priority, created_at) 
  VALUES 
    (1, 'j1', 1, '2000-01-01 00:00:00'),
    (2, 'j1', 1, '2000-01-02 00:00:00'),
    (3, 'j1', 2, '2000-01-01 00:00:00'),
    (4, 'j2', 1, '2000-01-01 00:00:00'),
    (5, 'j2', 1, '2000-01-02 00:00:00'),
    (6, 'j2', 2, '2000-01-01 00:00:00');

-- Give me the oldest highest-ranked job for each job type.
SELECT id FROM (
  SELECT
    id,
    ROW_NUMBER() OVER (
      PARTITION BY job_type 
      ORDER BY priority, created_at
    ) AS rank
  FROM job_queue
) AS ranked_jobs
WHERE rank = 1;
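
One way to check whether the query above really reads every row is to look at its execution plan. The following is a minimal sketch using the standard `EXPLAIN (ANALYZE, BUFFERS)` options. The plan PostgreSQL picks depends on the server version, table statistics, and table size, so on the six-row sample the planner may choose a sequential scan regardless of the index.

```sql
-- Show the actual plan, row counts, and buffer usage for the ranking query.
-- A WindowAgg node fed by a scan of all rows in job_queue indicates that
-- every row is ranked before the rank = 1 filter is applied.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id FROM (
  SELECT
    id,
    ROW_NUMBER() OVER (
      PARTITION BY job_type
      ORDER BY priority, created_at
    ) AS rank
  FROM job_queue
) AS ranked_jobs
WHERE rank = 1;
```

Running the same `EXPLAIN` against any of the alternative queries makes it possible to compare how many rows each one touches.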

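What query is equivalent to the last statement in the SQL above, but does not rank the entire table before giving me the result? I think this should be possible because the columns are all in the index, so in theory the database only needs to fetch the first tuple in the index for each job_type and hand it to me, rather than ranking the whole table.
It would be equivalent to: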

SELECT id FROM job_queue WHERE job_type = 'j1' ORDER BY priority, created_at LIMIT 1;
SELECT id FROM job_queue WHERE job_type = 'j2' ORDER BY priority, created_at LIMIT 1;


But as a single query that works for however many job types I have, and that returns a single table with all the ids I care about.

ej83mcc0 #1

You can use a recursive CTE that essentially emulates a skip scan.

WITH RECURSIVE cte AS (
    SELECT *
    FROM (
        SELECT
          jq.id,
          jq.job_type,
          jq.priority,
          jq.created_at
        FROM job_queue jq
        ORDER BY job_type, priority, created_at
        LIMIT 1
    ) jq

    UNION ALL
  
    SELECT
      jq.id,
      jq.job_type,
      jq.priority,
      jq.created_at
    FROM cte
    CROSS JOIN LATERAL (
        SELECT *
        FROM job_queue jq
        WHERE cte.job_type < jq.job_type
        ORDER BY job_type, priority, created_at
        LIMIT 1
    ) jq
)
SELECT *
FROM cte;

db<>fiddle

ibrsph3r #2

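"What query is equivalent to the last statement in the SQL above, but does not rank the entire table before giving me the result?"
Here is another option using multiple CTEs: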

with cte1 as (
  select job_type, min(priority) as min_priority
  from job_queue
  group by job_type
),
cte2 as (
  select j.job_type, min(priority) as min_priority, min(created_at) as min_created_at
  from job_queue j
  inner join cte1 c on c.job_type = j.job_type 
                and c.min_priority = j.priority
  group by j.job_type
)
select j.*
from job_queue j
inner join cte2 c on c.job_type = j.job_type 
                  and c.min_created_at = j.created_at
                  and c.min_priority = j.priority;

Demo here

lg40wkob #3

If you have a table that lists each job_type, you can use a lateral join:

-- The subquery needs an alias on PostgreSQL versions before 16.
select first_job.id from job_type cross join lateral 
    (select id from job_queue where job_queue.job_type=job_type.job_type order by priority, created_at limit 1) as first_job;
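
The lateral join assumes a lookup table that the question does not define. As a minimal sketch, such a table could look like the following. The table name `job_type`, its single column `job_type`, and the alias `first_job` are assumptions made for illustration, and the rows simply mirror the sample data from the question.

```sql
-- Hypothetical lookup table holding one row per job type.
CREATE TABLE job_type (
  job_type VARCHAR PRIMARY KEY
);

INSERT INTO job_type (job_type) VALUES ('j1'), ('j2');

-- For each job type, pick the first row in (priority, created_at) order.
-- Each lateral subquery can be answered from the job_idx index.
SELECT first_job.id
FROM job_type
CROSS JOIN LATERAL (
  SELECT jq.id
  FROM job_queue jq
  WHERE jq.job_type = job_type.job_type
  ORDER BY jq.priority, jq.created_at
  LIMIT 1
) AS first_job;
```

With the sample data from the question this should return ids 1 and 4. A job type with no rows in job_queue drops out of the result because CROSS JOIN LATERAL keeps only types for which the subquery returns a row.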

If not, you can use a very ugly recursive CTE to emulate an index skip scan.
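
A sketch of what that could look like, assuming the job_queue table and job_idx index from the question: the recursive CTE (named `job_types` here purely for illustration) walks the distinct job_type values one at a time, and the lateral join then fetches the first row for each. This is not the answerer's own code, only one common way to write the pattern.

```sql
WITH RECURSIVE job_types AS (
    -- Seed: the smallest job_type in the table.
    SELECT min(job_type) AS job_type
    FROM job_queue

    UNION ALL

    -- Step: the next job_type strictly greater than the previous one.
    -- Yields NULL once the largest value is passed, which ends the recursion.
    SELECT (
        SELECT min(jq.job_type)
        FROM job_queue jq
        WHERE jq.job_type > jt.job_type
    )
    FROM job_types jt
    WHERE jt.job_type IS NOT NULL
)
SELECT first_job.id
FROM job_types jt
CROSS JOIN LATERAL (
    -- First row per job type in (priority, created_at) order.
    SELECT jq.id
    FROM job_queue jq
    WHERE jq.job_type = jt.job_type
    ORDER BY jq.priority, jq.created_at
    LIMIT 1
) AS first_job;
```

The trailing NULL row produced by the CTE matches nothing in the lateral subquery, so it does not appear in the output. Rows whose job_type is NULL are likewise skipped by this sketch.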
