PostgreSQL: highest-ranked items for different groups in a single query

Posted by ut6juiuv on 2024-01-07 in PostgreSQL


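How do I tell PostgreSQL the following?
Get me the highest-ranked result for each element of a group, ranked by attributes that you can already find in the index, so that you don't have to rank the whole table.
Below is a real example of SQL that works but performs poorly, because it ranks the entire table before giving me the answer.
db<>fiddle: https://dbfiddle.uk/qCp6nt1q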

CREATE TABLE job_queue (
  id SERIAL PRIMARY KEY,
  job_type VARCHAR,
  priority INT,
  created_at TIMESTAMP WITHOUT TIME ZONE
  -- Assume there are other columns here as well, specific
  -- to the job object, but I'm omitting them since they
  -- are irrelevant to the problem at hand.
);

CREATE INDEX job_idx ON job_queue (job_type, priority, created_at);

INSERT
  INTO job_queue (id, job_type, priority, created_at) 
  VALUES 
    (1, 'j1', 1, '2000-01-01 00:00:00'),
    (2, 'j1', 1, '2000-01-02 00:00:00'),
    (3, 'j1', 2, '2000-01-01 00:00:00'),
    (4, 'j2', 1, '2000-01-01 00:00:00'),
    (5, 'j2', 1, '2000-01-02 00:00:00'),
    (6, 'j2', 2, '2000-01-01 00:00:00');

-- Give me the oldest highest-ranked job for each job type.
SELECT id FROM (
  SELECT
    id,
    ROW_NUMBER() OVER (
      PARTITION BY job_type 
      ORDER BY priority, created_at
    ) AS rank
  FROM job_queue
) AS ranked_jobs
WHERE rank = 1;

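What query is equivalent to the last example in the SQL above, but does not rank the entire table before giving me the result? I think this should be possible because the columns are all in the index, so in theory the database only needs to fetch the first tuple in the index for each job_type and hand it to me, rather than ranking the whole table.
It would be equivalent to: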

SELECT id FROM job_queue WHERE job_type = 'j1' ORDER BY priority, created_at LIMIT 1;
SELECT id FROM job_queue WHERE job_type = 'j2' ORDER BY priority, created_at LIMIT 1;

but as a single query that works for however many job types I have and returns a single table with all the ids I care about.
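One way to check whether a candidate query really avoids ranking the whole table is to compare execution plans with EXPLAIN. The sketch below runs the window-function query and one of the per-type queries from the question through EXPLAIN (ANALYZE, BUFFERS). Note that on the six-row sample the planner may well choose a sequential scan for both, so the difference is only expected to show up on a realistically sized table; that expectation is an assumption, not something measured here.

```sql
-- Plan for the window-function query: look for a scan over all of
-- job_queue feeding a WindowAgg node.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id FROM (
  SELECT
    id,
    ROW_NUMBER() OVER (
      PARTITION BY job_type
      ORDER BY priority, created_at
    ) AS rank
  FROM job_queue
) AS ranked_jobs
WHERE rank = 1;

-- Plan for a single job type: on a large table this is the shape
-- that can be satisfied by reading the first matching entry of job_idx.
EXPLAIN (ANALYZE, BUFFERS)
SELECT id
FROM job_queue
WHERE job_type = 'j1'
ORDER BY priority, created_at
LIMIT 1;
```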


Source: https://stackoverflow.com/questions/77703141/highest-ranked-items-for-different-groups-in-single-query

3 Answers


ej83mcc01#

You can use a recursive CTE, which essentially emulates a skip scan.

WITH RECURSIVE cte AS (
    SELECT *
    FROM (
        SELECT
          jq.id,
          jq.job_type,
          jq.priority,
          jq.created_at
        FROM job_queue jq
        ORDER BY job_type, priority, created_at
        LIMIT 1
    ) jq

    UNION ALL
  
    SELECT
      jq.id,
      jq.job_type,
      jq.priority,
      jq.created_at
    FROM cte
    CROSS JOIN LATERAL (
        SELECT *
        FROM job_queue jq
        WHERE cte.job_type < jq.job_type
        ORDER BY job_type, priority, created_at
        LIMIT 1
    ) jq
)
SELECT *
FROM cte;


2024-01-07

ibrsph3r2#

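"What query is equivalent to the last example in the SQL above, but does not rank the entire table before giving me the result?"
Here is another option, using multiple CTEs: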

with cte1 as (
  select job_type, min(priority) as min_priority
  from job_queue
  group by job_type
),
cte2 as (
  select j.job_type, min(priority) as min_priority, min(created_at) as min_created_at
  from job_queue j
  inner join cte1 c on c.job_type = j.job_type 
                and c.min_priority = j.priority
  group by j.job_type
)
select j.*
from job_queue j
inner join cte2 c on c.job_type = j.job_type 
                  and c.min_created_at = j.created_at
                  and c.min_priority = j.priority;


2024-01-07

lg40wkob3#

If you have a table that lists every job_type, you can use a lateral join:

-- The subquery alias is required before PostgreSQL 16.
select id from job_type cross join lateral
    (select id from job_queue where job_queue.job_type=job_type.job_type order by priority, created_at limit 1) as first_job;
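The lateral join above relies on a lookup table that the question does not define. As a minimal sketch, assuming a table named job_type with a single job_type column (names taken from the query above; the DDL itself is hypothetical), it could be set up and queried like this:

```sql
-- Assumption: a lookup table with one row per job type. Neither this
-- table nor how it is kept in sync with job_queue appears in the question.
CREATE TABLE job_type (
  job_type VARCHAR PRIMARY KEY
);

-- One-off population from the existing queue, for illustration only.
INSERT INTO job_type (job_type)
SELECT DISTINCT job_type
FROM job_queue
WHERE job_type IS NOT NULL;

-- For each listed type, fetch the first row in (priority, created_at)
-- order; each inner lookup can be answered from job_idx.
SELECT first_job.id
FROM job_type AS jt
CROSS JOIN LATERAL (
  SELECT jq.id
  FROM job_queue AS jq
  WHERE jq.job_type = jt.job_type
  ORDER BY jq.priority, jq.created_at
  LIMIT 1
) AS first_job;
```

With the sample rows from the question this returns ids 1 and 4. Filling the lookup table with SELECT DISTINCT is itself a full scan, so in practice the table would need to be maintained separately for the approach to pay off.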

If you don't, you can emulate an index skip scan with a very ugly recursive CTE.
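The recursive-CTE emulation mentioned here is not spelled out in this answer. One commonly used shape, sketched under the assumption that job_idx from the question exists, first walks the distinct job_type values one index probe at a time and then applies the same lateral lookup to each of them. The CTE name job_types and the alias first_job are illustrative.

```sql
WITH RECURSIVE job_types AS (
  -- Smallest job_type in the table.
  (SELECT job_type FROM job_queue ORDER BY job_type LIMIT 1)

  UNION ALL

  -- Next larger job_type, or NULL once there is none, which ends
  -- the recursion through the WHERE clause below.
  SELECT (
    SELECT jq.job_type
    FROM job_queue AS jq
    WHERE jq.job_type > jt.job_type
    ORDER BY jq.job_type
    LIMIT 1
  )
  FROM job_types AS jt
  WHERE jt.job_type IS NOT NULL
)
SELECT first_job.id
FROM job_types AS jt
CROSS JOIN LATERAL (
  -- First row of this type in (priority, created_at) order.
  SELECT jq.id
  FROM job_queue AS jq
  WHERE jq.job_type = jt.job_type
  ORDER BY jq.priority, jq.created_at
  LIMIT 1
) AS first_job
WHERE jt.job_type IS NOT NULL;
```

With the sample rows from the question this returns ids 1 and 4. Rows whose job_type is NULL are ignored by this sketch.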

2024-01-07
