PostgreSQL query to find the second-highest value in each group

Posted by mklgxw1f on 2023-03-01 in PostgreSQL

I have three tables:

  1. project: project_id, project_name
  2. milestone: milestone_id, milestone_name
  3. project_milestone: id, project_id, milestone_id, completed_date
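
I want to get the second-highest completed_date and its milestone_id from project_milestone, grouped by project_id. In other words, for each project I want the milestone_id of the row with the second-highest completed_date. What is the correct query for this?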

gojuced7 #1

I think you can do what you want using just the project_milestone table and row_number():

select pm.*
from (select pm.*,
             row_number() over (partition by project_id order by completed_date desc) as seqnum
      from project_milestone pm
      where pm.completed_date is not null
     ) pm
where seqnum = 2;
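
One thing the query above leaves open is ties: row_number() assigns distinct numbers even when two milestones share the same completed_date, so which of them ends up as number 2 is arbitrary. If "second-highest" is meant as the second distinct date, a variant with dense_rank() is one option. This is a minimal sketch under that assumption; the alias names ranked and rnk are made up for illustration.

```sql
-- dense_rank() gives tied dates the same rank, so rnk = 2 means
-- "the second distinct completed_date" within each project.
SELECT ranked.*
FROM (
    SELECT pm.*,
           dense_rank() OVER (
               PARTITION BY pm.project_id
               ORDER BY pm.completed_date DESC
           ) AS rnk
    FROM project_milestone pm
    WHERE pm.completed_date IS NOT NULL
) ranked
WHERE ranked.rnk = 2;
```

Unlike the row_number() version, this can return more than one row per project when several milestones share the second-highest date.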

If you need to include *all* projects, even those that do not have two milestones, you can use a left join:

select p.project_id, pm.milestone_id, pm.completed_date
from project p left join
     (select pm.*,
             row_number() over (partition by project_id order by completed_date desc) as seqnum
      from project_milestone pm
      where pm.completed_date is not null
     ) pm
     on p.project_id = pm.project_id and pm.seqnum = 2;

czq61nw1 #2

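Using LATERAL (PostgreSQL 9.3+) can perform better than the window-function version, because each project's lookup can stop after two rows when a suitable index exists.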

SELECT * FROM project;
 project_id | project_name 
------------+--------------
          1 | Project A
          2 | Project B

SELECT * FROM project_milestone;
 id | project_id | milestone_id |     completed_date     
----+------------+--------------+------------------------
  1 |          1 |            1 | 2000-01-01 00:00:00+01
  2 |          1 |            2 | 2000-01-02 00:00:00+01
  3 |          1 |            5 | 2000-01-03 00:00:00+01
  4 |          1 |            6 | 2000-01-04 00:00:00+01
  5 |          2 |            3 | 2000-02-01 00:00:00+01
  6 |          2 |            4 | 2000-02-02 00:00:00+01
  7 |          2 |            7 | 2000-02-03 00:00:00+01
  8 |          2 |            8 | 2000-02-04 00:00:00+01
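
To reproduce the sample data shown above, a setup script along these lines should work. The column types and constraints are assumptions (the thread only gives column names), and the milestone table is left out because the queries in this answer do not touch it. The +01 offset in the output depends on the session's TimeZone setting.

```sql
-- Assumed schema: types and constraints are guesses based on the column names.
CREATE TABLE project (
    project_id   integer PRIMARY KEY,
    project_name text NOT NULL
);

CREATE TABLE project_milestone (
    id             integer PRIMARY KEY,
    project_id     integer NOT NULL REFERENCES project (project_id),
    milestone_id   integer NOT NULL,
    completed_date timestamptz  -- nullable: a milestone may not be completed yet
);

INSERT INTO project (project_id, project_name) VALUES
    (1, 'Project A'),
    (2, 'Project B');

INSERT INTO project_milestone (id, project_id, milestone_id, completed_date) VALUES
    (1, 1, 1, '2000-01-01 00:00:00+01'),
    (2, 1, 2, '2000-01-02 00:00:00+01'),
    (3, 1, 5, '2000-01-03 00:00:00+01'),
    (4, 1, 6, '2000-01-04 00:00:00+01'),
    (5, 2, 3, '2000-02-01 00:00:00+01'),
    (6, 2, 4, '2000-02-02 00:00:00+01'),
    (7, 2, 7, '2000-02-03 00:00:00+01'),
    (8, 2, 8, '2000-02-04 00:00:00+01');
```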

SELECT *
FROM project p
CROSS JOIN LATERAL (
    SELECT milestone_id, completed_date
    FROM project_milestone pm
    WHERE pm.project_id = p.project_id
    ORDER BY completed_date DESC  -- newest first, so OFFSET 1 skips the highest date
    LIMIT 1
    OFFSET 1
) second_highest;
 project_id | project_name | milestone_id |     completed_date     
------------+--------------+--------------+------------------------
          1 | Project A    |            5 | 2000-01-03 00:00:00+01
          2 | Project B    |            7 | 2000-02-03 00:00:00+01
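
Two practical points about the LATERAL approach may be worth sketching. CROSS JOIN LATERAL drops projects that have fewer than two milestones, and the performance benefit mostly depends on an index that matches the per-project lookup. The following is a sketch, not a tuned solution: the index name is hypothetical, and the IS NOT NULL filter is an assumption borrowed from the other answers (without it, NULL dates sort first under DESC).

```sql
-- Hypothetical index name; it lets each per-project lookup read only
-- the first two entries for that project_id.
CREATE INDEX project_milestone_project_date_idx
    ON project_milestone (project_id, completed_date DESC);

-- LEFT JOIN LATERAL ... ON true keeps every project; projects with fewer
-- than two completed milestones get NULL in the milestone columns.
SELECT p.project_id, p.project_name, s.milestone_id, s.completed_date
FROM project p
LEFT JOIN LATERAL (
    SELECT pm.milestone_id, pm.completed_date
    FROM project_milestone pm
    WHERE pm.project_id = p.project_id
      AND pm.completed_date IS NOT NULL
    ORDER BY pm.completed_date DESC
    LIMIT 1 OFFSET 1
) s ON true;
```

Whether this beats the window-function version depends on the data: it tends to help when there are few projects with many milestones each, and EXPLAIN ANALYZE on real data is the way to check.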

x8goxv8g #3

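The simplest way to do this is with a window function. Note that this query returns every row of project_milestone and adds the project's second-highest completed_date as an extra column (date2), rather than returning one row per project.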

SELECT *, nth_value(completed_date,2)
OVER (
    PARTITION BY project_id ORDER BY completed_date DESC
    RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
)
AS date2
FROM project_milestone;
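
If one row per project is wanted from the nth_value() approach, including the milestone_id the question asks for, one possible variant is to apply nth_value() to both columns over a shared named window and collapse the result with DISTINCT. This is a sketch; the window name w and the output column names are made up, and the IS NOT NULL filter is an assumption taken from the other answers.

```sql
-- Every row in a partition sees the same frame (the whole partition),
-- so both nth_value() results are constant per project and DISTINCT
-- reduces the output to one row per project_id.
SELECT DISTINCT
       project_id,
       nth_value(milestone_id, 2)   OVER w AS second_milestone_id,
       nth_value(completed_date, 2) OVER w AS second_highest_date
FROM project_milestone
WHERE completed_date IS NOT NULL
WINDOW w AS (
    PARTITION BY project_id
    ORDER BY completed_date DESC
    RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
);
```

Projects with only one completed milestone still appear, with NULL in both computed columns, because nth_value() returns NULL when the frame has no second row.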

kx5bkwkv #4

You can use a common table expression (CTE) to do this.

WITH cte_data AS (
    SELECT *,
           ROW_NUMBER() OVER (PARTITION BY project_id ORDER BY completed_date DESC) AS date_rank
    FROM project_milestone
    WHERE completed_date IS NOT NULL
)
SELECT * FROM cte_data
WHERE date_rank = 2;
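
Since the question also lists project and milestone tables, the CTE can be joined back to them to show names instead of bare ids. This is a sketch that assumes the column names given in the question (project_name, milestone_name); the CTE name ranked is made up.

```sql
WITH ranked AS (
    SELECT pm.project_id,
           pm.milestone_id,
           pm.completed_date,
           ROW_NUMBER() OVER (
               PARTITION BY pm.project_id
               ORDER BY pm.completed_date DESC
           ) AS date_rank
    FROM project_milestone pm
    WHERE pm.completed_date IS NOT NULL
)
-- Keep only the second-latest milestone per project and resolve the names.
SELECT p.project_name, m.milestone_name, r.completed_date
FROM ranked r
JOIN project   p ON p.project_id   = r.project_id
JOIN milestone m ON m.milestone_id = r.milestone_id
WHERE r.date_rank = 2
ORDER BY p.project_id;
```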
