PostgreSQL: find out which of several ORDER BY columns decided the ordering

qzlgjiam  posted on 2024-01-07  in  PostgreSQL

I use the query below to create a materialized view schedule_state from the table schedule:

CREATE MATERIALIZED VIEW schedule_state AS (
    WITH schedule_latest_events AS (
        SELECT
            *,
            row_number() over (
                PARTITION BY key_id
                ORDER BY
                    header_event_timestamp DESC,
                    ktimestamp DESC,
                    raw_load_timestamp DESC,
                    update_timestamp DESC
            ) AS row_number
        FROM
            schedule
    )
    SELECT
        *
    FROM
        schedule_latest_events
    WHERE
        row_number = 1
);

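Since I order by multiple columns, is there a way to find out which column decided the ordering for the row ranked 1?
Also, is it possible to take the maximum of the four ORDER BY columns and assign the result to a column in the materialized view? Thanks!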

Table schedule

key_id,   header_event_timestamp,              ktimestamp,             raw_load_timestamp,         update_timestamp
k1,    2023-12-22 08:50:59.930000,  2023-12-22 08:50:59.930000, 2023-12-22 08:52:36.960000, 2023-12-22 08:50:58.100000
k1,    2023-12-22 08:50:37.530000,  2023-12-22 08:50:37.530000, 2023-12-22 08:52:36.960000, 2023-12-22 06:41:02.483000
k2,    2023-12-22 06:41:03.080000,  2023-12-22 06:41:03.080000, 2023-12-22 06:52:33.189000, 2023-12-22 06:41:02.483000
k2,    2023-12-22 06:41:03.080000,  2023-12-22 06:45:03.060000, 2023-12-22 06:52:33.189000, 2023-12-22 06:41:02.483000
k2,    2023-12-22 06:41:03.080000,  2023-12-20 15:09:36.370000, 2023-12-21 02:52:37.557000, 2023-12-20 14:04:37.323000
K3,    2023-12-20 14:39:00.909832,  2023-12-20 14:39:00.909000, 2023-12-21 02:52:37.557000, 2023-12-20 14:04:37.323000
K3,    2023-12-20 14:39:00.909832,  2023-12-20 14:39:00.909000, 2023-12-21 02:55:37.557000, 2023-12-20 14:04:37.323000
k4,    2023-12-22 06:41:03.080000,  2023-12-22 06:41:03.080000, 2023-12-22 06:52:33.189000, 2023-12-22 06:41:02.483000
k4,    2023-12-22 06:41:03.080000,  2023-12-22 06:41:03.080000, 2023-12-22 05:52:33.189000, 2023-12-22 06:41:02.483000
k4,    2023-12-22 06:41:03.080000,  2023-12-22 05:41:03.080000, 2023-12-22 05:52:33.189000, 2023-12-22 06:41:02.483000
k4,    2023-12-22 05:41:03.080000,  2023-12-22 05:41:03.080000, 2023-12-22 05:52:33.189000, 2023-12-22 06:41:02.483000

MATERIALIZED VIEW schedule_state

key_id,   header_event_timestamp,              ktimestamp,             raw_load_timestamp,         update_timestamp,           row_number
k1,    2023-12-22 08:50:59.930000,  2023-12-22 08:50:59.930000, 2023-12-22 08:52:36.960000, 2023-12-22 08:50:58.100000, 1
k2,    2023-12-22 06:41:03.080000,  2023-12-22 06:45:03.060000, 2023-12-22 06:52:33.189000, 2023-12-22 06:41:02.483000, 1
K3,    2023-12-20 14:39:00.909832,  2023-12-20 14:39:00.909000, 2023-12-21 02:55:37.557000, 2023-12-20 14:04:37.323000, 1
k4,    2023-12-22 06:41:03.080000,  2023-12-22 06:41:03.080000, 2023-12-22 06:52:33.189000, 2023-12-22 06:41:02.483000, 1


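In the table:

  • For key_id k1, the first ORDER BY column header_event_timestamp differs between the rows, so the record with the latest header_event_timestamp gets rank 1.
  • For key_id k2, header_event_timestamp is the same for the top rows, so the ordering falls back to the second column, ktimestamp, which does differ for k2; the record with the latest ktimestamp gets rank 1.
  • For key_id k3, both header_event_timestamp and ktimestamp are the same, so the ordering falls back to the third column, raw_load_timestamp, which does differ for k3; the record with the latest raw_load_timestamp gets rank 1.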
q35jwt9p  #1

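Among the columns that define the order, it is possible to determine the first one that differentiates the top 2 records.
I have prepared the full example below; a few comments on it first: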

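  • For the sake of illustration I added a Schedule CTE; you will need to remove it yourself to get the real query.
  • There is 1 record with key_id = ksingle, to illustrate what happens when a record is alone in its partition.
  • The two records with key_id = kequal are identical, to illustrate what happens in case of a perfect match.
  • The query uses two similar windows. I moved their definitions into the query's WINDOW clause to avoid repetition and, for clarity, to keep them next to each other.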

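About window w2: because I defined it with ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING, if you were to remove the final filter (WHERE row_number = 1), every record would end up being compared with its own next record. That is the effect of a frame that starts at the current row (row #2 of the frame is always the next row), as opposed to, for example, UNBOUNDED PRECEDING (in which case row #2 is the actual second row of the whole partition).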
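
The effect of the frame start can be seen on a tiny example. This is only a sketch on made-up data: the derived table t, its column v and the three values are illustrative and not part of the original schema.

```sql
-- Illustrative data only: three integers, ordered descending.
SELECT v,
       -- Frame starts at the current row: element #2 is the NEXT row.
       nth_value(v, 2) OVER (ORDER BY v DESC
                             ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
           AS next_in_frame,
       -- Frame starts at the beginning of the partition: element #2 is
       -- the second row of the whole partition, whatever the current row.
       nth_value(v, 2) OVER (ORDER BY v DESC
                             ROWS BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
           AS second_in_partition
FROM (VALUES (30), (20), (10)) AS t(v)
ORDER BY v DESC;
```

For the rows 30, 20, 10 the first expression yields 20, 10 and NULL (the last row has no follower), while the second yields 20 for every row.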

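  • The query below implements two ways of telling you the differentiating column:
      • 4 columns returning booleans, each comparing one of the ORDER BY columns with the next row's value of the same column (window function nth_value, with argument 2).
      • An array built from the same booleans as above, to which array_position is applied to return the position of the first false.
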
WITH Schedule(key_id, header_event_timestamp, ktimestamp, raw_load_timestamp, update_timestamp) AS ( VALUES
('ksingle', '2023-12-22 08:50:59.930000',  '2023-12-22 08:50:59.930000', '2023-12-22 08:52:36.960000', '2023-12-22 08:50:58.100000'),
('kequal',  '2023-12-22 08:50:59.930000',  '2023-12-22 08:50:59.930000', '2023-12-22 08:52:36.960000', '2023-12-22 08:50:58.100000'),
('kequal',  '2023-12-22 08:50:59.930000',  '2023-12-22 08:50:59.930000', '2023-12-22 08:52:36.960000', '2023-12-22 08:50:58.100000'),
('k1',      '2023-12-22 08:50:59.930000',  '2023-12-22 08:50:59.930000', '2023-12-22 08:52:36.960000', '2023-12-22 08:50:58.100000'),
('k1',      '2023-12-22 08:50:37.530000',  '2023-12-22 08:50:37.530000', '2023-12-22 08:52:36.960000', '2023-12-22 06:41:02.483000'),
('k2',      '2023-12-22 06:41:03.080000',  '2023-12-22 06:41:03.080000', '2023-12-22 06:52:33.189000', '2023-12-22 06:41:02.483000'),
('k2',      '2023-12-22 06:41:03.080000',  '2023-12-22 06:45:03.060000', '2023-12-22 06:52:33.189000', '2023-12-22 06:41:02.483000'),
('k2',      '2023-12-22 06:41:03.080000',  '2023-12-20 15:09:36.370000', '2023-12-21 02:52:37.557000', '2023-12-20 14:04:37.323000'),
('k3',      '2023-12-20 14:39:00.909832',  '2023-12-20 14:39:00.909000', '2023-12-21 02:52:37.557000', '2023-12-20 14:04:37.323000'),
('k3',      '2023-12-20 14:39:00.909832',  '2023-12-20 14:39:00.909000', '2023-12-21 02:55:37.557000', '2023-12-20 14:04:37.323000'),
('k4',      '2023-12-22 06:41:03.080000',  '2023-12-22 06:41:03.080000', '2023-12-22 06:52:33.189000', '2023-12-22 06:41:02.483000'),
('k4',      '2023-12-22 06:41:03.080000',  '2023-12-22 06:41:03.080000', '2023-12-22 05:52:33.189000', '2023-12-22 06:41:02.483000'),
('k4',      '2023-12-22 06:41:03.080000',  '2023-12-22 05:41:03.080000', '2023-12-22 05:52:33.189000', '2023-12-22 06:41:02.483000'),
('k4',      '2023-12-22 05:41:03.080000',  '2023-12-22 05:41:03.080000', '2023-12-22 05:52:33.189000', '2023-12-22 06:41:02.483000')
), schedule_latest_events AS (
    SELECT *,
           ROW_NUMBER()                         OVER w1 AS row_number,
           nth_value(header_event_timestamp, 2) OVER w2 AS next_header_event_timestamp,
           nth_value(ktimestamp            , 2) OVER w2 AS next_ktimestamp,
           nth_value(raw_load_timestamp    , 2) OVER w2 AS next_raw_load_timestamp,
           nth_value(update_timestamp      , 2) OVER w2 AS next_update_timestamp
    FROM Schedule
    WINDOW
    w1 AS (PARTITION BY key_id ORDER BY header_event_timestamp DESC, ktimestamp DESC, raw_load_timestamp DESC, update_timestamp DESC),
    w2 AS (PARTITION BY key_id ORDER BY header_event_timestamp DESC, ktimestamp DESC, raw_load_timestamp DESC, update_timestamp DESC ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
)
SELECT *,
       header_event_timestamp = next_header_event_timestamp AS test_header_event_timestamp,
       ktimestamp             = next_ktimestamp             AS test_ktimestamp,
       raw_load_timestamp     = next_raw_load_timestamp     AS test_raw_load_timestamp,
       update_timestamp       = next_update_timestamp       AS test_update_timestamp,
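       -- 1-based position of the first ORDER BY column that differs from the next row;
       -- NULL when there is no next row (ksingle) or when all four columns match (kequal).
       array_position(ARRAY[
           header_event_timestamp = next_header_event_timestamp,
           ktimestamp             = next_ktimestamp,
           raw_load_timestamp     = next_raw_load_timestamp,
           update_timestamp       = next_update_timestamp
       ], false) AS first_different_column
FROM schedule_latest_events
WHERE row_number = 1;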
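
If a column name is preferred over a position, one possible follow-up is to use the position as a subscript into an array of names. The sketch below is an assumption layered on the query above, not part of it: the CTE name ranked, the output column deciding_column and the reduced sample data (k2 and k3 only, kept as text literals like in the example above) are all illustrative.

```sql
WITH schedule(key_id, header_event_timestamp, ktimestamp, raw_load_timestamp, update_timestamp) AS ( VALUES
('k2', '2023-12-22 06:41:03.080000', '2023-12-22 06:41:03.080000', '2023-12-22 06:52:33.189000', '2023-12-22 06:41:02.483000'),
('k2', '2023-12-22 06:41:03.080000', '2023-12-22 06:45:03.060000', '2023-12-22 06:52:33.189000', '2023-12-22 06:41:02.483000'),
('k3', '2023-12-20 14:39:00.909832', '2023-12-20 14:39:00.909000', '2023-12-21 02:52:37.557000', '2023-12-20 14:04:37.323000'),
('k3', '2023-12-20 14:39:00.909832', '2023-12-20 14:39:00.909000', '2023-12-21 02:55:37.557000', '2023-12-20 14:04:37.323000')
), ranked AS (
    SELECT key_id,
           row_number() OVER w AS row_number,
           -- Position of the first column that differs from the next row.
           array_position(ARRAY[
               header_event_timestamp = nth_value(header_event_timestamp, 2) OVER w,
               ktimestamp             = nth_value(ktimestamp            , 2) OVER w,
               raw_load_timestamp     = nth_value(raw_load_timestamp    , 2) OVER w,
               update_timestamp       = nth_value(update_timestamp      , 2) OVER w
           ], false) AS first_different_column
    FROM schedule
    WINDOW w AS (PARTITION BY key_id
                 ORDER BY header_event_timestamp DESC, ktimestamp DESC,
                          raw_load_timestamp DESC, update_timestamp DESC
                 ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
)
SELECT key_id,
       first_different_column,
       -- Names listed in the same order as the ORDER BY; a NULL subscript yields NULL.
       (ARRAY['header_event_timestamp', 'ktimestamp',
              'raw_load_timestamp', 'update_timestamp'])[first_different_column]
           AS deciding_column
FROM ranked
WHERE row_number = 1;
```

On this sample, k2 gives position 2 (ktimestamp) and k3 gives position 3 (raw_load_timestamp), matching the explanation in the question.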

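Since I wrote the query to illustrate several ways of presenting the result you want, there is some redundancy in what it returns. I will let you remove the columns you do not want to keep.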
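Edit: @MatBailie shared a version of the above query that uses the LEAD window function instead of NTH_VALUE and, importantly, only one window. Compared with the query above it is only a small code change, and the results are no different.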
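
That version is not reproduced here. As a rough idea of what a LEAD-based variant with a single window might look like, here is a minimal sketch; it is an assumption on my part rather than @MatBailie's actual query, and it uses a reduced sample (k1 and k4 only).

```sql
WITH schedule(key_id, header_event_timestamp, ktimestamp, raw_load_timestamp, update_timestamp) AS ( VALUES
('k1', '2023-12-22 08:50:59.930000', '2023-12-22 08:50:59.930000', '2023-12-22 08:52:36.960000', '2023-12-22 08:50:58.100000'),
('k1', '2023-12-22 08:50:37.530000', '2023-12-22 08:50:37.530000', '2023-12-22 08:52:36.960000', '2023-12-22 06:41:02.483000'),
('k4', '2023-12-22 06:41:03.080000', '2023-12-22 06:41:03.080000', '2023-12-22 06:52:33.189000', '2023-12-22 06:41:02.483000'),
('k4', '2023-12-22 06:41:03.080000', '2023-12-22 06:41:03.080000', '2023-12-22 05:52:33.189000', '2023-12-22 06:41:02.483000')
), schedule_latest_events AS (
    SELECT *,
           row_number()                 OVER w AS row_number,
           -- lead() reads the next row in the window ordering; no frame clause is needed.
           lead(header_event_timestamp) OVER w AS next_header_event_timestamp,
           lead(ktimestamp)             OVER w AS next_ktimestamp,
           lead(raw_load_timestamp)     OVER w AS next_raw_load_timestamp,
           lead(update_timestamp)       OVER w AS next_update_timestamp
    FROM schedule
    WINDOW w AS (PARTITION BY key_id
                 ORDER BY header_event_timestamp DESC, ktimestamp DESC,
                          raw_load_timestamp DESC, update_timestamp DESC)
)
SELECT key_id,
       array_position(ARRAY[
           header_event_timestamp = next_header_event_timestamp,
           ktimestamp             = next_ktimestamp,
           raw_load_timestamp     = next_raw_load_timestamp,
           update_timestamp       = next_update_timestamp
       ], false) AS first_different_column
FROM schedule_latest_events
WHERE row_number = 1;
```

On this sample, k1 gives position 1 (header_event_timestamp differs) and k4 gives position 3 (raw_load_timestamp is the first column that differs).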
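Until I find a way to generalize the query above (as I explained in the comments, my initial attempt was to see whether I could do something with a call to generate_series(2, ...); there is no guarantee I will ever find one), his version is the better one.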
