postgresql 将时间范围划分为之前/之后的全天和剩余小时数范围

7nbnzgx9  于 2023-10-18  发布在  PostgreSQL
关注(0)|答案(3)|浏览(122)

在Postgres中,给定一个日期范围的集合,如:
| Jmeter ID|设备id|开始于|结束于|
| --|--|--|--|
| 米1| device1| 2020-01-02 10:30| 2025-01-02 14:00|
| 米2| device1| 2020-01-02 10:30| 2020-01-02 11:30|
| 米3| device1| 2020-01-02 10:30| 2020-01-03 11:30|
我想把范围分成:

  • 一整天的范围
  • 全天之前的小时/分钟范围
  • 全天后的小时/分钟范围
  • 如果范围已经适合一整天的桶,那么就别管它了

所以上面的结果是:
| Jmeter ID|设备id|开始于|结束于|话|
| --|--|--|--|--|
| 米1| device1| 2020-01-02 10:30| 2020-01-03 00:00|第一行开始时的小时数|
| 米1| device1| 2020-01-03 00:00| 2025-01-02 00:00:00|从第一排开始的全天|
| 米1| device1| 2025-01-02 00:00:00| 2025-01-02 14:00|第一行末尾的小时数|
| 米2| device1| 2020-01-02 10:30| 2020-01-02 11:30|第二排单独留下|
| 米3| device1| 2020-01-02 10:30| 2020-01-03 00:00|第三行开始时的小时数|
| 米3| device1| 2020-01-03 00:00| 2020-01-03 11:30|第三行末尾的小时数|
我已经写了一些工作,但它是相当复杂和丑陋的。
”有一个更简单的方法吗?**

表格和数据:

CREATE TABLE IF NOT EXISTS metering_ranges (
    metering_point_id text NOT NULL,
    device_id text NOT NULL,
    start_at timestamp with time zone NOT NULL,
    end_at timestamp with time zone NOT NULL
);

INSERT INTO metering_ranges( metering_point_id, device_id,  start_at, end_at)
VALUES 
    ('meter1', 'device1', '2020-01-02 10:30:00', '2025-01-02 14:00:00'),
    ('meter2', 'device1', '2020-01-02 10:30:00', '2020-01-02 11:30:00'),
    ('meter3', 'device1', '2020-01-02 10:30:00', '2020-01-03 11:30:00');

现有(复杂)解决方案

with 

ranges_with_whole_days as (
    SELECT
        metering_point_id,
        device_id,
        start_at,
        date_trunc('day', start_at) + interval '1 d' as start_at_next_whole_day,
        date_trunc('day', end_at) as end_at_whole_day,
        end_at
    FROM
        metering_ranges
),

ranges as (
    SELECT
        metering_point_id,
        device_id,
        start_at,
        CASE
            WHEN start_at_next_whole_day <= end_at_whole_day THEN start_at_next_whole_day ELSE NULL
        END as start_at_next_day,
        CASE 
            WHEN end_at_whole_day >= start_at_next_whole_day THEN end_at_whole_day ELSE NULL
        END as end_at_prev_day,
        end_at
    FROM
        ranges_with_whole_days
),

ranges_bucketed AS (
    -- get hours before whole day
    SELECT metering_point_id, device_id, start_at, start_at_next_day as end_at
    FROM ranges m
    WHERE start_at_next_day IS NOT NULL
    
    UNION
    
    -- get whole day period
    SELECT metering_point_id, device_id, start_at_next_day as start_at, end_at_prev_day as end_at
    FROM  ranges m
    WHERE start_at_next_day IS NOT NULL AND end_at_prev_day IS NOT NULL AND start_at_next_day != end_at_prev_day
    
    UNION
    
    -- get hours after whole day
    SELECT metering_point_id, device_id, end_at_prev_day as start_at, end_at
    FROM ranges m
    WHERE end_at_prev_day IS NOT NULL
    
    UNION
    
    -- get existing record if it fits within a day
    SELECT  metering_point_id,  device_id, start_at, end_at
    FROM ranges m
    WHERE start_at_next_day IS NULL AND end_at_prev_day IS NULL
) 

SELECT *
FROM ranges_bucketed
ORDER BY metering_point_id, device_id, start_at
hivapdat

hivapdat1#

假设数据类型为timestamp(使用示例数据,而不是显示timestamptz的矛盾表定义),否则,您必须做更多的工作。请参阅:

  • 在Rails和PostgreSQL中完全忽略时区

1.午夜拆分所有范围,保留全天合并

正如你的评论所表明的那样。(此外,与“meter3”的示例行一致。)

WITH cte AS (
   SELECT *
        , CASE WHEN start_at::time = '0:0' THEN start_at
               ELSE date_trunc('day', start_at) + interval '1 day' END AS start_day
        , date_trunc('day', end_at) AS end_day
   FROM   metering_ranges
   )
-- nothing to split
SELECT metering_point_id, device_id, start_at, end_at
FROM   cte
WHERE  end_day < start_day

-- split core days
UNION ALL
SELECT metering_point_id, device_id, start_day, end_day
FROM   cte
WHERE  end_day > start_day
-- add leading range
UNION ALL
SELECT metering_point_id, device_id, start_at, start_day
FROM   cte
WHERE  end_day >= start_day
AND    start_at::time > '0:0'
-- add trailing range
UNION ALL
SELECT metering_point_id, device_id, end_day, end_at
FROM   cte
WHERE  end_day >= start_day
AND    end_at::time > '0:0'
ORDER  BY 1,2,3;

fiddle
使用UNION ALL,而不是UNION。快
此外,您可以在最后将ORDER BY追加到UNION查询中。这适用于整个结果集。

2.仅对完整天数的范围进行分解

我最初的解释。(也许是更常见的用例?))

WITH cte AS (
   SELECT *
        , CASE WHEN start_at::time = '0:0' THEN start_at
               ELSE date_trunc('day', start_at) + interval '1 day' END AS start_day
        , date_trunc('day', end_at) AS end_day
   FROM   metering_ranges
   )
-- nothing to split
SELECT metering_point_id, device_id, start_at, end_at
FROM   cte
WHERE  end_day <= start_day
-- split core days
UNION ALL
SELECT metering_point_id, device_id, start_day, end_day
FROM   cte
WHERE  end_day > start_day
-- add leading range
UNION ALL
SELECT metering_point_id, device_id, start_at, start_day
FROM   cte
WHERE  end_day > start_day
AND    start_at::time <> '0:0'
-- add trailing range
UNION ALL
SELECT metering_point_id, device_id, end_day, end_at
FROM   cte
WHERE  end_day > start_day
AND    end_at::time <> '0:0'
ORDER  BY 1,2,3;

唯一的区别是:没有嵌套完整日期的范围仍在午夜拆分。即,从00:00之后开始到第二天24:00之前结束的范围。小提琴展示了它。
相关信息:

多范围类型的case 2. 变体

对于case 1.,这些方法不会立即起作用,因为只有在提取整天时才会分割范围。不适合“meter3”的示例行。
需要Postgres 14或更高版本。我们可以使用新的anymultirange - anymultirangeanymultirange运算符。手册:
计算多范围的差。
然后,我们可以unnest()得到的多值域,以获得剩余的分数。

WITH cte AS (
   SELECT *
        , CASE WHEN start_at::time = '0:0' THEN start_at
               ELSE date_trunc('day', start_at) + interval '1 day' END AS start_day
        , date_trunc('day', end_at) AS end_day
   FROM   metering_ranges
   )
-- nothing to split
SELECT metering_point_id, device_id, start_at, end_at
FROM   cte
WHERE  end_day <= start_day

-- nested full days to split
UNION ALL
SELECT metering_point_id, device_id, x.*
FROM   cte
CROSS  JOIN LATERAL (
   VALUES (start_day, end_day)
   UNION ALL
   SELECT lower(rest), upper(rest)
   FROM   unnest(tsrange(start_at, end_at)::tsmultirange
               - tsrange(start_day, end_day)::tsmultirange) rest
   ) x
WHERE  end_day > start_day
ORDER  BY 1,2,3;

或者更短,没有CTE,使用LEFT JOIN LATERAL

SELECT c.metering_point_id, c.device_id
     , COALESCE(x.start_at, c.start_at) AS start_at
     , COALESCE(x.end_at, c.end_at) AS end_at
FROM  (
   SELECT *
        , CASE WHEN start_at::time = '0:0' THEN start_at
               ELSE date_trunc('day', start_at) + interval '1 day' END AS start_day
        , date_trunc('day', end_at) AS end_day
   FROM   metering_ranges
   ) c
LEFT   JOIN  LATERAL (
   VALUES (start_day, end_day)
   UNION ALL
   SELECT lower(rest), upper(rest)
   FROM   unnest(tsrange(start_at, end_at)::tsmultirange
               - tsrange(start_day, end_day)::tsmultirange) rest
   WHERE  c.end_day > c.start_day
   ) x(start_at, end_at) ON c.end_day > c.start_day
ORDER  BY 1,2,3;

fiddle
但是构造、计算和解嵌套多范围会增加相当大的开销。不确定这种复杂程度是否能与更“蛮力”的查询竞争。

bkhjykvo

bkhjykvo2#

日期范围可以通过准备好的时间序列的left join(通过union all)添加回原始表。联接条件指定有效范围:

select distinct r.meter_id, r.device_id, coalesce(t.a, r.start_at), coalesce(t.b, r.end_at)
from metering_ranges r left join lateral (
   select r.meter_id id, r.start_at a, date(r.start_at) + make_interval(days => (r.end_at != date(r.end_at))::int) b
   union all
   select r.meter_id id, date(r.start_at) + make_interval(days => (r.end_at != date(r.end_at))::int) a, date(r.end_at) b
   union all
   select r.meter_id id, date(r.end_at) a, r.end_at
) t on greatest(t.a, t.b) <= r.end_at and t.a < t.b and t.a >= r.start_at 
order by r.meter_id

See fiddle

hec6srdp

hec6srdp3#

经过一点修修补补,想出了这个,并希望分享,但它可能是一个更丑陋的解决方案,虽然。

select *
from 
(
select metering_point_id, device_id, 
start_at, 
case when date(start_at)=date(end_at) then end_at else date(start_at+INTERVAL '1 day') end end_at
from  metering_ranges
union
select metering_point_id, device_id, 
case when date(start_at)=date(end_at) then null else date(start_at+INTERVAL '1 day') end, 
date(end_at)
from  metering_ranges
union
select metering_point_id, device_id, 
case when date(start_at)=date(end_at) then null else date(end_at) end, 
end_at
from  metering_ranges
) t 
where t.start_at is not null and t.start_at!=t.end_at
order by metering_point_id

相关问题