Total duration time without overlapping time in SQL Server

klh5stk1  posted on 2023-06-21  in  SQL Server

I am using SQL Server 2017.

I want to calculate the total duration per group without counting overlapping time twice. Please refer to the data below.

data:

========================================================
Group  | FromDate              |   ToDate
  1    | 2019-09-30 11:13:00   | 2019-09-30 11:13:50 
  1    | 2019-09-30 11:13:20   | 2019-09-30 11:14:10 
  2    | 2019-09-30 11:20:00   | 2019-09-30 11:20:20 
  1    | 2019-09-30 11:20:10   | 2019-09-30 11:20:20 
  3    | 2019-09-30 11:25:00   | 2019-09-30 11:25:30 
=========================================================

result (second):

========================
Group  | DurationTime
   1   | 80
   2   | 20
   3   | 30 
=========================

I have already calculated the total duration using DATEDIFF.

But I did not account for overlapping time, as in the group 1 data:

group  | FromDate              |   ToDate
  1    | 2019-09-30 11:13:00   | 2019-09-30 11:13:50 
  1    | 2019-09-30 11:13:20   | 2019-09-30 11:14:10

How can I calculate the duration without counting the overlapping time twice?

I am finding it difficult to write a query that is both simple and performs well.

pes8fvy9 1#

SQL implementation of Marzullo's algorithm: https://en.wikipedia.org/wiki/Marzullo%27s_algorithm .

declare @tbl table ([Group] tinyint, FromDate datetime2(0), ToDate datetime2(0));
insert into @tbl values
(1, '2019-09-30 11:13:00', '2019-09-30 11:13:50'),
(1, '2019-09-30 11:13:20', '2019-09-30 11:14:10'),
(2, '2019-09-30 11:20:00', '2019-09-30 11:20:20'),
(1, '2019-09-30 11:20:10', '2019-09-30 11:20:20'),
(3, '2019-09-30 11:25:00', '2019-09-30 11:25:30'),
(4, '2019-10-01 23:59:30', '2019-10-02 00:00:30'), 
(4, '2019-09-30 10:00:00', '2019-09-30 10:01:00');

-- 5. Sum the durations per group
select [group], sum(duration) as duration from (
    -- 4. Calculate the duration of each range
    select [group], datediff(second, min(dt), max(dt)) duration from (
        -- 3. Pair each range start with its end
        select *, (row_number() over(partition by [group] order by dt)-1) / 2 rn from (
            -- 2. Running sum of the index
            select *, sum([index]) over(partition by [group] order by dt rows between unbounded preceding and current row) cumul 
            from (
                -- 1. Unpivot
                select [group], fromdate as dt, 1 as [index]  from @tbl
                union all
                select [group], todate, -1 from @tbl
            )s
        )s
        where ([index]=1 and cumul=1) OR ([index]=-1 and cumul=0)
    )s
    group by [group], rn
)s
group by [group]

The outcome:

Group    Duration
-----    -----
1        80
2        20
3        30
4        120

The idea is simple:

  1. Put all the dates (FromDate and ToDate) into a single column, adding a second column with an index: 1 for a FromDate, -1 for a ToDate.
  2. Compute a running sum of the index to find where each range starts and ends: index=1 and cumul=1 marks a start, index=-1 and cumul=0 marks an end.
  3. Group the remaining dates into pairs, each holding the start and end of one range.
  4. Calculate the duration of each range.
  5. Sum the durations per group.
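
To make the five steps easy to trace outside SQL, here is a minimal Python sketch of the same sweep. It is an illustration only, not part of the original answer: the function name total_duration_seconds and the tuple layout are assumptions, and it sorts starts before ends at equal timestamps so that touching periods merge into one range.

```python
from datetime import datetime

def total_duration_seconds(periods):
    """Sum the covered time per group, counting overlapping seconds once.

    `periods` is an iterable of (group, from_dt, to_dt) tuples (assumed layout).
    """
    # 1. Unpivot: every period yields a +1 event (start) and a -1 event (end).
    events = []
    for group, start, end in periods:
        events.append((group, start, 1))
        events.append((group, end, -1))
    # Sort by group, then time; at equal times starts come before ends.
    events.sort(key=lambda e: (e[0], e[1], -e[2]))

    totals = {}
    open_count = {}   # 2. running sum of the index, kept per group
    range_start = {}  # start of the range currently open in each group
    for group, dt, index in events:
        count = open_count.get(group, 0) + index
        open_count[group] = count
        if index == 1 and count == 1:
            # 3. a range opens
            range_start[group] = dt
        elif index == -1 and count == 0:
            # 4. the range closes: measure it, 5. add it to the group total
            seconds = int((dt - range_start[group]).total_seconds())
            totals[group] = totals.get(group, 0) + seconds
    return totals

rows = [
    (1, datetime(2019, 9, 30, 11, 13, 0), datetime(2019, 9, 30, 11, 13, 50)),
    (1, datetime(2019, 9, 30, 11, 13, 20), datetime(2019, 9, 30, 11, 14, 10)),
    (2, datetime(2019, 9, 30, 11, 20, 0), datetime(2019, 9, 30, 11, 20, 20)),
    (1, datetime(2019, 9, 30, 11, 20, 10), datetime(2019, 9, 30, 11, 20, 20)),
    (3, datetime(2019, 9, 30, 11, 25, 0), datetime(2019, 9, 30, 11, 25, 30)),
]
print(total_duration_seconds(rows))
```

For the question's five rows this gives 80, 20 and 30 seconds for groups 1, 2 and 3, matching the SQL outcome above.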

p4rjhz4m 2#

With reference to, and based on, SQL Queries for Overlapping Time Periods on SQL Server ...

When comparing two time periods T1 and T2, there are five possibilities:

  1. T1 and T2 are disjoint, they have no overlap.
  2. T1 completely encompasses T2.
  3. T2 completely encompasses T1.
  4. T1 overlaps the start of T2.
  5. T2 overlaps the start of T1.

This is compounded when T3 is introduced and can overlap any, all or none of T1 and T2.
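
As a side note, the four overlapping cases in the list can be collapsed into a single comparison. The Python sketch below is only an illustration under two assumptions: periods are closed (both endpoints count, as with BETWEEN) and each period's start is not after its end. The function name periods_overlap is hypothetical.

```python
def periods_overlap(start1, end1, start2, end2):
    """True when two closed periods share at least one instant."""
    # Each period must begin no later than the other one ends.
    return start1 <= end2 and start2 <= end1

# The five cases from the list, using plain numbers as timestamps.
cases = {
    "disjoint": periods_overlap(0, 10, 20, 30),
    "T1 encompasses T2": periods_overlap(0, 30, 10, 20),
    "T2 encompasses T1": periods_overlap(10, 20, 0, 30),
    "T1 overlaps start of T2": periods_overlap(0, 15, 10, 30),
    "T2 overlaps start of T1": periods_overlap(10, 30, 0, 15),
}
for name, overlaps in cases.items():
    print(name, overlaps)
```

Only the disjoint case evaluates to False; the other four evaluate to True.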

Starting with your example data:

declare @Durations table (
    [Group] int not null,
    FromDate datetime not null,
    ToDate datetime not null
);
insert @Durations values
    (1, '2019-09-30 11:13:00', '2019-09-30 11:13:50'),
    (1, '2019-09-30 11:13:20', '2019-09-30 11:14:10'),
    (2, '2019-09-30 11:20:00', '2019-09-30 11:20:20'),
    (1, '2019-09-30 11:20:10', '2019-09-30 11:20:20'),
    (3, '2019-09-30 11:25:00', '2019-09-30 11:25:30');
select * from @Durations;

Group       FromDate                ToDate
----------- ----------------------- -----------------------
1           2019-09-30 11:13:00.000 2019-09-30 11:13:50.000
1           2019-09-30 11:13:20.000 2019-09-30 11:14:10.000
2           2019-09-30 11:20:00.000 2019-09-30 11:20:20.000
1           2019-09-30 11:20:10.000 2019-09-30 11:20:20.000
3           2019-09-30 11:25:00.000 2019-09-30 11:25:30.000

We can group the chronologies by identifying overlapping time periods, assigning overlapping periods to a single span of time...

;with Chronologies as (
    select [Group],
        FromDate,
        ToDate,
        Chronology = row_number() over (partition by [Group] order by FromDate, ToDate)
    from @Durations
), CTE as (
    -- SpanEnd carries the latest ToDate seen so far in the group, so a period
    -- nested inside an earlier, longer one does not close the span too early.
    select  [Group], FromDate, ToDate, Chronology, 1 as Span, ToDate as SpanEnd
    from Chronologies
    where Chronology = 1

    union all

    select  p2.[Group],
        p2.FromDate,
        p2.ToDate,
        p2.Chronology,
        Span = case when p2.FromDate <= p1.SpanEnd
              then p1.Span else (1 + p1.Span) end,
        SpanEnd = case when p1.SpanEnd > p2.ToDate
              then p1.SpanEnd else p2.ToDate end
    from CTE p1
    inner join Chronologies p2 on p2.[Group]=p1.[Group] and p2.Chronology=(1 + p1.Chronology)
)
select [Group], FromDate, ToDate, Chronology, Span
from CTE
order by [Group], Chronology;

Group       FromDate                ToDate                  Chronology           Span
----------- ----------------------- ----------------------- -------------------- -----------
1           2019-09-30 11:13:00.000 2019-09-30 11:13:50.000 1                    1
1           2019-09-30 11:13:20.000 2019-09-30 11:14:10.000 2                    1
1           2019-09-30 11:20:10.000 2019-09-30 11:20:20.000 3                    2
2           2019-09-30 11:20:00.000 2019-09-30 11:20:20.000 1                    1
3           2019-09-30 11:25:00.000 2019-09-30 11:25:30.000 1                    1

We can use the Span column to coalesce a Group's time periods: grouping by [Group], Span lets us take min(FromDate) and max(ToDate) for each span, calculate its duration with datediff(), and sum() those durations to arrive at your DurationTime result...

;with Chronologies as (
    select [Group],
        FromDate,
        ToDate,
        Chronology = row_number() over (partition by [Group] order by FromDate, ToDate)
    from @Durations
), CTE as (
    -- SpanEnd carries the latest ToDate seen so far in the group, so a period
    -- nested inside an earlier, longer one does not close the span too early.
    select  [Group], FromDate, ToDate, Chronology, 1 as Span, ToDate as SpanEnd
    from Chronologies
    where Chronology = 1

    union all

    select  p2.[Group],
        p2.FromDate,
        p2.ToDate,
        p2.Chronology,
        Span = case when p2.FromDate <= p1.SpanEnd
              then p1.Span else (1 + p1.Span) end,
        SpanEnd = case when p1.SpanEnd > p2.ToDate
              then p1.SpanEnd else p2.ToDate end
    from CTE p1
    inner join Chronologies p2 on p2.[Group]=p1.[Group] and p2.Chronology=(1 + p1.Chronology)
)
)
select [Group], DurationTime = sum(datediff(second, FromDate, ToDate))
from (
    select  [Group], Span, FromDate=min(FromDate), ToDate=max(ToDate)
    from CTE
    group by [Group], Span
) Coalesced
group by [Group]
order by [Group];

Which gives us the final result:

Group       DurationTime
----------- ------------
1           80
2           20
3           30

qncylg1j 3#

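You can try a method called "gaps and islands". Note that the query below treats rows of a group that are consecutive in the overall FromDate order as one island, so it matches the sample data but does not check whether those rows actually overlap: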

declare @tbl table ([Group] int, FromDate datetime, ToDate datetime);
insert into @tbl values
(1,'2019-09-30 11:13:00','2019-09-30 11:13:50'), 
(1,'2019-09-30 11:13:20','2019-09-30 11:14:10'), 
(2,'2019-09-30 11:20:00','2019-09-30 11:20:20'), 
(1,'2019-09-30 11:20:10','2019-09-30 11:20:20'), 
(3,'2019-09-30 11:25:00','2019-09-30 11:25:30');

select [Group], sum(sec) from (
    select [Group], datediff(second, min(FromDate), max(ToDate)) sec
    from (
        select *,
               ROW_NUMBER() over (order by FromDate) -
                 ROW_NUMBER() over (partition by [Group] order by FromDate) grp
        from @tbl
    ) a group by [Group], grp
) a group by [Group]

jecbmhm3 4#

-- Sums each row's raw duration in seconds; overlapping periods are still counted twice.
SELECT [GROUP]
    ,sum(DATEDIFF(second, FROMDATE, TODATE)) as DurationTime
FROM GROUPS
GROUP BY [GROUP]

hc2pp10m 5#

I would approach this as a gaps-and-islands problem. You can identify the "islands" by doing the following:

  • Determine where an "island" starts, which would be a row that begins after all previous records have ended.
  • Do a cumulative sum of the starts to identify a grouping.
  • Aggregate.
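
The three bullets can be traced in a few lines of Python. This is a rough sketch for a single group, not the answer's own code: the function name merge_islands is hypothetical, and plain numbers (seconds) stand in for the timestamps.

```python
def merge_islands(periods):
    """Merge overlapping (start, end) pairs of one group into islands."""
    islands = []
    for start, end in sorted(periods):
        # An island starts when this period begins after every earlier one has ended.
        if not islands or start > islands[-1][1]:
            islands.append([start, end])
        else:
            # Otherwise extend the current island to the latest end seen so far.
            islands[-1][1] = max(islands[-1][1], end)
    return [tuple(island) for island in islands]

# Group 1 from the question, as seconds after 11:13:00.
group1 = [(0, 50), (20, 70), (430, 440)]
islands = merge_islands(group1)
total = sum(end - start for start, end in islands)
print(islands, total)
```

For group 1 this yields two islands, (0, 70) and (430, 440), for a total of 80 seconds.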

You can do this using window functions:

select groupid, min(fromdate), max(todate)
from (select t.*,
             sum(case when fromdate <= prev_todate then 0 else 1 end) over
                 (partition by groupid order by fromdate) as grp
      from (select t.*,
                   max(todate) over (partition by groupid
                                     order by fromdate
                                     rows between unbounded preceding and 1 preceding
                                    ) as prev_todate
            from t
           ) t
     ) t
group by groupid, grp;

You can then aggregate this by groupid:

select groupid, sum(datediff(second, fromdate, todate))
from (select groupid, min(fromdate) as fromdate, max(todate) as todate
      from (select t.*,
                   sum(case when fromdate <= prev_todate then 0 else 1 end) over
                       (partition by groupid order by fromdate) as grp
            from (select t.*,
                         max(todate) over (partition by groupid
                                           order by fromdate
                                           rows between unbounded preceding and 1 preceding
                                          ) as prev_todate
                  from t
                 ) t
           ) t
      group by groupid, grp
     ) t
group by groupid;

Here is a db<>fiddle.

vmjh9lq9 6#

I elaborated on the answer by Tomas J. ( https://stackoverflow.com/a/58133814/11317556 ) and adjusted the algorithm to work with Snowflake. Here's the code:

CREATE temporary  TABLE tbl  ( "Group_by_attributes" int, FromDate datetime , ToDate datetime );
   
insert into tbl values
(1, '2019-09-30 11:13:00', '2019-09-30 11:13:50'),
(1, '2019-09-30 11:13:05', '2019-09-30 11:13:50'),
(1, '2019-09-30 11:13:05', '2019-09-30 11:13:50'),
(1, '2019-09-30 11:13:05', '2019-09-30 11:13:50'),
(1, '2019-09-30 11:13:20', '2019-09-30 11:14:10'),
(2, '2019-09-30 11:20:00', '2019-09-30 11:20:20'),
(1, '2019-09-30 11:20:10', '2019-09-30 11:20:20'),
(3, '2019-09-30 11:25:00', '2019-09-30 11:25:30'),
(4, '2019-10-01 23:59:30', '2019-10-02 00:00:30'), 
(4, '2019-09-30 10:00:00', '2019-09-30 10:01:00');

-- 5. Sum the durations per group
select "Group_by_attributes", sum(duration) as duration from (
    -- 4. Calculate the duration of each range
    select "Group_by_attributes", datediff(second, min(dt), max(dt)) duration   , min(dt), max(dt) 
    from (
        -- 3. Pair each range start with its end
        select *, FLOOR( (row_number() over(partition by "Group_by_attributes" order by dt)-1) / 2 ) rn from (
            -- 2. Running sum of the index
            select *
                , sum( index_column ) over(partition by "Group_by_attributes" order by dt rows between unbounded preceding and current row) cumul  
            from (
                -- 1. Unpivot
                select "Group_by_attributes", fromdate as dt, 1 as index_column   from tbl
                union all
                select "Group_by_attributes", todate, -1 from  tbl
            )s
        )s
        where ( index_column =1 and cumul=1) OR ( index_column =-1 and cumul=0)
    )s
    group by "Group_by_attributes"  , rn
)s
group by "Group_by_attributes"

The results are the same.

toiithl6 7#

In my experience, Marzullo's algorithm ( https://stackoverflow.com/a/58133814/12130544 ) is the best approach, not only for calculating the duration of an event but especially for identifying non-intersecting (non-overlapping) intervals of validity. I have used it successfully in retail to identify the periods when a SKU is in a store's assortment, and in telecommunications to identify the start and end dates of a customer's mobile data session, as well as in many other scenarios. Strongly recommended.
