SQL Server Extracting a Start and End Date from a Single Date Column

hts6caw3  于 2023-08-02  发布在  其他
关注(0)|答案(2)|浏览(109)

(*Note: I updated the values to provide for gaps in report date as not all of my actual data is daily. I thank the first two responses - those helped, but it didn't address accounting for gaps in reporting dates. My real need is determining the End Date as I can easily get the start with a case statement using Days with the Team value against the Report Date.)

I have searched through the various versions of this question, but I am looking for a way to get the start and end date derived from a single column (known as report date). My situation is that a project will move from one team to the next and maybe back to a team for repeat work. Each time it comes back to a team, the days with the team start over as well as the start date resets.

This is the data:
| Report Date | Project ID | Team ID | Days with Team |
| ------------ | ------------ | ------------ | ------------ |
| 6/1/2023 | 12345 | C89 | 1 |
| 6/2/2023 | 12345 | C89 | 2 |
| 6/3/2023 | 12345 | C89 | 3 |
| 6/4/2023 | 12345 | C89 | 4 |
| 6/5/2023 | 12345 | C89 | 5 |
| 6/6/2023 | 12345 | B11 | 1 |
| 6/7/2023 | 12345 | B11 | 2 |
| 6/8/2023 | 12345 | B11 | 3 |
| 6/9/2023 | 12345 | C89 | 1 |
| 6/10/2023 | 12345 | C89 | 2 |
| 6/15/2023 | 12345 | C89 | 1 |
| 6/16/2023 | 12345 | C89 | 2 |
| 6/17/2023 | 12345 | C89 | 3 |
| 7/1/2023 | 12345 | B11 | 1 |
| 7/2/2023 | 12345 | B11 | 2 |

The results I seek is this:

Project IDTeam IDStart DateEnd Date
12345C896/1/20236/5/2023
12345B116/6/20236/8/2023
12345C896/9/20236/10/2023
12345C896/15/20236/17/2023
12345B117/1/20237/2/2023

I have tried different examples I found on Stack Overflow, but I can't get to where the query recognizes a break in time and a restart:

SELECT *

 ReportDate 'Report Date'
,ProjectID 'Project ID'
,TeamID 'Team ID'
,TeamDays 'Days with Team'
,StartDate 'Start Date'
,EndDate 'End Date'
FROM (
SELECT 
 ReportDate
,ProjectID
,TeamID
,TeamDays 
,CAST(DATEADD(day, - TeamDays, DATEADD(D, 1,CAST(ReportDate AS DATE))) AS DATE) StartDate
,CAST(MAX(ReportDate AS DATE) EndDate
,ROW_NUMBER() OVER(PARTITION BY ProjectID, TeamID ORDER BY ReportDate) RN

FROM TEAM_PROJECTS
WHERE TeamDays = 1
GROUP BY
 ReportDate
,ProjectID
,TeamID
,TeamDays) Projects
WHERE RN = 1

Any tips would be most appreciated.

gojuced7

gojuced71#

here one way using window functions to detect the groups and then group by that:

;WITH CTE AS (
    SELECT *
    , datediff(d, ReportDate, row_number() over (partition by ProjectID, TeamID order by ReportDate)) as grp
    FROM data
)
  
SELECT 
    ProjectID,
    TeamID,
    min(ReportDate) AS StartDate,
    max(ReportDate) AS EndDate
FROM CTE
GROUP BY ProjectID, TeamID, grp
ORDER BY StartDate
ProjectIDTeamIDStartDateEndDate
12345C892023-06-012023-06-05
12345B112023-06-062023-06-08
12345C892023-06-092023-06-10
12345C892023-06-152023-06-17
12345B112023-07-012023-07-02

fiddle

7uhlpewt

7uhlpewt2#

This is a bit of a gaps and islands problem, but simpler, i'm including an alternative method of doing it just for this question:

WITH data AS (
    SELECT  t.[Days WITH Team], t.[Project ID], t.[Team ID], cast(t.[report date] AS datetime) AS rep
    FROM
    (
        VALUES  (N'6/1/2023', 12345, N'C89', 1)
        ,   (N'6/2/2023', 12345, N'C89', 2)
        ,   (N'6/3/2023', 12345, N'C89', 3)
        ,   (N'6/4/2023', 12345, N'C89', 4)
        ,   (N'6/5/2023', 12345, N'C89', 5)
        ,   (N'6/6/2023', 12345, N'B11', 1)
        ,   (N'6/7/2023', 12345, N'B11', 2)
        ,   (N'6/8/2023', 12345, N'B11', 3)
        ,   (N'6/9/2023', 12345, N'C89', 1)
        ,   (N'6/10/2023', 12345, N'C89', 2)
        ,   (N'6/11/2023', 12345, N'C89', 3)
        ,   (N'6/12/2023', 12345, N'C89', 4)
        ,   (N'6/13/2023', 12345, N'C89', 5)
        ,   (N'6/14/2023', 12345, N'C89', 6)
        ,   (N'6/15/2023', 12345, N'C89', 7)
    ) t ([Report Date],[Project ID],[Team ID],[Days with Team])
)
SELECT  [Project ID], [Team Id], MIN(rep) startdate, ISNULL(nextdate -1, MAX(rep)) enddate
FROM    (
    SELECT  *, (SELECT MIN(d2.rep) FROM data d2 WHERE d2.[Project ID] = d.[Project ID] AND d2.rep > d.rep AND d2.[Team ID] <> d.[Team ID]) AS nextdate
    FROM    data d
    ) x
GROUP BY [Project ID], [Team Id], nextdate
ORDER BY startdate

What it does is finding the least date for a project where team id differs, and then considers it's as the finished date. By then grouping the start and finish date, one can calculate the changes in the Team ID. A little hack is needed to calculate the end date, i'm simply substracting the next date with another team by one day.

If there's no change date, we assume that maximum startdate is the enddate.

Output:
| Project ID | Team Id | startdate | enddate |
| ------------ | ------------ | ------------ | ------------ |
| 12345 | C89 | 2023-06-01 | 2023-06-05 |
| 12345 | B11 | 2023-06-06 | 2023-06-08 |
| 12345 | C89 | 2023-06-09 | 2023-06-15 |

Edit another version with gaps:

;with data as (
    select  cast([Report Date] as date) as rep,[Project ID] as proj,[Team ID] as team,[Days with Team] as days
    from (
    VALUES  (N'6/1/2023', 12345, N'C89', 1)
    ,   (N'6/2/2023', 12345, N'C89', 2)
    ,   (N'6/3/2023', 12345, N'C89', 3)
    ,   (N'6/4/2023', 12345, N'C89', 4)
    ,   (N'6/5/2023', 12345, N'C89', 5)
    ,   (N'6/6/2023', 12345, N'B11', 1)
    ,   (N'6/7/2023', 12345, N'B11', 2)
    ,   (N'6/8/2023', 12345, N'B11', 3)
    ,   (N'6/9/2023', 12345, N'C89', 1)
    ,   (N'6/10/2023', 12345, N'C89', 2)
    ,   (N'6/15/2023', 12345, N'C89', 1)
    ,   (N'6/16/2023', 12345, N'C89', 2)
    ,   (N'6/17/2023', 12345, N'C89', 3)
    ,   (N'7/1/2023', 12345, N'B11', 1)
    ,   (N'7/2/2023', 12345, N'B11', 2)
) t ([Report Date],[Project ID],[Team ID],[Days with Team])
)
select  proj, team, MIN(rep) as s, max(rep) as e
FROM (
    select  sum(prevTeam) over(partition by proj order by rep) as grouping
    ,   *
    from (
        select  case when lag(team) over(partition by proj order by rep) <> team then 1 else 0 end as prevTeam
        , *
        from    data
        ) x
    ) x
group by proj, team, x.grouping

This version follows the gaps and island solution and create a running grouping that watches for switches in the team, which is then used to group on to get the start / end date.

Output:
| proj | team | s | e |
| ------------ | ------------ | ------------ | ------------ |
| 12345 | C89 | 2023-06-01 | 2023-06-05 |
| 12345 | B11 | 2023-06-06 | 2023-06-08 |
| 12345 | C89 | 2023-06-09 | 2023-06-17 |
| 12345 | B11 | 2023-07-01 | 2023-07-02 |

相关问题