(*Note: I updated the values to provide for gaps in report date as not all of my actual data is daily. I thank the first two responses - those helped, but it didn't address accounting for gaps in reporting dates. My real need is determining the End Date as I can easily get the start with a case statement using Days with the Team value against the Report Date.)
I have searched through the various versions of this question, but I am looking for a way to get the start and end date derived from a single column (known as report date). My situation is that a project will move from one team to the next and maybe back to a team for repeat work. Each time it comes back to a team, the days with the team start over as well as the start date resets.
This is the data:
| Report Date | Project ID | Team ID | Days with Team |
| ------------ | ------------ | ------------ | ------------ |
| 6/1/2023 | 12345 | C89 | 1 |
| 6/2/2023 | 12345 | C89 | 2 |
| 6/3/2023 | 12345 | C89 | 3 |
| 6/4/2023 | 12345 | C89 | 4 |
| 6/5/2023 | 12345 | C89 | 5 |
| 6/6/2023 | 12345 | B11 | 1 |
| 6/7/2023 | 12345 | B11 | 2 |
| 6/8/2023 | 12345 | B11 | 3 |
| 6/9/2023 | 12345 | C89 | 1 |
| 6/10/2023 | 12345 | C89 | 2 |
| 6/15/2023 | 12345 | C89 | 1 |
| 6/16/2023 | 12345 | C89 | 2 |
| 6/17/2023 | 12345 | C89 | 3 |
| 7/1/2023 | 12345 | B11 | 1 |
| 7/2/2023 | 12345 | B11 | 2 |
The results I seek is this:
Project ID | Team ID | Start Date | End Date |
---|---|---|---|
12345 | C89 | 6/1/2023 | 6/5/2023 |
12345 | B11 | 6/6/2023 | 6/8/2023 |
12345 | C89 | 6/9/2023 | 6/10/2023 |
12345 | C89 | 6/15/2023 | 6/17/2023 |
12345 | B11 | 7/1/2023 | 7/2/2023 |
I have tried different examples I found on Stack Overflow, but I can't get to where the query recognizes a break in time and a restart:
SELECT *
ReportDate 'Report Date'
,ProjectID 'Project ID'
,TeamID 'Team ID'
,TeamDays 'Days with Team'
,StartDate 'Start Date'
,EndDate 'End Date'
FROM (
SELECT
ReportDate
,ProjectID
,TeamID
,TeamDays
,CAST(DATEADD(day, - TeamDays, DATEADD(D, 1,CAST(ReportDate AS DATE))) AS DATE) StartDate
,CAST(MAX(ReportDate AS DATE) EndDate
,ROW_NUMBER() OVER(PARTITION BY ProjectID, TeamID ORDER BY ReportDate) RN
FROM TEAM_PROJECTS
WHERE TeamDays = 1
GROUP BY
ReportDate
,ProjectID
,TeamID
,TeamDays) Projects
WHERE RN = 1
Any tips would be most appreciated.
2条答案
按热度按时间gojuced71#
here one way using window functions to detect the groups and then group by that:
fiddle
7uhlpewt2#
This is a bit of a gaps and islands problem, but simpler, i'm including an alternative method of doing it just for this question:
What it does is finding the least date for a project where team id differs, and then considers it's as the finished date. By then grouping the start and finish date, one can calculate the changes in the Team ID. A little hack is needed to calculate the end date, i'm simply substracting the next date with another team by one day.
If there's no change date, we assume that maximum startdate is the enddate.
Output:
| Project ID | Team Id | startdate | enddate |
| ------------ | ------------ | ------------ | ------------ |
| 12345 | C89 | 2023-06-01 | 2023-06-05 |
| 12345 | B11 | 2023-06-06 | 2023-06-08 |
| 12345 | C89 | 2023-06-09 | 2023-06-15 |
Edit another version with gaps:
This version follows the gaps and island solution and create a running grouping that watches for switches in the team, which is then used to group on to get the start / end date.
Output:
| proj | team | s | e |
| ------------ | ------------ | ------------ | ------------ |
| 12345 | C89 | 2023-06-01 | 2023-06-05 |
| 12345 | B11 | 2023-06-06 | 2023-06-08 |
| 12345 | C89 | 2023-06-09 | 2023-06-17 |
| 12345 | B11 | 2023-07-01 | 2023-07-02 |