SQL Server Group dates using T-SQL filtering on additional flag

k4ymrczo  于 2023-08-02  发布在  其他
关注(0)|答案(2)|浏览(88)

I have the following dataset:

startDate    endDate     pointsAchieved
01/07/2013   30/06/2014  0
05/10/2014   30/06/2015  0
01/07/2015   30/06/2016  0
01/07/2016   30/06/2017  1
01/07/2017   30/06/2018  1  
30/06/2018   30/06/2019  0
10/12/2019   30/06/2020  0
07/06/2021   30/06/2022  1
01/07/2022   30/06/2023  0

I want to group the data only showing the first date of the points not achieved and last date of points not achieved then doing the same for the pointsAchieved and it must be sorted by startDate, here is an example of the required output:

startDate    endDate     pointsAchieved
01/07/2013   30/06/2016  0
01/07/2016   30/06/2018  1
01/07/2018   30/06/2020  0
07/06/2021   30/06/2022  1
01/07/2022   30/06/2023  0

Is this possible to do using T-SQL? The purpose of this grouping is to show the user they have a break in their enrolment into a CPD program on a certificate as they haven't achieved their points the previous year.

The resulting text on the certificate will look something like this:

Jul 2013 - Jun 2016 (Points not achieved)

Jul 2016 - Jun 2018 (Points achieved)

Jul 2018 - Jun 2020 (Points not achieved)

Jun 2021 - Jun 2022 (Points achieved)

Jul 2022 - Jun 2023 (Points not achieved)

I have tried the following T-SQL but obviously not correct:

select Min(startDate) as startDate, Max(endDate) as endDate, pointsAchieved
from cpd_enrolments
where userId=XXXX
group by pointsAchieved
order by startDate

As it produces this result:

startDate    endDate     pointsAchieved
01/07/2013   30/06/2023  0
01/07/2016   30/06/2022  1

Any help would be much appreciated :)

vs3odd8k

vs3odd8k1#

Creating test data as per your example:

create table #dataset
(
    startDate date,
    endDate date,
    pointsAchieved int
)

insert into #dataset values
('2013-07-01','2014-06-30',0),
('2014-10-05','2015-06-30',0),
('2015-07-01','2016-06-30',0),
('2016-07-01','2017-06-30',1),
('2017-07-01','2018-06-30',1),
('2018-07-01','2019-06-30',0),
('2019-12-10','2020-06-30',0),
('2020-06-07','2022-06-30',1),
('2021-07-01','2023-06-30',0)

You can achieve the desired result with window functions:

with marked as
(
    select 
        pointsAchieved, 
        startDate, 
        endDate,
        row_number() over (order by startDate) - row_number() over (partition by pointsAchieved order by startDate) as grp
    from #dataset
)

select 
    min(startDate) as sectionStartDate, 
    max(endDate) as sectionEndDate,
    pointsAchieved
from marked
group by pointsAchieved, grp
order by min(startDate), max(endDate)

Output:

/----------------------------------------------------\
| sectionStartDate | sectionEndDate | pointsAchieved |
|------------------|----------------|----------------|
|    2013-07-01    |   2016-06-30   |        0       |
|    2016-07-01    |   2018-06-30   |        1       |
|    2018-07-01    |   2020-06-30   |        0       |
|    2020-06-07    |   2022-06-30   |        1       |
|    2021-07-01    |   2023-06-30   |        0       |
\----------------------------------------------------/
jyztefdp

jyztefdp2#

CREATE TABLE cpd_enrolments (
    startDate DATE,
    endDate DATE,
    pointsAchieved INT
);

INSERT INTO cpd_enrolments (startDate, endDate, pointsAchieved)
VALUES
('2013-07-01','2014-06-30',0),
('2014-10-05','2015-06-30',0),
('2015-07-01','2016-06-30',0),
('2016-07-01','2017-06-30',1),
('2017-07-01','2018-06-30',1),
('2018-07-01','2019-06-30',0),
('2019-12-10','2020-06-30',0),
('2020-06-07','2022-06-30',1),
('2021-07-01','2023-06-30',0)
9 rows affected
WITH grouped_data AS (
    SELECT startDate, endDate, pointsAchieved,
           ROW_NUMBER() OVER (ORDER BY startDate) AS rn,
           ROW_NUMBER() OVER (PARTITION BY pointsAchieved ORDER BY startDate) AS grp
    FROM (
        SELECT startDate, endDate, pointsAchieved,
               LAG(pointsAchieved) OVER (ORDER BY startDate) AS prevPointsAchieved
        FROM cpd_enrolments
        --WHERE userId = XXXX (If filtering by user ID is necessary)
    ) AS subquery
    WHERE pointsAchieved <> prevPointsAchieved OR prevPointsAchieved IS NULL
)
SELECT 
    MIN(startDate) AS sectionStartDate,
    MAX(endDate) AS sectionEndDate,
    pointsAchieved
FROM grouped_data
GROUP BY pointsAchieved, rn - grp
ORDER BY sectionStartDate;
sectionStartDatesectionEndDatepointsAchieved
2013-07-012014-06-300
2016-07-012017-06-301
2018-07-012019-06-300
2020-06-072022-06-301
2021-07-012023-06-300

fiddle

相关问题