I wrote a query which calculated the difference in days between the startdate
of a contract and the enddate
, given that no new contract was concluded.
In some cases, however, there is a new contract that directly follows the old one. Therefore, there is no gap in between.
My query is wrong in exactly these cases.
As an example here is the problem with ID
1
| id | year | startdate | enddate | drop_out | covered_days |
| ------------ | ------------ | ------------ | ------------ | ------------ | ------------ |
| 1 | 2015 | 2015-03-01 | 2020-12-31 | (null) | (null) |
| 1 | 2016 | 2015-03-01 | 2020-12-31 | (null) | (null) |
| 1 | 2017 | 2015-03-01 | 2020-12-31 | (null) | (null) |
| 1 | 2018 | 2015-03-01 | 2020-12-31 | (null) | (null) |
| 1 | 2019 | 2015-03-01 | 2020-12-31 | (null) | (null) |
| 1 | 2020 | 2015-03-01 | 2020-12-31 | (null) | (null) |
| 1 | 2021 | 2021-01-01 | 2021-04-20 | 2021-04-20 | 109 |
Because there is no gap between the enddate
of the first contract and the startdate
of the second contract the calculation should be
SELECT DATEDIFF(DAY, '2015-03-01', '2021-04-20') AS covered_days
instead of
SELECT DATEDIFF(DAY, '2021-01-01', '2021-04-20') AS covered_days
Query:
WITH t AS (
SELECT
id,
enddate,
startdate,
LEAD(startdate) OVER (PARTITION BY id ORDER BY startdate) AS next_startdate
FROM
tab2
),
t_max AS (
SELECT
id,
MAX(enddate) AS max_enddate
FROM
tab2
GROUP BY
id
)
SELECT
tab1.id,
YEAR(CAST(DATEADD(YEAR, CAST(tab1.year AS INT) - 1900, '19000101') AS DATE)) AS year,
tab2.startdate,
tab2.enddate,
CASE
WHEN DATEDIFF(day, tab2.enddate, t.next_startdate) > 1
AND tab2.enddate <> '2222-01-01'
AND YEAR(tab2.enddate) = YEAR(CAST(DATEADD(YEAR, CAST(tab1.year AS INT) - 1900, '19000101') AS DATE)) THEN tab2.enddate
WHEN tab2.enddate = t_max.max_enddate
AND tab2.enddate <> '2222-01-01'
AND YEAR(tab2.enddate) = YEAR(CAST(DATEADD(YEAR, CAST(tab1.year AS INT) - 1900, '19000101') AS DATE)) THEN t_max.max_enddate
ELSE NULL
END AS drop_out,
DATEDIFF(DAY, tab2.startdate, CASE
WHEN DATEDIFF(DAY, tab2.enddate, t.next_startdate) > 1
AND tab2.enddate <> '2222-01-01'
AND YEAR(tab2.enddate) = YEAR(CAST(DATEADD(YEAR, CAST(tab1.year AS INT) - 1900, '19000101') AS DATE))
THEN tab2.enddate
WHEN tab2.enddate = t_max.max_enddate
AND tab2.enddate <> '2222-01-01'
AND YEAR(tab2.enddate) = YEAR(CAST(DATEADD(YEAR, CAST(tab1.year AS INT) - 1900, '19000101') AS DATE))
THEN t_max.max_enddate
ELSE NULL
END) AS days_insured
FROM
tab1
JOIN tab2
ON tab1.id = tab2.id
AND YEAR(tab2.startdate) <= YEAR(CAST(DATEADD(YEAR, CAST(tab1.year AS INT) - 1900, '19000101') AS DATE))
AND YEAR(tab2.enddate) >= YEAR(CAST(DATEADD(YEAR, CAST(tab1.year AS INT) - 1900, '19000101') AS DATE))
LEFT JOIN t
ON tab2.id = t.id
AND tab2.startdate >= t.startdate
AND tab2.startdate < t.next_startdate
LEFT JOIN t_max
ON tab2.id = t_max.id;
For data and query see: db<>fiddle
Expected Output:
| id | year | startdate | enddate | drop_out | covered_days |
| ------------ | ------------ | ------------ | ------------ | ------------ | ------------ |
| 1 | 2015 | 2015-03-01 | 2020-12-31 | (null) | (null) |
| 1 | 2016 | 2015-03-01 | 2020-12-31 | (null) | (null) |
| 1 | 2017 | 2015-03-01 | 2020-12-31 | (null) | (null) |
| 1 | 2018 | 2015-03-01 | 2020-12-31 | (null) | (null) |
| 1 | 2019 | 2015-03-01 | 2020-12-31 | (null) | (null) |
| 1 | 2020 | 2015-03-01 | 2020-12-31 | (null) | (null) |
| 1 | 2021 | 2021-01-01 | 2021-04-20 | 2021-04-20 | 2242 |
| 3 | 2014 | 2014-03-01 | 2019-08-09 | (null) | (null) |
| 3 | 2015 | 2014-03-01 | 2019-08-09 | (null) | (null) |
| 3 | 2016 | 2014-03-01 | 2019-08-09 | (null) | (null) |
| 3 | 2017 | 2014-03-01 | 2019-08-09 | (null) | (null) |
| 3 | 2018 | 2014-03-01 | 2019-08-09 | (null) | (null) |
| 3 | 2019 | 2014-03-01 | 2019-08-09 | 2019-08-09 | 1987 |
| 3 | 2019 | 2019-08-12 | 2020-01-31 | (null) | (null) |
| 3 | 2020 | 2019-08-12 | 2020-01-31 | 2020-01-31 | 172 |
| 5 | 2015 | 2015-03-01 | 2015-03-31 | (null) | (null) |
| 5 | 2015 | 2015-04-01 | 2015-04-09 | 2015-04-09 | 8 |
| 5 | 2015 | 2015-04-11 | 2015-04-16 | 2015-04-16 | 5 |
| 5 | 2015 | 2015-04-18 | 2015-04-23 | 2015-04-23 | 5 |
| 5 | 2016 | 2016-06-01 | 2016-07-30 | (null) | (null) |
| 5 | 2016 | 2016-07-31 | 2017-02-03 | (null) | (null) |
| 5 | 2017 | 2016-07-31 | 2017-02-03 | (null) | (null) |
| 5 | 2017 | 2017-02-04 | 2017-09-13 | 2017-09-13 | 469 |
| 5 | 2017 | 2017-09-15 | 2017-09-17 | 2017-09-17 | 2 |
| 5 | 2017 | 2017-09-19 | 2019-04-08 | (null) | (null) |
| 5 | 2018 | 2017-09-19 | 2019-04-08 | (null) | (null) |
| 5 | 2019 | 2017-09-19 | 2019-04-08 | 2019-04-08 | 566 |
| 5 | 2019 | 2019-04-10 | 2019-04-26 | 2019-04-26 | 16 |
| 5 | 2019 | 2019-04-28 | 2020-12-31 | (null) | (null) |
| 5 | 2020 | 2019-04-28 | 2020-12-31 | 2020-12-31 | 613 |
2条答案
按热度按时间kqqjbcuj1#
Your expected data bugged a bit for me, but here's my solution:
I skipped year table and generate data on the fly.
It's pretty standard stuff otherwise, you check previous date to see if it's 1 day or more off, then you build a group field by using a rolling SUM, this allows you to find the islands.
Then by joining in the years, you can "explode" the start / enddate rows into each year's specific rows.
8mmmxcuj2#
https://dbfiddle.uk/RNalulfk?hide=2