SQL Server SQL window function to compare values by date

j1dl9f46  于 2023-06-04  发布在  其他
关注(0)|答案(3)|浏览(179)

I have a table with a Date column and a value column.
| Date | Value |
| ------------ | ------------ |
| 2/28/2023 | 120 |
| 1/31/2023 | 127.2 |
| 1/1/2023 | 100 |
| 4/5/2022 | 110 |

I want to create two more columns that compare each value based on dates (30 day change and 60 day change)

Here is the output:
| Date | Value | Change in last 30 days | Change in last 60 days |
| ------------ | ------------ | ------------ | ------------ |
| 2/28/2023 | 120 | -6% | 20% |
| 1/31/2023 | 127.2 | 27% | |
| 1/1/2023 | 100 | | |
| 4/5/2022 | 110 | | |

So, the output table compares value on 2/28/2023 with the value on 1/31/2023 (within 30 days) and 1/1/2023 (60 days). We compare 1/31/2023 value with 1/1/2023 value which is within 30 days. How can I do this in SQL?

Thank you.

hec6srdp

hec6srdp1#

I don't know if you can use a window function to select based on a relative date range, but you can use a pair of OUTER APPLY(SELECT TOP 1 ...) constructs to select appropriate 30-day and 60-day past values for the calculation.

One ambiguity is the definition of the 30-day and 60-day past value, if no exact 30- or 60-day past entry is present. Some options are:

  • Most recent value at least N days old.
  • Oldest value at most N days old.
  • Either of the above with additional limits on the range.

For the following, I chose a solution that seemed to match the OP's desired results.

Although result formatting is often best left to the presentation layer, I've used FORMAT( ..., 'P0') to display the results in a percent format. (This was adapted from zhiguang's answer.)

SELECT D.*
    , FORMAT(D.Value / NULLIF(D30.Value, 0) - 1, 'P0') AS [%Change in last 30 days]
    , FORMAT(D.Value / NULLIF(D60.Value, 0) - 1, 'P0') AS [%Change in last 60 days]
FROM Data D
OUTER APPLY (
    SELECT TOP 1 D1.Value
    FROM Data D1
    WHERE D1.Date < D.Date
    AND D1.Date >= DATEADD(day, -30, D.Date)
    ORDER BY D1.Date
) D30
OUTER APPLY (
    SELECT TOP 1 D2.Value
    FROM Data D2
    WHERE D2.Date < DATEADD(day, -30, D.Date)
    AND D2.Date >= DATEADD(day, -60, D.Date)
    ORDER BY D2.Date
) D60

An OUTER APPLY is like a left join to a subselect that can include references back to earlier parts of the main query. (A CROSS APPLY is similar, but behaves like an inner join.) The OUTER APPLY is used here to allow for no match conditions.

The above also uses the NULLIF() function to protect against potential divide-by-zero errors.

See this db<>fiddle for a working demo that also includes additional data having multiple values within the same 30-day window.

Note that some of the selected past values might not seem obviously correct until you recognize that some months have lengths other than 30-days. Changing the date range calculations to use month offsets instead of day offsets might make the results more intuitive.

Sample results:
| Date | Value | %Change in last 30 days | %Change in last 60 days | D30_Value | D60_Value |
| ------------ | ------------ | ------------ | ------------ | ------------ | ------------ |
| 2023-02-28 | 120.00 | -6% | 20% | 127.20 | 100.00 |
| 2023-01-31 | 127.20 | 27% | null | 100.00 | null |
| 2023-01-01 | 100.00 | null | null | null | null |
| 2022-04-05 | 110.00 | null | null | null | null |
| 2021-04-21 | 210.00 | 11% | 40% | 190.00 | 150.00 |
| 2021-04-11 | 200.00 | 11% | 43% | 180.00 | 140.00 |
| 2021-04-01 | 190.00 | 12% | 46% | 170.00 | 130.00 |
| 2021-03-21 | 180.00 | 20% | 50% | 150.00 | 120.00 |
| 2021-03-11 | 170.00 | 21% | 55% | 140.00 | 110.00 |
| 2021-03-01 | 160.00 | 23% | 60% | 130.00 | 100.00 |
| 2021-02-21 | 150.00 | 15% | 50% | 130.00 | 100.00 |
| 2021-02-11 | 140.00 | 17% | 40% | 120.00 | 100.00 |
| 2021-02-01 | 130.00 | 18% | 30% | 110.00 | 100.00 |
| 2021-01-21 | 120.00 | 20% | null | 100.00 | null |
| 2021-01-11 | 110.00 | 10% | null | 100.00 | null |
| 2021-01-01 | 100.00 | null | null | null | null |

I'll leave it to the OP to tweak the date range selection logic and the final result rounding and formatting.

1bqhqjot

1bqhqjot2#

Here's a pattern that might be useful. Collect the nearest values and then determine which you want to keep. If you have a cap in the number of possible lookback rows then you can use this method and adjust for whichever option you like.

In this example I'm basically assuming that there might be gaps but that there will only be a single value per 30 days. While that might not apply to your data it should be easy to incorporate the extra conditions:

with data as (
    select *,
        lag("date", 1) over (order by "date") as d1,
        lag("date", 2) over (order by "date") as d2,
        lag(value, 1)  over (order by "date") as v1,
        lag(value, 2)  over (order by "date") as v2
    from T
)
select *,
    (case when datediff(day, d1, "date") between  1 and 30 then v1 end - value) / value as "30daychange",
    (case when datediff(day, d1, "date") between 31 and 60 then v1
          when datediff(day, d2, "date") between 31 and 60 then v2 end - value) / value as "60daychange"
from data;

No joins are required and it only involves a sort on the date column so this will be very efficient.

yhived7q

yhived7q3#

Potentially you could also do something like the below

  • assumes at most one row per date in the source data
  • generates rows for any missing dates
  • Use FIRST_VALUE with IGNORE NULLS to find the value from the first date in the target range with data

Below uses some SQL Server 2022 specific functions so if you aren't on that version you would need to find alternatives.

WITH DateLimits AS
(
SELECT MIN(Date) AS MinDate, DATEDIFF(DAY, MIN(Date), MAX(Date)) AS RangeSize
FROM YourTable
), Expanded AS
(
SELECT  D.Date,
        Y.Value, 
        Within30 = FIRST_VALUE(Y.Value) IGNORE NULLS OVER (ORDER BY D.Date ROWS BETWEEN 30 PRECEDING AND 1 PRECEDING) , 
        Within60 = FIRST_VALUE(Y.Value) IGNORE NULLS OVER (ORDER BY D.Date ROWS BETWEEN 60 PRECEDING AND 31 PRECEDING) 
FROM DateLimits
CROSS APPLY GENERATE_SERIES(0, RangeSize) G
CROSS APPLY (SELECT DATEADD(DAY, G.value, MinDate)) D(Date) 
LEFT JOIN YourTable Y ON Y.Date = D.Date
)
SELECT Date, 
       Value,  
      (Value - Within30)/Within30 AS [Change in last 30 days],  
      (Value - Within60)/Within60 AS [Change in last 60 days]
FROM Expanded
WHERE Value IS NOT NULL
ORDER BY Date DESC

相关问题