SQL Server Calculate rolling average of a measure based on certain condition from another column

5vf7fwbs  于 2023-03-17  发布在  其他
关注(0)|答案(1)|浏览(120)

Below is the given table.
| date | id1 | id2 | fact | boolean |
| ------------ | ------------ | ------------ | ------------ | ------------ |
| 2022-01-01 | i1 | l1 | 100 | 0 |
| 2022-01-02 | i1 | l1 | 10 | 1 |
| 2022-01-03 | i1 | l1 | 110 | 0 |
| 2022-01-04 | i1 | l1 | 70 | 0 |
| 2022-01-05 | i1 | l1 | 20 | 1 |
| 2022-01-05 | i1 | l1 | 30 | 1 |
| 2022-01-06 | i1 | l1 | 40 | 0 |
| 2022-01-07 | i1 | l1 | 50 | 0 |
| 2022-01-08 | i1 | l1 | 70 | 0 |
| 2022-01-09 | i1 | l1 | 100 | 1 |
| 2022-01-09 | i1 | l1 | 10 | 1 |
| 2022-01-10 | i1 | l1 | 200 | 0 |

Expected outcome for rolling average:

dateid1id2factbooleanrolling_avg
2022-01-01i1l11000
2022-01-02i1l1101100
2022-01-03i1l11100
2022-01-04i1l1700
2022-01-05i1l120190
2022-01-05i1l130190
2022-01-06i1l1400
2022-01-07i1l1500
2022-01-08i1l1700
2022-01-09i1l1100153.33
2022-01-09i1l110153.33
2022-01-10i1l12000

Rolling avg of 'fact' column for preceding 3 days required only for rows where boolean = 1 and while calculating the average consider only those values where boolean is 0.

Example1: for row 5, preceding 3 days are 4, 3 and 2 where boolean is 0 for 4 and 3. Hence (110+70/2) = 90

Example2: for row 9, preceding 3 days are 8, 7 and 6 where all have boolean = 0. Hence (70+50+40/3) = 53.33

I've tried doing

AVG(expression) OVER (PARTITION BY id1, id2 
                      ORDER BY date DESC 
                      ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) as rolling_avg

However the result I got didn't match the required outcome.

fgw7neuy

fgw7neuy1#

If you dates may have gaps, then I would recommend a lateral join:

select t.*, a.*
from mytable t
outer apply (
    select avg(t1.fact) rolling_avg
    from mytable t1
    where t.boolean = 1
        and t1.id1 = t.id1 and t1.id2 = t.id2 
        and t1.date >= dateadd(day, -3, t.date) and t1.date < t.date
        and t1.boolean = 0
) a

The subquery selects rows that belong to the same ids over the 3 previous days and computes the average fact, with respect to the boolean flag of the current and previous rows.

If there is always one and only one fact row per date for a given tuple of ids, then window functions with a row frame are relevant (and more efficient):

select t.*, 
    case when boolean = 1 then
        avg(case when boolean = 0 then fact end) over(
            partition by id1, id2 
            order by date
            rows between 3 preceding and 1 preceding
        ) 
    end rolling_avg
from mytable t

相关问题