Below is the given table.
| date | id1 | id2 | fact | boolean |
| ------------ | ------------ | ------------ | ------------ | ------------ |
| 2022-01-01 | i1 | l1 | 100 | 0 |
| 2022-01-02 | i1 | l1 | 10 | 1 |
| 2022-01-03 | i1 | l1 | 110 | 0 |
| 2022-01-04 | i1 | l1 | 70 | 0 |
| 2022-01-05 | i1 | l1 | 20 | 1 |
| 2022-01-05 | i1 | l1 | 30 | 1 |
| 2022-01-06 | i1 | l1 | 40 | 0 |
| 2022-01-07 | i1 | l1 | 50 | 0 |
| 2022-01-08 | i1 | l1 | 70 | 0 |
| 2022-01-09 | i1 | l1 | 100 | 1 |
| 2022-01-09 | i1 | l1 | 10 | 1 |
| 2022-01-10 | i1 | l1 | 200 | 0 |
Expected outcome for rolling average:
date | id1 | id2 | fact | boolean | rolling_avg |
---|---|---|---|---|---|
2022-01-01 | i1 | l1 | 100 | 0 | |
2022-01-02 | i1 | l1 | 10 | 1 | 100 |
2022-01-03 | i1 | l1 | 110 | 0 | |
2022-01-04 | i1 | l1 | 70 | 0 | |
2022-01-05 | i1 | l1 | 20 | 1 | 90 |
2022-01-05 | i1 | l1 | 30 | 1 | 90 |
2022-01-06 | i1 | l1 | 40 | 0 | |
2022-01-07 | i1 | l1 | 50 | 0 | |
2022-01-08 | i1 | l1 | 70 | 0 | |
2022-01-09 | i1 | l1 | 100 | 1 | 53.33 |
2022-01-09 | i1 | l1 | 10 | 1 | 53.33 |
2022-01-10 | i1 | l1 | 200 | 0 |
Rolling avg of 'fact' column for preceding 3 days required only for rows where boolean = 1 and while calculating the average consider only those values where boolean is 0.
Example1: for row 5, preceding 3 days are 4, 3 and 2 where boolean is 0 for 4 and 3. Hence (110+70/2) = 90
Example2: for row 9, preceding 3 days are 8, 7 and 6 where all have boolean = 0. Hence (70+50+40/3) = 53.33
I've tried doing
AVG(expression) OVER (PARTITION BY id1, id2
ORDER BY date DESC
ROWS BETWEEN 3 PRECEDING AND CURRENT ROW) as rolling_avg
However the result I got didn't match the required outcome.
1条答案
按热度按时间fgw7neuy1#
If you dates may have gaps, then I would recommend a lateral join:
The subquery selects rows that belong to the same ids over the 3 previous days and computes the average fact, with respect to the boolean flag of the current and previous rows.
If there is always one and only one fact row per date for a given tuple of ids, then window functions with a row frame are relevant (and more efficient):