sql—仅当前一行中的一个值小于另一个值时才聚合前一行

qzwqbdag  于 2021-07-26  发布在  Java
关注(0)|答案(3)|浏览(434)

我的处境是每台机器都有两次计数, p 以及 r . p 应始终大于或等于 r 但由于技术滞后和聚合周期短,情况并非总是如此,因为 r 经常计数-但不总是-显示前一时期的数据。因为滞后的长度不是恒定的,所以无法确切地知道是哪个周期 r 值属于。因此,我不能简单地移动所有的 r 在时间上统一向后计数,因为这可能会在以前没有的地方产生其他差异。
这种情况无法改变,我必须按原样处理数据。
在下面的示例中,您可以看到 p 在机器上计算短暂的“暂停” 1 在机器上速度明显减慢 2 ,但是 r count继续返回大于的值 p 在“暂停”之前的短暂时间:

  1. -- Dummy data
  2. declare @t table(d date,m int,p int,r int);
  3. insert into @t values(getdate()-9,1,100,10),(getdate()-8,1,90 ,10),(getdate()-7,1,70 ,10),(getdate()-6,1,70 ,10),(getdate()-5,1,80 ,10),(getdate()-4,1,50 ,10),(getdate()-3,1,10 ,10),(getdate()-2,1,0 ,10),(getdate()-1,1,0 ,10),(getdate()+0,1,0 ,10),(getdate()+1,1,0 ,0),(getdate()+2,1,0 ,0),(getdate()+3,1,40 ,0),(getdate()+4,1,50 ,0),(getdate()+5,1,80 ,10),(getdate()-9,2,1100,100),(getdate()-8,2,190 ,100),(getdate()-7,2,170 ,100),(getdate()-6,2,170 ,100),(getdate()-5,2,180 ,100),(getdate()-4,2,150 ,100),(getdate()-3,2,110 ,100),(getdate()-2,2,10 ,100),(getdate()-1,2,10 ,100),(getdate()+0,2,10 ,100),(getdate()+1,2,10 ,0),(getdate()+2,2,10 ,0),(getdate()+3,2,140 ,0),(getdate()+4,2,150 ,0),(getdate()+5,2,180 ,100);
  4. select * from @t order by m,d;
  5. -- Output
  6. +------------+---+------+-----+
  7. | d | m | p | r |
  8. +------------+---+------+-----+
  9. | 2020-05-27 | 1 | 100 | 10 |
  10. | 2020-05-28 | 1 | 90 | 10 |
  11. | 2020-05-29 | 1 | 70 | 10 |
  12. | 2020-05-30 | 1 | 70 | 10 |
  13. | 2020-05-31 | 1 | 80 | 10 |
  14. | 2020-06-01 | 1 | 50 | 10 |
  15. | 2020-06-02 | 1 | 10 | 10 |
  16. | 2020-06-03 | 1 | 0 | 10 |
  17. | 2020-06-04 | 1 | 0 | 10 |
  18. | 2020-06-05 | 1 | 0 | 10 |
  19. | 2020-06-06 | 1 | 0 | 0 |
  20. | 2020-06-07 | 1 | 0 | 0 |
  21. | 2020-06-08 | 1 | 40 | 0 |
  22. | 2020-06-09 | 1 | 50 | 0 |
  23. | 2020-06-10 | 1 | 80 | 10 |
  24. | 2020-05-27 | 2 | 1100 | 100 |
  25. | 2020-05-28 | 2 | 190 | 100 |
  26. | 2020-05-29 | 2 | 170 | 100 |
  27. | 2020-05-30 | 2 | 170 | 100 |
  28. | 2020-05-31 | 2 | 180 | 100 |
  29. | 2020-06-01 | 2 | 150 | 100 |
  30. | 2020-06-02 | 2 | 110 | 100 |
  31. | 2020-06-03 | 2 | 10 | 100 |
  32. | 2020-06-04 | 2 | 10 | 100 |
  33. | 2020-06-05 | 2 | 10 | 100 |
  34. | 2020-06-06 | 2 | 10 | 0 |
  35. | 2020-06-07 | 2 | 10 | 0 |
  36. | 2020-06-08 | 2 | 140 | 0 |
  37. | 2020-06-09 | 2 | 150 | 0 |
  38. | 2020-06-10 | 2 | 180 | 100 |
  39. +------------+---+------+-----+

我需要能够明智地调整这些 r 在某种程度上倒计时,这样就可以将它们添加到前面的行中,使每一行保持一致 p 大于或等于相应 r 价值观。
在上面的例子中 m = 1 ,输出可能类似于以下任何一种 r 计数;我不在乎调整的蔓延,只在乎这个 p >= r 对于每一行和每一次调整,只能在时间上向后移动:

  1. +------------+---+------+------+------+------+
  2. | d | m | p | r1 | r2 | r3 |
  3. +------------+---+------+------+------+------+
  4. | 2020-05-27 | 1 | 100 | 10 | 10 | 10 |
  5. | 2020-05-28 | 1 | 90 | 10 | 10 | 10 |
  6. | 2020-05-29 | 1 | 70 | 10 | 15 | 10 |
  7. | 2020-05-30 | 1 | 70 | 20 | 20 | 10 |) Note how the original 30 r counts
  8. | 2020-05-31 | 1 | 80 | 20 | 20 | 10 |} that didn't follow the rule
  9. | 2020-06-01 | 1 | 50 | 20 | 15 | 40 |) have been moved back in time
  10. | 2020-06-02 | 1 | 10 | 10 | 10 | 10 |
  11. | 2020-06-03 | 1 | 0 | 0 | 0 | 0 |
  12. | 2020-06-04 | 1 | 0 | 0 | 0 | 0 |
  13. | 2020-06-05 | 1 | 0 | 0 | 0 | 0 |
  14. | 2020-06-06 | 1 | 0 | 0 | 0 | 0 |
  15. | 2020-06-07 | 1 | 0 | 0 | 0 | 0 |
  16. | 2020-06-08 | 1 | 40 | 0 | 0 | 0 |
  17. | 2020-06-09 | 1 | 50 | 0 | 0 | 0 |
  18. | 2020-06-10 | 1 | 80 | 10 | 10 | 10 |
  19. +------------+---+------+------+------+------+

我试过用窗口函数和 rows between 但我不知道如何识别 r 需要重新分配给以前期间的值,以及确定 p 分配给的值。如果我有任何进展,我会加在下面,但所有的帮助是非常感谢。

尝试1

我所管理的最接近的是下面的代码,它对上面的代码有效,但是当您更改 p = 50 价值低于 40 当我只想向后时,也会在时间上前后调整:

  1. with t as(
  2. select row_number() over (partition by m order by d) as rn
  3. ,(row_number() over (partition by m order by d)-1) / 5 as gn
  4. ,*
  5. from @t
  6. where m = 1
  7. )
  8. select *
  9. ,case when p > r
  10. then r + (sum(case when p < r then r else 0 end) over (partition by gn) / sum(case when p > r then 1 else 0 end) over (partition by gn))
  11. else case when p = r
  12. then r
  13. else 0
  14. end
  15. end as r_adj
  16. from t;

尝试2

这更近了,但仍在时间上向前和向后调整:

  1. with t as(
  2. select row_number() over (partition by m order by d) as rn
  3. ,(row_number() over (partition by m order by d)-1) / 10 as gn
  4. ,(row_number() over (partition by m order by d)+4) / 10 as gn2
  5. ,*
  6. from @t
  7. where m = 1
  8. )
  9. ,r1 as(
  10. select *
  11. ,case when p > r
  12. then r + (sum(case when p < r then r - p else 0 end) over (partition by gn) / sum(case when p > r then 1. else 0. end) over (partition by gn))
  13. else case when p = r
  14. then r
  15. else 0
  16. end
  17. end as r_adj
  18. from t
  19. )
  20. select d
  21. ,m
  22. ,p
  23. ,r
  24. ,case when p > r_adj
  25. then r_adj + (sum(case when p < r_adj then r_adj - p else 0 end) over (partition by gn2) / sum(case when p > r_adj then 1. else 0. end) over (partition by gn2))
  26. else case when p = r_adj
  27. then r_adj
  28. else r_adj - (r_adj - p)
  29. end
  30. end as r_new
  31. from r1
  32. order by rn
  33. ;
zzoitvuj

zzoitvuj1#

一种方法使用 apply :

  1. select t.*,
  2. t2.r as imputed_r
  3. from t outer apply
  4. (select top (1) t2.*
  5. from t t2
  6. where t2.m = t.m and
  7. t2.d >= t.d and t2.r <= t.p
  8. order by t2.d desc
  9. ) t2;

这是一把小提琴。

db2dz4w8

db2dz4w82#

你也许可以把它整理一下,但希望详细的内容能帮助你理解正在发生的事情。

  1. WITH r1 AS (
  2. -- Find what the previous day's p value was so we can compare it with today's
  3. SELECT
  4. *
  5. ,LAG(p) OVER (PARTITION BY m ORDER BY d) AS previous_p
  6. FROM
  7. @t
  8. )
  9. ,r2 AS (
  10. -- Compute the change between p valuesin percentage
  11. SELECT
  12. *
  13. ,CASE WHEN r1.previous_p > 0 THEN ((r1.p*1.0-r1.previous_p)/r1.previous_p) ELSE 0 END AS p_delta
  14. FROM
  15. r1
  16. )
  17. ,r3 AS (
  18. -- Use the percentage change to check if a machine was "paused"
  19. SELECT
  20. *
  21. ,CASE WHEN r2.p_delta > 0.8 OR (r2.previous_p = 0 AND r2.p > 0) THEN 1 ELSE 0 END AS is_new_session -- Adjust 0.8 to whatever percent change makes sense for a new session
  22. FROM
  23. r2
  24. )
  25. , r4 AS (
  26. -- Put each row for a machine into its corresponding group
  27. SELECT
  28. *
  29. ,SUM(r3.is_new_session) OVER (PARTITION BY r3.m ORDER BY r3.d) AS session_group_id
  30. FROM
  31. r3
  32. )
  33. , r5 AS (
  34. -- Now we can calculate for each group how total p and r leftover are for each day
  35. SELECT
  36. *
  37. ,CASE WHEN r4.p-r4.r > 0 AND r4.r > 0 THEN r4.p-r4.r ELSE 0 END AS remaining_p
  38. ,CASE WHEN r4.r-r4.p > 0 THEN r4.r-r4.p ELSE 0 END AS remaining_r
  39. FROM
  40. r4
  41. )
  42. , r6 AS (
  43. -- We need to get the accumulating remaining amounts
  44. SELECT
  45. *
  46. ,SUM(r5.remaining_p) OVER (PARTITION BY r5.m, r5.session_group_id ORDER BY r5.d ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS total_remaining_p
  47. ,SUM(r5.remaining_r) OVER (PARTITION BY r5.m, r5.session_group_id ORDER BY r5.d ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) AS total_remaining_r
  48. FROM
  49. r5
  50. )
  51. ,r7 AS (
  52. -- We then go backwards and subtract accumulating p out of the accumulated r
  53. SELECT
  54. *
  55. ,r6.total_remaining_r - r6.total_remaining_p AS adjusted_total_remaining_r
  56. FROM
  57. r6
  58. )
  59. ,r8 AS (
  60. -- Shift everything back one day so the adjusted remaining r can be added to the correct date
  61. SELECT
  62. *
  63. ,LEAD(r7.adjusted_total_remaining_r) OVER (PARTITION BY r7.m, r7.session_group_id ORDER BY r7.d) AS next_adjusted_total_remaining_r
  64. FROM
  65. r7
  66. )
  67. -- Finally, calculate what the corected r amount is for each day
  68. SELECT
  69. r8.d
  70. ,r8.m
  71. ,r8.p
  72. ,r8.r
  73. ,CASE WHEN r8.remaining_p > 0 THEN -- If there is any remaing p left where we can put r amounts
  74. CASE WHEN r8.total_remaining_p > r8.total_remaining_r THEN -- Also check that we have more accumlated left than r
  75. CASE WHEN r8.next_adjusted_total_remaining_r > 0 THEN r8.r + r8.next_adjusted_total_remaining_r -- If we still have space, add the adjusted amount of what is left
  76. ELSE r8.r -- If we have run out of extra r, leave it as r
  77. END
  78. ELSE r8.p -- For those days that we had to adjust by the max amount possible, r + adjusted_r will just be p
  79. END
  80. ELSE r8.r - r8.remaining_r END AS corrected_r -- Remove the extra r where r was greater than p
  81. FROM
  82. r8
  83. ORDER BY r8.m,r8.d

输出:

  1. +------------+---+------+-----+-------------+
  2. | d | m | p | r | corrected_r |
  3. +------------+---+------+-----+-------------+
  4. | 2020-05-27 | 1 | 100 | 10 | 10 |
  5. | 2020-05-28 | 1 | 90 | 10 | 10 |
  6. | 2020-05-29 | 1 | 70 | 10 | 10 |
  7. | 2020-05-30 | 1 | 70 | 10 | 10 |
  8. | 2020-05-31 | 1 | 80 | 10 | 10 |
  9. | 2020-06-01 | 1 | 50 | 10 | 40 |
  10. | 2020-06-02 | 1 | 10 | 10 | 10 |
  11. | 2020-06-03 | 1 | 0 | 10 | 0 |
  12. | 2020-06-04 | 1 | 0 | 10 | 0 |
  13. | 2020-06-05 | 1 | 0 | 10 | 0 |
  14. | 2020-06-06 | 1 | 0 | 0 | 0 |
  15. | 2020-06-07 | 1 | 0 | 0 | 0 |
  16. | 2020-06-08 | 1 | 40 | 0 | 0 |
  17. | 2020-06-09 | 1 | 50 | 0 | 0 |
  18. | 2020-06-10 | 1 | 80 | 10 | 10 |
  19. | 2020-05-27 | 2 | 1100 | 100 | 100 |
  20. | 2020-05-28 | 2 | 190 | 100 | 100 |
  21. | 2020-05-29 | 2 | 170 | 100 | 160 |
  22. | 2020-05-30 | 2 | 170 | 100 | 170 |
  23. | 2020-05-31 | 2 | 180 | 100 | 180 |
  24. | 2020-06-01 | 2 | 150 | 100 | 150 |
  25. | 2020-06-02 | 2 | 110 | 100 | 110 |
  26. | 2020-06-03 | 2 | 10 | 100 | 10 |
  27. | 2020-06-04 | 2 | 10 | 100 | 10 |
  28. | 2020-06-05 | 2 | 10 | 100 | 10 |
  29. | 2020-06-06 | 2 | 10 | 0 | 0 |
  30. | 2020-06-07 | 2 | 10 | 0 | 0 |
  31. | 2020-06-08 | 2 | 140 | 0 | 0 |
  32. | 2020-06-09 | 2 | 150 | 0 | 0 |
  33. | 2020-06-10 | 2 | 180 | 100 | 100 |
  34. +------------+---+------+-----+-------------+
展开查看全部
uqcuzwp8

uqcuzwp83#

通过使用下面的脚本,我们成功地使它工作在一个可以接受的非常小的误差范围内。
它的工作原理是检查前面的行,看是否有多余的 r 如果是,则将其添加到当前行。然后对这个的输出运行相同的过程,有效地将任何多余的内容向后移动两行。这涵盖了现实世界数据中的大多数情况。
最后,它会检查是否有多余的内容,并找到最近一行的时间戳 p 超过某个阈值的值,将多余的值移到。如果没有带的行 p > r 事先(例如,接收到的第一个数据周期会将此问题排除在外),它会在将来获取最近的一个数据周期。

  1. with t as(
  2. select *
  3. ,case when lead(p - r) over (partition by m order by d) < 0
  4. then case when p - r < 0
  5. then p
  6. else r
  7. end
  8. + abs(lead(p - r,1,0) over (partition by m order by d))
  9. else case when p - r >= 0
  10. then r
  11. else r + (p - r)
  12. end
  13. end
  14. as r1
  15. from @t
  16. )
  17. ,b2 as(
  18. select *
  19. ,case when lead(p - r1) over (partition by m order by d) < 0
  20. then case when p - r1 < 0
  21. then p
  22. else r1
  23. end
  24. + abs(lead(p - r1,1,0) over (partition by m order by d))
  25. else case when p - r1 >= 0
  26. then r1
  27. else r1 + (p - r1)
  28. end
  29. end
  30. as r2
  31. from t
  32. )
  33. ,d as(
  34. select *
  35. ,case when p - r2 < 0
  36. then case when max(case when p - r2 > 30 then d else '19000101' end) over (partition by m order by d rows between unbounded preceding and 1 preceding) = '19000101'
  37. then min(case when p - r2 > 30 then d else '29990101' end) over (partition by m order by d rows between 1 following and unbounded following)
  38. else max(case when p - r2 > 30 then d else '19000101' end) over (partition by m order by d rows between unbounded preceding and 1 preceding)
  39. end
  40. else null
  41. end as drep
  42. from b2
  43. )
  44. ,dover as
  45. (
  46. select drep
  47. ,m
  48. ,sum(p) as p
  49. ,sum(r2 - p) as r3
  50. from d
  51. where drep is not null
  52. group by drep
  53. ,m
  54. )
  55. ,f as(
  56. select d.*
  57. ,dover.r3
  58. ,case when d.drep is null
  59. then d.r2 + isnull(dover.r3 /*- dover.p*/,0) -- abs(isnull(dover.p - (dover.r3 - dover.p),0))
  60. else d.r2 - (d.r2 - d.p)
  61. end as ra
  62. from d
  63. left join dover
  64. on d.m = dover.m
  65. and d.d = dover.drep
  66. )
  67. select d,m,p,r,ra
  68. from f
  69. order by m,d
展开查看全部

相关问题