选择满足配置单元中条件的上一行

nwsw7zdq  于 2021-06-26  发布在  Hive
关注(0)|答案(1)|浏览(487)

我有这样的产品数据

  1. Product Date Sales Availbility
  2. xyz 2017-12-31 724.5 6.0
  3. xyz 2018-01-07 362.25 7.0
  4. xyz 2018-01-14 281.75 7.0
  5. xyz 2018-01-21 442.75 7.0
  6. xyz 2018-01-28 442.75 6.0
  7. xyz 2018-02-04 402.5 7.0
  8. xyz 2018-02-11 201.25 3.0
  9. xyz 2018-02-18 120.75 0.0
  10. xyz 2018-02-25 40.25 0.0
  11. xyz 2018-03-11 201.25 0.0
  12. xyz 2018-03-18 483.0 5.0
  13. xyz 2018-03-25 322.0 7.0
  14. xyz 2018-04-01 241.5 7.0
  15. xyz 2018-04-08 281.75 7.0
  16. xyz 2018-04-15 523.25 7.0
  17. xyz 2018-04-22 241.5 7.0
  18. xyz 2018-04-29 362.25 7.0

数据不是订购的(一个小问题),我想做的是,只要我们在可用性列(第4列)中有0,我就要取前3周(完全可用,即7)的平均值
如下所示:

  1. xyz 2017-12-31 724.5 6.0 Null
  2. xyz 2018-01-07 362.25 7.0 362.25 ( Same value for weeks with availbility = 7)
  3. xyz 2018-01-14 281.75 7.0 281.75
  4. xyz 2018-01-21 442.75 7.0 442.75
  5. xyz 2018-01-28 442.75 6.0 361 (362 + 281 + 362/3)the prior fully availble week avg which is avilble)
  6. xyz 2018-02-04 402.5 7.0 402
  7. xyz 2018-02-11 201.25 3.0 375 (402 + 442 + 281 /3)
  8. xyz 2018-02-18 120.75 0.0 375 ( Same since 375 is the most recent 4 fully availble average)
  9. xyz 2018-02-25 40.25 0.0 375
  10. xyz 2018-03-11 201.25 0.0 375
  11. xyz 2018-03-18 483.0 5.0 375
  12. xyz 2018-03-25 322.0 7.0 322
  13. xyz 2018-04-01 241.5 7.0 241
  14. xyz 2018-04-08 281.75 7.0 281
  15. xyz 2018-04-15 523.25 7.0 523
  16. xyz 2018-04-22 241.5 7.0 241
  17. xyz 2018-04-29 362.25 7.0 362

我试着找到3周的平均值,把它和剩下的几周结合起来,然后用lag函数来检索最近的平均值。

  1. select a.*,lag(case when a.Full_availble_sales >0 then a.Full_availble_sales end,1) over (partition by a.asin order by a.week_beginning) as Four_wk_avg from (select asin,week_beginning,avg(sales) as weekly_sales,sum(available_to_purchase) as weekly_availbility,0 as Full_availble_sales from t1 where asin = 'xyz' group by asin,week_beginning having sum(available_to_purchase) < 7
  2. union all
  3. select t.asin,t.week_beginning,t.weekly_sales,t.weekly_availbility,avg(t.weekly_sales) over (partition by t.asin order by t.week_beginning rows between 3 preceding and current row ) as Full_availble_sales from
  4. (select asin,week_beginning,avg(sales) as weekly_sales,sum(available_to_purchase) as weekly_availbility from t1 where asin = 'xyz' group by asin,week_beginning having sum(available_to_purchase) = 7)t ) a order by a.week_beginning

o/p是

  1. xyz 2017-12-31 724.5 6.0 0.0 NULL
  2. xyz 2018-01-07 362.25 7.0 362.25 NULL
  3. xyz 2018-01-14 281.75 7.0 322.0 362.25
  4. xyz 2018-01-21 442.75 7.0 362.25 322.0
  5. xyz 2018-01-28 442.75 6.0 0.0 362.25
  6. xyz 2018-02-04 402.5 7.0 372.3125 NULL
  7. xyz 2018-02-11 201.25 3.0 0.0 372.3125
  8. xyz 2018-02-18 120.75 0.0 0.0 NULL
  9. xyz 2018-02-25 40.25 0.0 0.0 NULL
  10. xyz 2018-03-11 201.25 0.0 0.0 NULL
  11. xyz 2018-03-18 483.0 5.0 0.0 NULL
  12. xyz 2018-03-25 322.0 7.0 362.25 NULL
  13. xyz 2018-04-01 241.5 7.0 352.1875 362.25
  14. xyz 2018-04-08 281.75 7.0 311.9375 352.1875
  15. xyz 2018-04-15 523.25 7.0 342.125 311.9375
  16. xyz 2018-04-22 241.5 7.0 322.0 342.125
  17. xyz 2018-04-29 362.25 7.0 352.1875 322.0

这不是我的本意。

iaqfqrcu

iaqfqrcu1#

这将完成任务(使用聚合函数 avg 以及 max_by 在Windows上)

  1. WITH
  2. tt1 (Product,Date_week_beginning,Sales,Availbility) AS
  3. ( SELECT * FROM ( VALUES
  4. ('xyz','2017-12-31', 724.5 ,6.0),
  5. ('xyz','2018-01-07', 362.25 ,7.0),
  6. ('xyz','2018-01-14', 281.75 ,7.0),
  7. ('xyz','2018-01-21', 442.75 ,7.0),
  8. ('xyz','2018-01-28', 442.75 ,6.0),
  9. ('xyz','2018-02-04', 402.5 ,7.0),
  10. ('xyz','2018-02-11', 201.25 ,3.0),
  11. ('xyz','2018-02-18', 120.75 ,0.0),
  12. ('xyz','2018-02-25', 40.25 ,0.0),
  13. ('xyz','2018-03-11', 201.25 ,0.0),
  14. ('xyz','2018-03-18', 483.0 ,5.0),
  15. ('xyz','2018-03-25', 322.0 ,7.0),
  16. ('xyz','2018-04-01', 241.5 ,7.0),
  17. ('xyz','2018-04-08', 281.75 ,7.0),
  18. ('xyz','2018-04-15', 523.25 ,7.0),
  19. ('xyz','2018-04-22', 241.5 ,7.0),
  20. ('xyz','2018-04-29', 362.25 ,7.0) )
  21. ), tt2 AS (
  22. SELECT *, avg(sales) OVER (partition by Product order by if(Availbility = 7.0,1),Date_week_beginning ROWS BETWEEN 2 PRECEDING AND CURRENT ROW) avg3
  23. FROM tt1
  24. )
  25. SELECT Product,Date_week_beginning,Sales,Availbility,
  26. CASE WHEN Availbility = 7.0 THEN Sales
  27. ELSE
  28. max_by(if(Availbility = 7.0,avg3),(Availbility = 7.0, Date_week_beginning) ) OVER (partition by Product order by Date_week_beginning)
  29. END new_col
  30. FROM tt2
  31. ORDER BY Product,Date_week_beginning

结果和要求的完全一样:

  1. Product Date_week_beginning Sales Availbility new_col
  2. xyz 2017-12-31 724.5 6.0 NULL
  3. xyz 2018-01-07 362.25 7.0 362.25
  4. xyz 2018-01-14 281.75 7.0 281.75
  5. xyz 2018-01-21 442.75 7.0 442.75
  6. xyz 2018-01-28 442.75 6.0 362.25
  7. xyz 2018-02-04 402.5 7.0 402.5
  8. xyz 2018-02-11 201.25 3.0 375.6666666666667
  9. xyz 2018-02-18 120.75 0.0 375.6666666666667
  10. xyz 2018-02-25 40.25 0.0 375.6666666666667
  11. xyz 2018-03-11 201.25 0.0 375.6666666666667
  12. xyz 2018-03-18 483.0 5.0 375.6666666666667
  13. xyz 2018-03-25 322.0 7.0 322.0
  14. xyz 2018-04-01 241.5 7.0 241.5
  15. xyz 2018-04-08 281.75 7.0 281.75
  16. xyz 2018-04-15 523.25 7.0 523.25
  17. xyz 2018-04-22 241.5 7.0 241.5
  18. xyz 2018-04-29 362.25 7.0 362.25

我假设问题中有两个输入错误(可以从示例中得到验证):
问题中的这一行是错误的: 361 (362 + 281 + 362/3)the prior fully availble week avg which is avilble) 应该是 361 (442 + 281 + 362/3)the prior fully availble week avg which is avilble) 这句话:“我想做的是,在可用性列(第4列)中有0的地方”应该是“我想做的是,在可用性列(第4列)中没有7.0的地方”

展开查看全部

相关问题