1 value

6yoyoihd  posted on 2021-06-26  in  Hive

I have a table with the following schema:

Order_id  customer_Id  purchaseDate  movie_Id  minutesStreamed
01        C1           1/1/2000      P1        100
02        C2           1/1/2002      P2        90
03        C3           4/1/2002      P3        93
04        C4           4/1/2003      P1        99
05        C4           1/1/2006      P2        99
06        C1           5/1/2006      P5        89
07        C4           12/1/2017     P5        89
08        C3           3/3/2018      P1        145
09        C4           3/3/2018      P6        147

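I want to find the customers who stream less every time they watch a movie, i.e. their second stream is shorter than their first, their third is shorter than their second, and so on.
I know how to check a single case, i.e. 3rd < 2nd or 2nd < 1st, but how do I check all of the combinations?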

select a.*
from
(
select customer_id,purchase_date,minutes_streamed, lag(minutes_streamed,1) over (partition by customer_id order by purchase_date) prev_mins_streams
from orders
)a
inner join
(select customer_id,max(purchase_date) max_purchase_dt from orders group by customer_id) b
on a.customer_id=b.customer_id
and a.purchase_date=b.max_purchase_dt
where a.minutes_streamed<a.prev_mins_streams
;

wbgh16ku  #1

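If you want only the customers whose streaming time always declines, define a flag on each row and then aggregate it per customer: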

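select o.customer_id
from (select o.*,
             -- minutes streamed on the same customer's previous purchase (null for the first one)
             lag(minutes_streamed,1) over (partition by customer_id order by purchase_date) as prev_ms
      from orders o
     ) o
group by o.customer_id
-- a row counts as an exception (1) unless it is the first stream or shorter than the previous one
having sum(case when prev_ms is null or minutes_streamed < prev_ms then 0 else 1 end) = 0;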

The having clause essentially counts the exceptions. The = 0 says there are none.
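To see the exception count in action, here is a minimal sketch that runs this kind of query against the sample rows from the question, with the comparison written as minutes_streamed < prev_ms. It is not Hive: it uses Python's built-in sqlite3 module as a stand-in (window functions need SQLite 3.25 or newer). The function name declining_customers and its min_streams parameter are hypothetical, and the dates are rewritten as ISO strings, assuming month/day/year, so that they sort chronologically.

```python
import sqlite3

# Sample rows from the question:
# (order_id, customer_id, purchase_date, movie_id, minutes_streamed).
# Dates are converted to ISO format (assumed month/day/year) so that
# ordering by the text column is chronological.
ROWS = [
    ("01", "C1", "2000-01-01", "P1", 100),
    ("02", "C2", "2002-01-01", "P2", 90),
    ("03", "C3", "2002-04-01", "P3", 93),
    ("04", "C4", "2003-04-01", "P1", 99),
    ("05", "C4", "2006-01-01", "P2", 99),
    ("06", "C1", "2006-05-01", "P5", 89),
    ("07", "C4", "2017-12-01", "P5", 89),
    ("08", "C3", "2018-03-03", "P1", 145),
    ("09", "C4", "2018-03-03", "P6", 147),
]

QUERY = """
select t.customer_id
from (select customer_id,
             minutes_streamed,
             lag(minutes_streamed, 1) over (
                 partition by customer_id order by purchase_date
             ) as prev_ms
      from orders
     ) t
group by t.customer_id
-- exceptions: rows that are not shorter than the previous stream
having sum(case when prev_ms is null or minutes_streamed < prev_ms
                then 0 else 1 end) = 0
   and count(*) >= ?
order by t.customer_id
"""


def declining_customers(conn, min_streams=1):
    """Return customers with zero exceptions and at least min_streams orders."""
    return [row[0] for row in conn.execute(QUERY, (min_streams,))]


conn = sqlite3.connect(":memory:")
conn.execute(
    "create table orders (order_id text, customer_id text, "
    "purchase_date text, movie_id text, minutes_streamed integer)"
)
conn.executemany("insert into orders values (?, ?, ?, ?, ?)", ROWS)

all_declining = declining_customers(conn)            # ['C1', 'C2']
multi_stream_only = declining_customers(conn, 2)     # ['C1']
print(all_declining, multi_stream_only)
```

C2 appears in the first result only because a customer with a single stream has no exceptions to count. If such customers should be excluded, an extra condition such as count(*) >= 2 in the having clause, as in the second call, is one way to do it. C4 is excluded because 99 followed by 99 is not a decline.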
