sql/hive:如何计算购买天数

bf1o4zei  于 2021-05-29  发布在  Hadoop
关注(0)|答案(1)|浏览(472)

sql/hive:我想计算访客购买的天数。下面是我的数据

date    visitor orders
1-Jan   A   0  
1-Jan   B   0  
4-Jan   B   1  
5-Jan   A   0  
12-Jan  A   1

这是我期待的结果:

Days to purchase    count of visitors
0   0
1   0 
2   0
3   1
4   0
5   0
.   .
.   .
.   .
11  1

有什么帮助吗?

qoefvg9y

qoefvg9y1#

如果我理解正确的话:你需要做的是找到访客+订单的每个组合的最小日期

select visitor,orders,min(date) as min.date from table group by visitor,orders

这应该是这样的:

visitor orders min.date
  A         0  1-Jan 
  B         0  1-Jan
  B         1  4-Jan
  A         1  12-Jan

这个表(我们称之为tbl)可以自连接以提供

select A.visitor,datediff(day,purchase.date,first.visit) as days.to.purchase 
from (select visitor,min.date as first.visit from tbl where orders=0) A 
inner join (select visitor,min.date as purchase.date from tbl where orders=1) B
on A.visitor=B.visitor

现在,用一个外部查询 Package 此查询,以统计具有相同日期差异的访问者:

select days.to.purchase,count(visitors) as visitors from 
 (select A.visitor,datediff(day,purchase.date,first.visit) as days.to.purchase 
    from (select visitor,min.date as first.visit from tbl where orders=0) A 
    inner join (select visitor,min.date as purchase.date from tbl where orders=1) B
    on A.visitor=B.visitor
) joined
group by days.to.purchase order by days.to.purchase

希望我没听错。我不确定这是不是正确的解决方案,但你没有给我太多的开始:)
完整的解决方案可以是:

select days.to.purchase,count(visitors) as visitors from 
 (select A.visitor,datediff(day,purchase.date,first.visit) as days.to.purchase 
    from 
(select visitor,min.date as first.visit from 
(select visitor,orders,min(date) as min.date from table group by visitor,orders) tbl where orders=0) A 
    inner join 
(select visitor,min.date as purchase.date from 
(select visitor,orders,min(date) as min.date from table group by visitor,orders) tbl where orders=1) B
    on A.visitor=B.visitor
) joined
group by days.to.purchase order by days.to.purchase

相关问题