使用impala获取连续行程的计数

dy2hfwbg  于 2021-06-27  发布在  Hive
关注(0)|答案(1)|浏览(522)

样本数据

  1. touristid|day
  2. ABC|1
  3. ABC|1
  4. ABC|2
  5. ABC|4
  6. ABC|5
  7. ABC|6
  8. ABC|8
  9. ABC|10

输出应该是

  1. touristid|trip
  2. ABC|4

4后面的逻辑是连续天数不同连续天数的计数sqq 1,1,2是第一个,然后是4,5,6是第二个,然后是8是第三个,10是第四个我想用impala查询输出这个结果

n6lpvg4x

n6lpvg4x1#

使用lag()函数获取前一天,如果前一天>1,则计算新的\u trip \u标志,然后计数(新的\u trip \u标志)。
演示:

  1. with table1 as (
  2. select 'ABC' as touristid, 1 as day union all
  3. select 'ABC' as touristid, 1 as day union all
  4. select 'ABC' as touristid, 2 as day union all
  5. select 'ABC' as touristid, 4 as day union all
  6. select 'ABC' as touristid, 5 as day union all
  7. select 'ABC' as touristid, 6 as day union all
  8. select 'ABC' as touristid, 8 as day union all
  9. select 'ABC' as touristid, 10 as day
  10. )
  11. select touristid, count(new_trip_flag) trip_cnt
  12. from
  13. ( -- calculate new_trip_flag
  14. select touristid,
  15. case when (day-prev_day) > 1 or prev_day is NULL then true end new_trip_flag
  16. from
  17. ( -- get prev_day
  18. select touristid, day,
  19. lag(day) over(partition by touristid order by day) prev_day
  20. from table1
  21. )s
  22. )s
  23. group by touristid;

结果:

  1. touristid trip_cnt
  2. ABC 4

同样的方法也适用于Hive。

展开查看全部

相关问题