样本数据
touristid|dayABC|1ABC|1ABC|2ABC|4ABC|5ABC|6ABC|8ABC|10
touristid|day
ABC|1
ABC|2
ABC|4
ABC|5
ABC|6
ABC|8
ABC|10
输出应该是
touristid|tripABC|4
touristid|trip
4后面的逻辑是连续天数不同连续天数的计数sqq 1,1,2是第一个,然后是4,5,6是第二个,然后是8是第三个,10是第四个我想用impala查询输出这个结果
n6lpvg4x1#
使用lag()函数获取前一天,如果前一天>1,则计算新的\u trip \u标志,然后计数(新的\u trip \u标志)。演示:
with table1 as (select 'ABC' as touristid, 1 as day union allselect 'ABC' as touristid, 1 as day union allselect 'ABC' as touristid, 2 as day union allselect 'ABC' as touristid, 4 as day union allselect 'ABC' as touristid, 5 as day union allselect 'ABC' as touristid, 6 as day union allselect 'ABC' as touristid, 8 as day union allselect 'ABC' as touristid, 10 as day )select touristid, count(new_trip_flag) trip_cnt from ( -- calculate new_trip_flag select touristid, case when (day-prev_day) > 1 or prev_day is NULL then true end new_trip_flag from ( -- get prev_day select touristid, day, lag(day) over(partition by touristid order by day) prev_day from table1 )s )s group by touristid;
with table1 as (
select 'ABC' as touristid, 1 as day union all
select 'ABC' as touristid, 2 as day union all
select 'ABC' as touristid, 4 as day union all
select 'ABC' as touristid, 5 as day union all
select 'ABC' as touristid, 6 as day union all
select 'ABC' as touristid, 8 as day union all
select 'ABC' as touristid, 10 as day
)
select touristid, count(new_trip_flag) trip_cnt
from
( -- calculate new_trip_flag
select touristid,
case when (day-prev_day) > 1 or prev_day is NULL then true end new_trip_flag
( -- get prev_day
select touristid, day,
lag(day) over(partition by touristid order by day) prev_day
from table1
)s
group by touristid;
结果:
touristid trip_cnt ABC 4
touristid trip_cnt
ABC 4
同样的方法也适用于Hive。
1条答案
按热度按时间n6lpvg4x1#
使用lag()函数获取前一天,如果前一天>1,则计算新的\u trip \u标志,然后计数(新的\u trip \u标志)。
演示:
结果:
同样的方法也适用于Hive。