根据时间差将行拆分为Hive中最接近的半小时

bvjveswy  于 2021-05-31  发布在  Hadoop
关注(0)|答案(1)|浏览(365)

我想在Hive里根据时差把行分成最接近半个小时
下面是我想要达到的目标的一个例子

User   Start_time            End_time              Duration
A      2020-05-05 06:45:00   2020-05-05 07:15:00   30
B      2020-05-05 10:15:00   2020-05-05 11:15:00   60

User   Start_time            End_time              Duration
A      2020-05-05 06:45:00   2020-05-05 07:00:00   15
A      2020-05-05 07:00:00   2020-05-05 07:15:00   15
B      2020-05-05 10:15:00   2020-05-05 10:30:00   15
B      2020-05-05 10:30:00   2020-05-05 11:00:00   30
B      2020-05-05 11:00:00   2020-05-05 11:15:00   15
xeufq47z

xeufq47z1#

根据需要计算间隔数 floor(duration/30) 然后使用 lateral view posexplode(split(space(s.num_intervals),' ')) 参见代码中的其他逻辑:

with mytable as (--test dataset
select stack(3,
'A', '2020-05-05 06:45:00', '2020-05-05 07:15:00', 30,
'B', '2020-05-05 10:15:00', '2020-05-05 11:15:00', 60,
'C', '2020-05-05 10:00:00', '2020-05-05 12:15:00', 135
) as (Usr, Start_time, End_time, Duration)
)

select Usr, start_time, end_time, (unix_timestamp(end_time)-unix_timestamp(start_time))/60 Duration
from
(
select  Usr,

        case when i.i=0 then start_time --first record
             when i.i=1 then from_unixtime(unix_timestamp(start_time)+shift*60) 
             else  from_unixtime(unix_timestamp(start_time)+shift*60+(i.i-1)*30*60)
         end start_time,

         case when i.i=num_intervals then end_time                             --end of range
              when i.i=0 then from_unixtime(unix_timestamp(start_time)+shift*60) --first record
              else from_unixtime(unix_timestamp(start_time)+shift*60+(i.i)*30*60)
          end end_time
from
    (--calculate required intervals and shift to 30 min
     SELECT Usr,Start_time,End_time,
            cast( floor(duration/30) as int) num_intervals,

             case when (minute(Start_time) between 1 and 29) then 30-minute(Start_time)
                  when minute(Start_time) > 30 then 60-minute(Start_time)
                  else 30
              end shift
       FROM mytable 
    )s
    lateral view posexplode(split(space(s.num_intervals),' ')) i as i,x
)s
;

结果:

usr start_time          end_time            duration
A   2020-05-05 06:45:00 2020-05-05 07:00:00 15
A   2020-05-05 07:00:00 2020-05-05 07:15:00 15
B   2020-05-05 10:15:00 2020-05-05 10:30:00 15
B   2020-05-05 10:30:00 2020-05-05 11:00:00 30
B   2020-05-05 11:00:00 2020-05-05 11:15:00 15
C   2020-05-05 10:00:00 2020-05-05 10:30:00 30
C   2020-05-05 10:30:00 2020-05-05 11:00:00 30
C   2020-05-05 11:00:00 2020-05-05 11:30:00 30
C   2020-05-05 11:30:00 2020-05-05 12:00:00 30
C   2020-05-05 12:00:00 2020-05-05 12:15:00 15

也许我没有测试所有可能的案例。请自己测试调试。对于您的数据示例,它工作得很好。

相关问题