hive-根据收集时间提取第一个和最后一个记录的数据

n8ghc7c1  于 2021-06-02  发布在  Hadoop
关注(0)|答案(1)|浏览(357)

我有一个配置单元表,在该表中,我们每天都可以获得数据,并且具有收集时间(unix时间戳)

+--------+-----------------+------+----------+
| ticket | collection_time | type |   day    |
+--------+-----------------+------+----------+
| t1     |             123 | auto | 20170302 |
| t2     |             234 | req  | 20170302 |
| t3     |             345 | req  | 20170302 |
| t4     |             678 | auto | 20170303 |
| t5     |             111 | req  | 20170301 |
| t6     |             222 | auto | 20170301 |
| t7     |             333 | auto | 20170301 |
| t8     |             444 | req  | 20170301 |
+--------+-----------------+------+----------+

我想找出一天的总票数,以及在一天的第一次收集时间和最后一次收集时间发生的票数(收集时间和类型)。请建议使用配置单元查询解决此问题的任何方法。

+---------------+---------------------+----------------+--------------------+---------------+----------+
| count(ticket) | first_tkt_coll_time | first_tkt_type | last_tkt_coll_time | last_tkt_type |   day    |
+---------------+---------------------+----------------+--------------------+---------------+----------+
|             3 |                 123 | auto           |                345 | req20170302   |          |
|             1 |                 678 | auto           |                678 | auto          | 20170303 |
|             4 |                 111 | req            |                444 | req           | 20170301 |
+---------------+---------------------+----------------+--------------------+---------------+----------+

我们还需要在同一输出集中输入第二天的计数(票证)。

+---------+-------------+-----------------+------+-----------------+------+---‌​--+ 
| cnt_day | cnt_nxt_day | collection_time | type | collection_time | type | day | 
+---------+-------------+-----------------+------+-----------------+------+---‌​--+ 
| 4 | 3| 111 | req | 444 | req | 20170301 |
| 3 | 1| 123 | auto | 345 | req | 20170302 | 
| 1 | - | 678 | auto | 678 | auto | 20170303 | 
+---------+-------------+-----------------+------+-----------------+------+---‌​--+

提前感谢:)

bjp0bcyl

bjp0bcyl1#

select  cnt
       ,cnt_next_day 
       ,min_val.collection_time
       ,min_val.type
       ,max_val.collection_time
       ,max_val.type
       ,day

from   (select    count(*)                                                         as cnt
                 ,lead (count(*)) over (order by day)                              as cnt_next_day
                 ,min(named_struct('collection_time',collection_time,'type',type)) as min_val
                 ,max(named_struct('collection_time',collection_time,'type',type)) as max_val
                 ,day

        from      mytable

        group by  day
        ) t
+-----+--------------+-----------------+------+-----------------+------+----------+
| cnt | cnt_next_day | collection_time | type | collection_time | type |   day    |
+-----+--------------+-----------------+------+-----------------+------+----------+
|   4 | 3            |             111 | req  |             444 | req  | 20170301 |
|   3 | 1            |             123 | auto |             345 | req  | 20170302 |
|   1 | NULL         |             678 | auto |             678 | auto | 20170303 |
+-----+--------------+-----------------+------+-----------------+------+----------+

相关问题