Hive query to find the maximum value

ctrmrzij  posted on 2021-05-29  in Hadoop

I have the data below, and I want to get the latest partition time for each ID.

    ID        time
12  10038446  201705102100
13  10038446  201706052100
14  10038446  201706060000
15  10038446  201706060100
16  10103517  201705101700
17  10103517  201705102100
18  10103517  201706052100
19  10103517  201706060100
20  10124464  201701310100
21  10124464  201702210500
22  10124464  201702220500
23  10124464  201703062100
24  10124464  201705102100
25  10124464  201706052100
26  10124464  201706060100

The output I expect is as follows:

15  10038446  201706060100
19  10103517  201706060100
26  10124464  201706060100
37  1019933 201706052100

How can I achieve this with a Hive query?


eeq64g8w  #1

Use a simple aggregation:

select id, max(time) as time
  from table_name  -- placeholder: substitute your table (the bare word "table" is reserved in Hive)
 group by id
 order by id;      -- order only if necessary

Demo with the dataset:

select id, max(time) as time
from table_name
group by id;

OK
10038446        201706060100
10103517        201706060100
10124464        201706060100
Time taken: 30.66 seconds, Fetched: 3 row(s)
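
The aggregation above returns only id and the maximum time, while the expected output in the question also shows a leading number. If that number is a real column rather than a display index (an assumption; it is hypothetically named row_no here, and table_name is a placeholder), one possible way to carry it along is to join the aggregate back to the table. A minimal sketch:

```sql
-- Hypothetical names: table_name, row_no (neither is given in the question).
-- `time` is backquoted because it may be a reserved word in newer Hive versions.
select t.row_no, t.id, t.`time`
from table_name t
join (
  -- latest time per id
  select id, max(`time`) as max_time
  from table_name
  group by id
) m
  on t.id = m.id
 and t.`time` = m.max_time
order by t.id;
```

If two rows of the same id share the same maximum time, this join returns both of them.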

xriantvc  #2

Try this:

select ID, time
from
(
  select 
    ID, 
    time, 
    row_number() over (partition by ID order by time desc) as time_rank
  from table_name
 ) x
where time_rank = 1
group by ID, time
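
Because row_number() ranks whole rows, other columns can be selected next to ID and time. The sketch below assumes the leading number in the question's data is a column, hypothetically named row_no, and that the table is called table_name as above. Since row_number() gives rank 1 to exactly one row per ID, the outer group by is left out here:

```sql
-- Hypothetical column: row_no (not named in the question).
-- `time` is backquoted because it may be a reserved word in newer Hive versions.
select row_no, ID, `time`
from (
  select
    row_no,
    ID,
    `time`,
    -- rank 1 = latest time within each ID
    row_number() over (partition by ID order by `time` desc) as time_rank
  from table_name
) x
where time_rank = 1;
```

When several rows of one ID tie on the latest time, row_number() keeps an arbitrary one of them; rank() in its place would keep all tied rows.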

If subqueries are not available (older Hive versions), a temporary table is an option.

create table tmp_table as
select 
  ID, 
  time, 
  row_number() over (partition by ID order by time desc) as time_rank
from table_name;

select ID, time
from tmp_table
where time_rank = 1
group by ID, time;

drop table tmp_table;
