HiveQL - maximum value across multiple columns in Hive

wfsdck30 posted on 2021-06-27 in Hive

For example, given this table:

ID   dt_col_1    dt_col_2    dt_col_3  
1    09-10-2018  08-10-2018  10-10-2018  
1    10-10-2018  null        11-10-2018  
1    11-10-2018  10-10-2018  12-10-2018  
2    null        08-10-2018  12-10-2018  
2    10-10-2018  13-10-2018  09-10-2018

Desired result:

ID   dt_col_1    dt_col_2    dt_col_3  
1    null        null        12-10-2018  
2    null        13-10-2018  null

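Hive has a greatest function that returns the largest of several columns within a single row, but how can the same logic be applied across multiple rows, so that each ID shows only its overall maximum date in the column where it occurs?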
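To illustrate the row-wise behavior described above, here is a minimal sketch. It uses literal dates rather than a real table and assumes a Hive version that accepts date literals and a SELECT without a FROM clause.

```sql
-- greatest() compares the values passed to it within one row;
-- it does not aggregate across rows.
select greatest(date '2018-10-09', date '2018-10-08', date '2018-10-10');
-- returns 2018-10-10
```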


6l7fqoea1#

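First apply GROUP BY with max() to get the latest date in each column per id. Then apply the greatest function to those per-column maxima if a single result column is enough, or use CASE expressions if the maximum must stay in the column it came from.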

create table test_stackof_greatest (id int, dt1 date, dt2 date, d3 date) row format delimited fields terminated by '\t' lines terminated by '\n' stored as textfile;

insert into test_stackof_greatest values (1, '2018-10-09', '2018-10-08', '2018-10-10');
insert into test_stackof_greatest values (1, '2018-10-10', null, '2018-10-11'), (1, '2018-11-10', '2018-10-10', '2018-10-12');
insert into test_stackof_greatest values (2, null, '2018-10-08', '2018-10-12'), (2, '2018-10-10', '2018-10-13', '2018-10-09');

-- Inner query: per-id maximum of each column (note the third column is named d3).
-- Outer query: keep a value only in the column that holds the overall maximum.
select id,
       case when dt1 > dt2 and dt1 > dt3 then dt1 else null end,
       case when dt2 > dt1 and dt2 > dt3 then dt2 else null end,
       case when dt3 > dt2 and dt3 > dt1 then dt3 else null end,
       greatest(dt1, dt2, dt3)
from (select id, max(dt1) as dt1, max(dt2) as dt2, max(d3) as dt3
      from test_stackof_greatest
      group by id) a;

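Output (the sample insert uses '2018-11-10' for dt1 where the question has 11-10-2018, so for id 1 the maximum falls in the first column rather than the third):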
OK
1       2018-11-10      NULL    NULL    2018-11-10
2       NULL    2018-10-13      NULL    2018-10-13
Time taken: 20.467 seconds, Fetched: 2 row(s)
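One case the query above does not cover is an id whose values in some column are all NULL. max() then returns NULL for that column, every comparison against it evaluates to NULL, and on Hive 2.0.0 and later greatest also returns NULL when any argument is NULL. A possible NULL-tolerant variant is sketched below. It assumes the same test_stackof_greatest table, and the sentinel date '1900-01-01' is an assumption that must be earlier than any real date in the data.

```sql
-- Sketch only: NULL-tolerant variant of the query above.
-- The sentinel '1900-01-01' is an assumed value older than any real date.
with agg as (
  -- per-id maximum of each date column
  select id,
         max(dt1) as dt1,
         max(dt2) as dt2,
         max(d3)  as dt3
  from test_stackof_greatest
  group by id
),
filled as (
  -- replace NULL maxima with the sentinel, for comparison purposes only
  select id, dt1, dt2, dt3,
         coalesce(dt1, cast('1900-01-01' as date)) as c1,
         coalesce(dt2, cast('1900-01-01' as date)) as c2,
         coalesce(dt3, cast('1900-01-01' as date)) as c3
  from agg
)
select id,
       -- output the original value (never the sentinel) where the column wins
       case when c1 = greatest(c1, c2, c3) then dt1 end as dt_col_1,
       case when c2 = greatest(c1, c2, c3) then dt2 end as dt_col_2,
       case when c3 = greatest(c1, c2, c3) then dt3 end as dt_col_3
from filled;
```

Unlike the strict > comparisons in the original query, this version shows the date in every tied column when two columns share the same maximum, instead of returning NULL in all of them.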

Hope this helps.
