配置单元窗口函数

i2byvkas  于 2021-06-27  发布在  Hive
关注(0)|答案(1)|浏览(296)

我有以下格式的数据,我需要根据标志列值的变化生成标志\u日期列。

login_date      id      flag       flag_date   
5/1/2018        100     Y            NULL 
5/2/2018        100     Y            NULL
5/3/2018        100     N          5/3/2018
5/4/2018        100     N          5/3/2018
5/5/2018        100     Y          5/3/2018
5/6/2018        100     Y          5/3/2018
5/7/2018        100     N          5/7/2018
5/8/2018        100     Y          5/7/2018
5/9/2018        100     Y          5/7/2018
5/10/2018       100     N          5/10/2018

最初,flag\u date值将为空,但当flag从y更改为n时,flag\u date将被填充,并且该值将继续,直到下一个y更改为n。请帮忙。

jw5wzhpr

jw5wzhpr1#

你的问题似乎很容易与窗口函数,但它是棘手的。存在与先前记录相关的标志,并且在存在连续的y/n时使用第一个值。
从t1开始,我们取prior_flag和prior_flag_dt,即fg_dt
从t2开始,我们整理出连续的n/y fg\u dt2
在t3中,我们再次回顾整理出的fg\u dt2。现在,连续y/n中的第一条记录将具有需要用于下一个y/n的正确值
在最后的查询中,您将得到结果。
看看这个:

> create table hr02 ( login_date date, id int, flag string, flag_date date );

> insert into hr02 
select '2018-05-01', 100, 'Y', NULL
union all 
select '2018-05-02', 100, 'Y', NULL
union all
select '2018-05-03', 100, 'N', NULL
union all
select '2018-05-04', 100, 'N', NULL
union all
select '2018-05-05', 100, 'Y', NULL
union all
select '2018-05-06', 100, 'Y', NULL
union all
select '2018-05-07', 100, 'N', NULL
union all
select '2018-05-08', 100, 'Y', NULL
union all
select '2018-05-09', 100, 'Y', NULL
union all
select '2018-05-10', 100, 'N', NULL ;

> with t1 as ( select login_date, id, flag, lag(flag) over(order by login_date) as prior_flag, case when flag='Y' then lag(login_date) over(order by login_date)  else login_date end as fg_dt from hr02),
 t2 as ( select login_date, id, flag, prior_flag, fg_dt, case when flag='Y' then lag(fg_dt) over(order by login_date) when flag='N'  and prior_flag='N' then lag(fg_dt) over(order by login_date) else login_date end as fg_dt2 from t1 ),
 t3 as ( select login_date, id, flag, prior_flag, fg_dt, fg_dt2, case when flag='Y' and prior_flag='N' then lag(fg_dt2) over(order by login_date) when flag='N'  and prior_flag='N' then lag(fg_dt) over(order by login_date) else fg_dt2 end as fg_dt3  from t2)
 select login_date, id, flag, prior_flag, fg_dt, fg_dt2, fg_dt3, case when flag='Y' and prior_flag='Y' then lag(fg_dt3) over(order by login_date) else fg_dt3 end fg_dt4 from t3 ;

+-------------+------+-------+-------------+-------------+-------------+-------------+-------------+--+
| login_date  |  id  | flag  | prior_flag  |    fg_dt    |   fg_dt2    |   fg_dt3    |   fg_dt4    |
+-------------+------+-------+-------------+-------------+-------------+-------------+-------------+--+
| 2018-05-01  | 100  | Y     | NULL        | NULL        | NULL        | NULL        | NULL        |
| 2018-05-02  | 100  | Y     | Y           | 2018-05-01  | NULL        | NULL        | NULL        |
| 2018-05-03  | 100  | N     | Y           | 2018-05-03  | 2018-05-03  | 2018-05-03  | 2018-05-03  |
| 2018-05-04  | 100  | N     | N           | 2018-05-04  | 2018-05-03  | 2018-05-03  | 2018-05-03  |
| 2018-05-05  | 100  | Y     | N           | 2018-05-04  | 2018-05-04  | 2018-05-03  | 2018-05-03  |
| 2018-05-06  | 100  | Y     | Y           | 2018-05-05  | 2018-05-04  | 2018-05-04  | 2018-05-03  |
| 2018-05-07  | 100  | N     | Y           | 2018-05-07  | 2018-05-07  | 2018-05-07  | 2018-05-07  |
| 2018-05-08  | 100  | Y     | N           | 2018-05-07  | 2018-05-07  | 2018-05-07  | 2018-05-07  |
| 2018-05-09  | 100  | Y     | Y           | 2018-05-08  | 2018-05-07  | 2018-05-07  | 2018-05-07  |
| 2018-05-10  | 100  | N     | Y           | 2018-05-10  | 2018-05-10  | 2018-05-10  | 2018-05-10  |
+-------------+------+-------+-------------+-------------+-------------+-------------+-------------+--+

相关问题