在配置单元中,如何按时间会话和数据页对日志顺序进行排序

ql3eal8s  于 2021-06-25  发布在  Hive
关注(0)|答案(1)|浏览(441)

在配置单元中,在日志数据中重复的页希望分离和排序会话,并且只保留第一次会话时剩余的时间

ID          Page            Timestamp
Orestes     Login           152356
Orestes     Login           152360
Orestes     Account view    152368
Orestes     Account view    152372
Orestes     Transfer        152380
Orestes     Account view    152382
Orestes     Account view    152390
Orestes     Loan            152393
Antigone    Login           152382
Antigone    Transfer        152390
Antigone    Account view    152392
Antigone    Account view    152395
Antigone    Trust           152399

我想更改如下。

ID          Page            Timestamp   Sequence
Orestes     Login           152356      1
Orestes     Account view    152368      2
Orestes     Transfer        152380      3
Orestes     Account view    152382      4
Orestes     Loan            152393      5
Antigone    Login           152382      1
Antigone    Transfer        152390      2
Antigone    Account view    152392      3
Antigone    Trust           152399      4

表脚本是。。。

insert into log values('Orestes','Login',152356)
insert into log values('Orestes','Login',152360)
insert into log values('Orestes','Account view',152368)
insert into log values('Orestes','Account view',152372)
insert into log values('Orestes','Transfer',152380)
insert into log values('Orestes','Account view',152382)
insert into log values('Orestes','Account view',152390)
insert into log values('Orestes','Loan',152393)
insert into log values('Antigone','Login',152382)
insert into log values('Antigone','Transfer',152390)
insert into log values('Antigone','Account view',152392)
insert into log values('Antigone','Account view',152395)
insert into log values('Antigone','Trust',152399)```

为了这份工作,

With cte as
(
Select id, page, min(timestamp) timestamp from log group by id, page)
)
Select id, page, timestamp, rank() over (partition by id order by timestamp) from log

然而,在这种情况下,一个帐户视图的俄瑞斯忒斯是丢失的。我怎样才能解决这个问题?

v440hwme

v440hwme1#

使用lag,您可以找到上一页并在重复时进行筛选。

with log as (
select stack (13,
'Orestes','Login',           152356,
'Orestes','Login',           152360,
'Orestes','Account view',    152368,
'Orestes','Account view',    152372,
'Orestes','Transfer',        152380,
'Orestes','Account view',    152382,
'Orestes','Account view',    152390,
'Orestes','Loan',            152393,
'Antigone','Login',           152382,
'Antigone','Transfer',        152390,
'Antigone','Account view',    152392,
'Antigone','Account view',    152395,
'Antigone','Trust',           152399
) as (ID,Page,Timestamp) 
)

select id, page, timestamp, row_number() over(partition by id order by timestamp) sequence
from
(
select id, page, timestamp, lag(page) over(partition by id order by timestamp) prev_page
  from log
)s 
where (prev_page!=page) or (prev_page is null)
;

结果:

OK
Antigone        Login   152382  1
Antigone        Transfer        152390  2
Antigone        Account view    152392  3
Antigone        Trust   152399  4
Orestes Login   152356  1
Orestes Account view    152368  2
Orestes Transfer        152380  3
Orestes Account view    152382  4
Orestes Loan    152393  5
Time taken: 9.359 seconds, Fetched: 9 row(s)

相关问题