我用sql为一个包含用户、组和事件时间的表生成会话id。会话定义为10分钟的窗口。我当前的实现生成会话ID;但是,有一个警告:可能有属于多个组的用户,而这不会反映在会话id分类中
架构示例:
userid | group | event_time
001 A 2020-06-20 02:04:50.000
001 A. 2020-06-20 02:06:12.000
001. A 2020-06-20 02:17:16.000
001. B. 2020-06-20 02:20:10.000
001. A. 2020-06-20 02:28:13.000
002. A. 2020-06-20 04:13:97.000
sql代码段:
tmp_table AS (
SELECT
*,
ROW_NUMBER() OVER (
PARTITION BY
userid
ORDER BY
event_time
) AS user_row,
LAG(userid) OVER (
PARTITION BY
userid
ORDER BY
event_time
) AS prev_user,
LAG(event_time) OVER (
PARTITION BY userid
ORDER BY
event_time
) AS prev_action
FROM table
ORDER BY
userid,
event_time
)
SELECT
*,
CASE
WHEN prev_user = user_row AND DATE_DIFF('minute', prev_action, event_time) < 10
THEN LAG(user_row) OVER (
PARTITION BY
userid
ORDER BY
user_row
)
ELSE user_row
END AS session_id
FROM tmp_table
然而,这会产生
userid | group | event_time | session_id
001 A 2020-06-20 02:04:50.000. 1
001 A. 2020-06-20 02:06:12.000. 1
001. A 2020-06-20 02:17:16.000. 2
001. B. 2020-06-20 02:20:10.000. 2
001. A. 2020-06-20 02:28:13.000. 2
002. A. 2020-06-20 04:13:97.000. 1
应该什么时候
userid | group | event_time | session_id
001 A 2020-06-20 02:04:50.000. 1
001 A. 2020-06-20 02:06:12.000. 1
001. A 2020-06-20 02:17:16.000. 2
001. B. 2020-06-20 02:20:10.000. 1
001. A. 2020-06-20 02:28:13.000. 3
002. A. 2020-06-20 04:13:97.000. 1
因为userid001同时属于a和b,在a和b中发生的事情是相互独立的。
1条答案
按热度按时间mlmc2os51#
您可以简化会话的计算。只需查看每个事件的前一个事件时间
userid
/group
组合。当差值大于或等于10时,新会话开始:我不知道你的代码应该如何工作。但我希望
group
在partition by
如果你想重新开始每组的计数。