postgresql 如何在sql中减去同一个表中的行对?

9jyewag0  于 2022-11-04  发布在  PostgreSQL
关注(0)|答案(3)|浏览(182)

我有一个名为activity的表,其中包含如下所示的值:

userId |      timestamp      | action
----------------------------------------
1      | 2022-10-18 10:00:00 | OPEN
2      | 2022-10-18 10:20:00 | OPEN
1      | 2022-10-18 10:05:00 | CLOSE
2      | 2022-10-18 10:22:00 | CLOSE
...

我希望能够得到每个用户在给定的一天中打开和关闭时间的差异,如下所示:

desiredTable
------------------------------------------------------
userId |     start_time      | time_elapsed_in_minutes
------------------------------------------------------
1      | 2022-10-18 10:00:00 | 5
2      | 2022-10-18 10:20:00 | 2

需要注意的几点是:
1.不能保证OPENCLOSE行彼此背对背,因为该表还包含许多其他操作类型。
1.此外,由于网络条件的原因,无法保证OPENCLOSE的行数相同,这可能会导致不报告其中一个。例如:用户1可以具有3个打开和1个关闭,因此仅需要计算1对。

我的方法

  • 为每个用户创建OPEN和CLOSE计数表
  • userId | # opens | # closes
  • 确定哪个数字更小(打开数或关闭数),并从活动表中获取行数
  • 打开具有上面确定的行数的表
  • userId | timestamp
  • 使用上面确定的行数关闭表
  • userId | timestamp
  • 从按用户ID分组打开时间戳中减去关闭时间戳,并截短为分钟

如有任何帮助,我们将不胜感激!

mum43rcc

mum43rcc1#

正如@Robert汉密尔顿所建议的,这可以直接转换为使用window functions的查询:

select  userId, 
        start_time, 
        floor(extract('epoch' from duration)/60) duration_in_minutes
from (
    select  userId,
            timestamp as start_time,
            case when action='CLOSE' and (lag(action)    over w1)='OPEN'
                 then timestamp        - (lag(timestamp) over w1)
            end as duration
  from activity
  window w1 as (partition by userId order by timestamp)) a
where duration is not null;

Fiddle.
但是,使用非常基本的结构也可以实现同样的效果--我在下面使用CTE只是为了可读性:

with 
"nearest_close_after_each_open" as 
    ( select 
        open.userId,
        open.timestamp as start_time,
        min(close.timestamp) as end_time 
      from activity as close
        inner join activity as open
            on open.action='OPEN' and close.action='CLOSE'
            and open.userId=close.userId
            and open.timestamp < close.timestamp 
      group by 1,2),
"longest_open_window_before_a_close" as
    ( select 
        userId, 
        end_time,
        min(start_time) as start_time 
      from nearest_close_after_each_open
      group by 1,2),
"shortest_open_window_before_a_close" as
    ( select 
        userId, 
        end_time,
        max(start_time) as start_time 
      from nearest_close_after_each_open
      group by 1,2)  
select 
  userId,
  start_time,
  end_time,
  floor(extract('epoch' from end_time-start_time )/60) as time_elapsed_in_minutes
from "shortest_open_window_before_a_close";
--from "longest_open_window_before_a_close";

Fiddle.
我通常认为序列中所有后面的OPEN都是第一个的重传,类似于您对CLOSE的假设,这就是为什么我还添加了longest_open_window_before_a_close--将最早记录的OPENCLOSE配对。
我发现的一个未提及的细节是
每个用户在指定日期的开始和结束时间之间的差异
我认为这意味着所有打开的窗口都应该在午夜被切断,而所有在接下来的日期被孤立的CLOSES都应该被假设为在午夜打开。

nzkunb0c

nzkunb0c2#

我们将每个action = 'close'与它前面的action = 'open'分组,然后按id分组并选择时间戳。

select    userid
         ,min(timestamp)                as start_time
         ,max(timestamp)-min(timestamp) as time_elapsed_in_minutes
from     (
          select  *
                  ,count(case action when 'OPEN' then 1 end) over(partition by userid order by timestamp) as grp
          from   t
          where  action in('OPEN', 'CLOSE')
          ) t
group by  userid, grp

| 用户标识|开始时间|已用时间(分钟)|
| - -|- -|- -|
| 一个|2022年10月18日10时00分|00时05分|
| 2个|2022年10月18日10时20分|00时02分|
Fiddle

pbwdgjma

pbwdgjma3#

当下一个actionclosed时,我们可以使用lead()

select    * 
from     (
          select   userid
                  ,timestamp                                                                                                                                                   as start_time
                  ,case lead(action) over(partition by userid order by timestamp) when 'CLOSE' then lead(timestamp) over(partition by userid order by timestamp)-timestamp end as time_elapsed_in_minutes
          from   t
          where  action in('OPEN', 'CLOSE')
          ) t
where     time_elapsed_in_minutes is not null

| 用户标识|开始时间|已用时间(分钟)|
| - -|- -|- -|
| 一个|2022年10月18日10时00分|00时05分|
| 2个|2022年10月18日10时20分|00时02分|
Fiddle

相关问题