PostgreSQL: compute the field has_impact from start and end times

Posted by dgenwo3n on 2023-06-22 in PostgreSQL

I'm trying to populate the field has_impact for each feed view in a session, based on the following rule:
has_impact is true when (view_end_time - view_start_time) > 3 and view_perc > 0.8; otherwise it is false.

  • view_logs table:
create table view_logs(session_id varchar(10), post_id int, 
                         ts int, event_name varchar(50), view_perc float);
    
insert into view_logs(session_id, post_id, ts, event_name, view_perc)
    values
    ('m1', 1000, 1524600, 'view_start', null),
    ('m1', 1000, 1524602, 'view_end', 0.85),
    ('m1', 1000, 1524650, 'view_start', null),
    ('m1', 1000, 1524654, 'view_end', 0.9),
    ('m1', 2000, 1524700, 'view_start', null),
    ('m1', 2000, 1524707, 'view_end', 0.3),
    ('m1', 2000, 1524710, 'view_start', null),
    ('m1', 2000, 1524713, 'view_end', 0.9);

I tried **this fiddle**, but I don't get the desired output. Can anyone help me figure out what I'm doing wrong?
Query:

with cte as
  (  
    select
      pv1.session_id,
      pv1.post_id,
      pv2.view_perc,
      pv1.ts as start_time,
      min(pv2.ts) as end_time
    from view_logs pv1
    join view_logs pv2
    on pv1.session_id = pv2.session_id
    and pv1.post_id = pv2.post_id
    and pv1.event_name <> pv2.event_name
    and pv1.ts < pv2.ts
    group by
      pv1.session_id,
      pv1.post_id,
      pv2.view_perc,
      pv1.ts
 )  
 
select
    session_id,
    post_id,
    start_time,
    end_time,
    case
        when (end_time - start_time > 3  and view_perc > 0.8 )then 'yes'
        else 'no'
    end as has_meaningful_view
from cte

Expected output:

*--------------------------------------------------------*
|session_id| post_id | start_time | end_time | has_impact|
*--------------------------------------------------------*
|  m1      | 1000    | 1524600    | 1524602  | no        |
|  m1      | 1000    | 1524650    | 1524654  | yes       |
|  m1      | 2000    | 1524700    | 1524707  | no        |
|  m1      | 2000    | 1524710    | 1524713  | no        |
*--------------------------------------------------------*
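A quick way to see what goes wrong is to run the question's CTE on the sample data. The sketch below uses Python's built-in sqlite3 module as a stand-in for PostgreSQL (an assumption made only so the check is self-contained). Because the join only requires `event_name <>` and `pv1.ts < pv2.ts`, every row pairs with every later row of the other event type, including a view_end followed by the next view_start. Grouping by `pv2.view_perc` then keeps those pairs apart instead of letting `min(pv2.ts)` collapse them.

```python
import sqlite3

# Sample data from the question. SQLite stands in for PostgreSQL here
# (assumption for illustration); the join logic is the same.
SAMPLE_ROWS = [
    ("m1", 1000, 1524600, "view_start", None),
    ("m1", 1000, 1524602, "view_end", 0.85),
    ("m1", 1000, 1524650, "view_start", None),
    ("m1", 1000, 1524654, "view_end", 0.9),
    ("m1", 2000, 1524700, "view_start", None),
    ("m1", 2000, 1524707, "view_end", 0.3),
    ("m1", 2000, 1524710, "view_start", None),
    ("m1", 2000, 1524713, "view_end", 0.9),
]

# The CTE body from the question, with an ORDER BY added for readable output.
QUESTION_CTE = """
select pv1.session_id, pv1.post_id, pv2.view_perc,
       pv1.ts as start_time, min(pv2.ts) as end_time
from view_logs pv1
join view_logs pv2
  on pv1.session_id = pv2.session_id
 and pv1.post_id = pv2.post_id
 and pv1.event_name <> pv2.event_name
 and pv1.ts < pv2.ts
group by pv1.session_id, pv1.post_id, pv2.view_perc, pv1.ts
order by pv1.post_id, start_time, end_time
"""

conn = sqlite3.connect(":memory:")
conn.execute("create table view_logs(session_id text, post_id integer, "
             "ts integer, event_name text, view_perc real)")
conn.executemany("insert into view_logs values (?, ?, ?, ?, ?)", SAMPLE_ROWS)

rows = conn.execute(QUESTION_CTE).fetchall()
for row in rows:
    print(row)
# 8 rows instead of 4: start 1524600 is paired with both 1524602 and 1524654,
# and the view_end at 1524602 shows up as a "start" for 1524650.
```

Restricting pv1 to view_start rows, pv2 to view_end rows, and keeping only the next end per start is what the answers below do in different ways.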
Answer 1 (2nbm6dog)

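Assuming every start time has a matching end time, you can number the rows with the row_number function, group each start with the row that follows it, and aggregate, as follows: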

select session_id, post_id,
       max(case when event_name='view_start' then ts end) startTime,
       max(case when event_name='view_end' then ts end) endTime,
       case 
         when 
          max(case when event_name='view_end' then ts end)
        - max(case when event_name='view_start' then ts end) > 3 
          and max(view_perc) > 0.8
         then 'yes' else 'no'
       end as hasimpact
from
(
  select *,
    row_number() over (partition by session_id, post_id order by ts) rn
  from view_logs
) t
group by session_id, post_id, (rn-1)/ 2
order by session_id, post_id, min(ts)

Demo
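The `(rn - 1) / 2` trick relies on integer division. Within each (session_id, post_id), rows 1 and 2 fall into group 0, rows 3 and 4 into group 1, and so on, so every view_start is grouped with the row right after it. The sketch below checks this against the expected output using Python's sqlite3 module as a stand-in for PostgreSQL. It assumes the bundled SQLite is 3.25 or newer, which window functions require.

```python
import sqlite3

# Sample data from the question; SQLite stands in for PostgreSQL (assumption).
SAMPLE_ROWS = [
    ("m1", 1000, 1524600, "view_start", None),
    ("m1", 1000, 1524602, "view_end", 0.85),
    ("m1", 1000, 1524650, "view_start", None),
    ("m1", 1000, 1524654, "view_end", 0.9),
    ("m1", 2000, 1524700, "view_start", None),
    ("m1", 2000, 1524707, "view_end", 0.3),
    ("m1", 2000, 1524710, "view_start", None),
    ("m1", 2000, 1524713, "view_end", 0.9),
]

QUERY = """
select session_id, post_id,
       max(case when event_name = 'view_start' then ts end) as start_time,
       max(case when event_name = 'view_end' then ts end) as end_time,
       case
         when max(case when event_name = 'view_end' then ts end)
            - max(case when event_name = 'view_start' then ts end) > 3
          and max(view_perc) > 0.8
         then 'yes' else 'no'
       end as has_impact
from (
  select *,
         row_number() over (partition by session_id, post_id order by ts) as rn
  from view_logs
) t
-- integer division: rn 1,2 -> 0; rn 3,4 -> 1; one group per start/end pair
group by session_id, post_id, (rn - 1) / 2
order by session_id, post_id, min(ts)
"""

conn = sqlite3.connect(":memory:")
conn.execute("create table view_logs(session_id text, post_id integer, "
             "ts integer, event_name text, view_perc real)")
conn.executemany("insert into view_logs values (?, ?, ?, ?, ?)", SAMPLE_ROWS)

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```

The pairing only stays aligned if starts and ends strictly alternate, which is the assumption the answer states up front.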

Answer 2 (1sbrub3j)

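Each record with view_start can be joined to the view_end record that immediately follows it using a self join. Since you have no incrementing column, we can construct one with row_number() and use it in the join condition: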

with cte as (
  select *, row_number() over( partition by session_id, post_id order by ts) as rn
  from view_logs
)
select c1.session_id, c1.post_id,
         c1.ts as start_time, c2.ts as end_time,
         case when c2.view_perc > 0.8 and c2.ts - c1.ts > 3
              then 'Yes' else 'No' end as has_impact
from cte c1
inner join cte c2 on c1.event_name = 'view_start' 
                  and c2.event_name = 'view_end'
                  and c1.session_id = c2.session_id
                  and c1.post_id = c2.post_id
                  and c2.rn = c1.rn + 1
order by c1.session_id, c1.post_id, c1.ts

Demo here
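Joining on `c2.rn = c1.rn + 1` pairs each view_start only with the row immediately after it, so a start with no following end simply drops out of the inner join. The sketch below runs the query with Python's sqlite3 module as a stand-in for PostgreSQL (SQLite 3.25+ assumed for window functions). The post 3000 row is a hypothetical unmatched view_start, not part of the question's data, added only to show that behaviour.

```python
import sqlite3

SAMPLE_ROWS = [
    ("m1", 1000, 1524600, "view_start", None),
    ("m1", 1000, 1524602, "view_end", 0.85),
    ("m1", 1000, 1524650, "view_start", None),
    ("m1", 1000, 1524654, "view_end", 0.9),
    ("m1", 2000, 1524700, "view_start", None),
    ("m1", 2000, 1524707, "view_end", 0.3),
    ("m1", 2000, 1524710, "view_start", None),
    ("m1", 2000, 1524713, "view_end", 0.9),
    # Hypothetical extra row: a view_start with no matching view_end.
    ("m1", 3000, 1524800, "view_start", None),
]

QUERY = """
with cte as (
  select *, row_number() over (partition by session_id, post_id order by ts) as rn
  from view_logs
)
select c1.session_id, c1.post_id, c1.ts as start_time, c2.ts as end_time,
       case when c2.view_perc > 0.8 and c2.ts - c1.ts > 3
            then 'Yes' else 'No' end as has_impact
from cte c1
join cte c2
  on c1.event_name = 'view_start'
 and c2.event_name = 'view_end'
 and c1.session_id = c2.session_id
 and c1.post_id = c2.post_id
 and c2.rn = c1.rn + 1  -- only the row directly after the start
order by c1.session_id, c1.post_id, c1.ts
"""

conn = sqlite3.connect(":memory:")
conn.execute("create table view_logs(session_id text, post_id integer, "
             "ts integer, event_name text, view_perc real)")
conn.executemany("insert into view_logs values (?, ?, ?, ?, ?)", SAMPLE_ROWS)

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)  # the unmatched post 3000 start produces no row
```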

Answer 3 (qcbq4gxm)

This may not give you exactly the result you want, but it does compute has_impact:

WITH cte
AS (
    SELECT *
        ,CASE 
            WHEN event_name = 'view_start'
                THEN ts
            END AS start_time
        ,CASE 
            WHEN event_name = 'view_end'
                THEN ts
            END AS end_time
    FROM view_logs
    )
SELECT session_id
    ,post_id
    ,start_time
    ,end_time
    ,CASE 
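        -- partition and order by ts so lag() returns the previous row of the same post
        WHEN end_time - lag(start_time, 1) OVER (
                PARTITION BY session_id, post_id
                ORDER BY ts
                ) > 3
            AND view_perc > 0.8
            THEN true
        ELSE false
        END AS has_impact
FROM cte
ORDER BY session_id, post_id, ts;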

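Output:

| session_id | post_id | start_time | end_time | has_impact |
| ---------- | ------- | ---------- | -------- | ---------- |
| m1 | 1000 | 1524600 | null | false |
| m1 | 1000 | null | 1524602 | false |
| m1 | 1000 | 1524650 | null | false |
| m1 | 1000 | null | 1524654 | true |
| m1 | 2000 | 1524700 | null | false |
| m1 | 2000 | null | 1524707 | false |
| m1 | 2000 | 1524710 | null | false |
| m1 | 2000 | null | 1524713 | false |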
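The lag() step here only lines up when each view_end row sees the start_time of the row just before it in the same post. The sketch below computes it with the window partitioned by session_id, post_id and ordered by ts, which makes that deterministic. It uses Python's sqlite3 module as a stand-in for PostgreSQL (SQLite 3.25+ assumed), where the boolean has_impact comes back as 1/0 rather than true/false.

```python
import sqlite3

SAMPLE_ROWS = [
    ("m1", 1000, 1524600, "view_start", None),
    ("m1", 1000, 1524602, "view_end", 0.85),
    ("m1", 1000, 1524650, "view_start", None),
    ("m1", 1000, 1524654, "view_end", 0.9),
    ("m1", 2000, 1524700, "view_start", None),
    ("m1", 2000, 1524707, "view_end", 0.3),
    ("m1", 2000, 1524710, "view_start", None),
    ("m1", 2000, 1524713, "view_end", 0.9),
]

QUERY = """
with cte as (
  select *,
         case when event_name = 'view_start' then ts end as start_time,
         case when event_name = 'view_end' then ts end as end_time
  from view_logs
)
select session_id, post_id, start_time, end_time,
       case
         -- previous row of the same post, in ts order
         when end_time - lag(start_time, 1)
                over (partition by session_id, post_id order by ts) > 3
          and view_perc > 0.8
         then true else false
       end as has_impact
from cte
order by session_id, post_id, ts
"""

conn = sqlite3.connect(":memory:")
conn.execute("create table view_logs(session_id text, post_id integer, "
             "ts integer, event_name text, view_perc real)")
conn.executemany("insert into view_logs values (?, ?, ?, ?, ?)", SAMPLE_ROWS)

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)  # start rows have end_time NULL, so their has_impact is 0
```

The result still has one row per event rather than one per view; filtering to `event_name = 'view_end'` would reduce it to one row per view.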

Answer 4 (osh3o9ms)

Assuming every end is immediately preceded by its start, you can simply use the window function lag() once in a subquery:

SELECT session_id, post_id, start_time, ts AS end_time, view_perc
     , (ts - start_time) > 3 AND view_perc > 0.8 AS has_impact
FROM  (
   SELECT *, lag(ts) OVER (PARTITION BY session_id, post_id ORDER BY ts) AS start_time
   FROM   view_logs
   ) sub
WHERE  event_name = 'view_end'
ORDER  BY session_id, post_id, start_time;  -- optional

fiddle
This should be the fastest, since it needs only a single pass over the table.
If my assumption does not hold, state *exactly* what we can assume.
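Because the rows of each post are visited in ts order, lag(ts) on a view_end row returns the ts of the row right before it, which under this answer's assumption is the matching view_start. The outer WHERE then keeps exactly one row per view. A sketch with Python's sqlite3 module standing in for PostgreSQL (SQLite 3.25+ assumed; the boolean has_impact comes back as 1/0):

```python
import sqlite3

SAMPLE_ROWS = [
    ("m1", 1000, 1524600, "view_start", None),
    ("m1", 1000, 1524602, "view_end", 0.85),
    ("m1", 1000, 1524650, "view_start", None),
    ("m1", 1000, 1524654, "view_end", 0.9),
    ("m1", 2000, 1524700, "view_start", None),
    ("m1", 2000, 1524707, "view_end", 0.3),
    ("m1", 2000, 1524710, "view_start", None),
    ("m1", 2000, 1524713, "view_end", 0.9),
]

QUERY = """
select session_id, post_id, start_time, ts as end_time, view_perc,
       (ts - start_time) > 3 and view_perc > 0.8 as has_impact
from (
  -- one window pass: every row learns the ts of the row just before it
  select *, lag(ts) over (partition by session_id, post_id order by ts) as start_time
  from view_logs
) sub
where event_name = 'view_end'  -- keep one row per view
order by session_id, post_id, start_time
"""

conn = sqlite3.connect(":memory:")
conn.execute("create table view_logs(session_id text, post_id integer, "
             "ts integer, event_name text, view_perc real)")
conn.executemany("insert into view_logs values (?, ?, ?, ?, ?)", SAMPLE_ROWS)

rows = conn.execute(QUERY).fetchall()
for row in rows:
    print(row)
```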
