单个记录上的左联接

sczxawaw  于 2022-09-21  发布在  PostgreSQL
关注(0)|答案(6)|浏览(151)

我有存储事件的下一个表:(简化结构)
ID|用户|操作|时间戳
-|-|
12|user1|完|2022-01-01 05:00
43|user1|启动|2022-01-01 04:00
54|user1|完|2022-01-01 03:00
13|user1|开始|2022-01-01 02:00

我需要加入两个事件在一排,所以任何开始事件都伴随着结束事件在那之后。

因此,下一个结果应该是:
ID1|ID2|用户|开始时间戳|结束时间戳
-|
13|54|user1|2022-01-01 02:00|2022-01-01 03:00
43|12|user1|2022-01-01 04:00|2022-01-01 05:00

理想情况下,它不应该有太多的性能问题,因为表中可能有很多记录。

我尝试了下一个查询:

select 
  s.id as "ID1",
  e.id as "ID2",
  s.user,
  s.time as "Start Time",
  e.time as "End Time"
from Events s
left join Events e on s.user = e.user
where s.action = 'START'
  and e.action = 'END'
  and s.timestamp < e.timestamp

但它也会匹配记录13和记录12。可以只从左到右连接一次吗?(请记住,从时间上讲,IS应该是下一个结束记录?

谢谢

zrfyljdw

zrfyljdw1#

我们希望为每个START事件获取最接近的END事件的时间戳。

我会采用以下方法:

1.获取每个START事件的最小值时间戳差异
1.现在使用timedelta查找实际的END事件。

假设

1.我们最多只能有一个尚未结束的活动!
1.对于每个START事件,时间戳将是唯一的。(END事件也是如此。

WITH closest_to_start AS (
    SELECT 
        s.id,
        MIN(TIMESTAMPDIFF(SECOND, s.timestamp, e.timestamp)) AS min_delta
    FROM Events AS s
    INNER JOIN Events AS e ON s.user = e.user
    WHERE s.action = 'START'
    AND e.action = 'END'
    GROUP BY s.id
    HAVING min_delta >= 0
)
SELECT s.id, 
    e.id
FROM Events AS s
OUTER JOIN closest_to_start ON closest_to_start.id = s.id
OUTER JOIN Events AS e ON e.id = s.id
WHERE s.action = 'START'
AND e.action = 'END'
AND 
(
    e.timestamp IS NULL
    OR
    TIMESTAMPDIFF(SECOND, s.timestamp, e.timestamp) = closest_to_start.min_delta
)
fcwjkofz

fcwjkofz2#

下面是一个使用横向联接的PostgreSQL解决方案。它可能正在HANA上工作,因为没有使用Postgres特定的功能。内部查询为在相应的“开始”之后最早出现的同一用户选择“结束”操作。已经开始但尚未完成的事件的“ID2”和“End Timestamp”的值将为空。

create temporary table the_table(id integer, usr text, action text, ts timestamp);
insert into the_table values 
(12,'user1','END','2022-01-01 05:00'),(43,'user1','START','2022-01-01 04:00'),
(54,'user1','END','2022-01-01 03:00'),(13,'user1','START','2022-01-01 02:00');

select tx.id as "ID1", l.id as "ID2", tx.usr as "User", 
       tx.ts as "Start timestamp", l.ts as "End timestamp" 
from the_table as tx
left join lateral 
(
  select ti.id, ti.ts
  from the_table as ti 
  where ti.action = 'END' 
    and ti.ts > tx.ts 
    and ti.usr = tx.usr
  order by ti.ts - tx.ts
  limit 1
) as l on true  
where tx.action = 'START'
order by "Start timestamp";
avwztpqn

avwztpqn3#

您可以使用窗口函数Lead。

with Daten 
as
(
Select 12 as ID, 'user1' as Benutzer, 'END' as action, '05:00' as Time
Union
Select 43 as ID, 'user1' as Benutzer, 'Start' as action, '04:00' as Time
Union
Select 54 as ID, 'user1' as Benutzer, 'END' as action, '03:00' as Time
Union
Select 13 as ID, 'user1' as Benutzer, 'Start' as action, '02:00' as Time
)
Select 
    *
from
    (
        Select
            *,
            lead(ID,1) over (order by number) as ID2,
            lead(action,1) over (order by number) as action2,
            lead(time,1) over (order by number) as time2
        from
            (
                Select 
                    *,
                    ROW_NUMBER() OVER(ORDER BY Benutzer,Time,action) as number
                from
                     Daten
            ) x
    ) y
where y.action = 'Start'
rqenqsqc

rqenqsqc4#

一种方法是横向联接,它选择大于“开始”时间戳的最小“结束”时间戳:

select st.id as id1,
       en.id as id2,
       st."timestamp" as start_timestamp,
       en."timestamp" as end_timestamp
from events st
  left join lateral (
      select id, "timestamp"
      from events e
      where e."user" = st."user" 
        and e.action = 'END'
      and e.timestamp >= st.timestamp
      order by "timestamp" 
      fetch first 1 row only
  ) en on true
where st.action = 'START';

以上是标准的ANSI SQL,至少可以在postgres中使用。

在Postgres中,我将在events ("user", "timestamp") where action = 'END'上创建一个索引,以加快横向查询。

sdnqo3pr

sdnqo3pr5#

上面查询的问题是,对于每个开始事件,可以有多个结束事件,这些事件发生在之后。但是,您想要选择最接近开始事件的一个。您可以通过添加额外的聚合来实现这一点。

请找到HANA示例(不使用HANA特定功能):

CREATE TABLE TEST (ID integer, USER NVARCHAR(20), ACTION NVARCHAR(20), TIMESTAMP DATETIME)

INSERT INTO TEST VALUES (12, 'user1', 'END', '2022-01-01 05:00')
INSERT INTO TEST VALUES (43, 'user1', 'START', '2022-01-01 04:00')
INSERT INTO TEST VALUES (54, 'user1', 'END', '2022-01-01 03:00')
INSERT INTO TEST VALUES (13, 'user1', 'START', '2022-01-01 02:00')

INSERT INTO TEST VALUES (13, 'user1', 'START', '2022-01-01 09:00')

SELECT
    S.ID ID1,
    S.USER,
    S.ACTION,
    S.TIMESTAMP START_TIME,
    MIN(E.TIMESTAMP) END_TIME
FROM TEST S
JOIN TEST E ON (
    s.USER = e.USER AND
    s.ACTION = 'START' AND
    e.ACTION = 'END' AND
    e.TIMESTAMP >= s.TIMESTAMP
)
GROUP BY S.ID, S.ACTION, S.USER, S.TIMESTAMP

如果需要包含E.ID,则需要将其重新连接到结果集。请注意,可能存在多个具有相同时间戳的结束事件,您需要在重新加入E.ID时处理这些事件。

如果您还想包括START事件而不包括相应的END事件,则可以使用以下选项:

INSERT INTO TEST VALUES (13, 'user1', 'START', '2022-01-01 09:00')

SELECT
    S.ID ID1,
    S.USER,
    S.ACTION,
    S.TIMESTAMP START_TIME,
    MIN(E.TIMESTAMP) END_TIME
FROM TEST S
LEFT JOIN TEST E ON (
    s.USER = e.USER AND
    e.ACTION = 'END' AND
    e.TIMESTAMP >= s.TIMESTAMP
)
WHERE s.ACTION ='START'
GROUP BY S.ID, S.ACTION, S.USER, S.TIMESTAMP
tct7dpnv

tct7dpnv6#

在HANA SQL中测试的解决方案

相同的查询,但排除了不是最小持续时间的记录

CREATE TABLE "TESTSCHEMA"."EVENTS" (ID integer, "user" NVARCHAR(20), "action" NVARCHAR(20), "timestamp" SECONDDATE);

INSERT INTO "TESTSCHEMA"."EVENTS" VALUES (12, 'user1', 'END', '2022-01-01 05:00');
INSERT INTO "TESTSCHEMA"."EVENTS" VALUES (43, 'user1', 'START', '2022-01-01 04:00');
INSERT INTO "TESTSCHEMA"."EVENTS" VALUES (54, 'user1', 'END', '2022-01-01 03:00');
INSERT INTO "TESTSCHEMA"."EVENTS" VALUES (13, 'user1', 'START', '2022-01-01 02:00');
INSERT INTO "TESTSCHEMA"."EVENTS" VALUES (13, 'user1', 'START', '2022-01-01 09:00');

SELECT "ID1","ID2","Start Time","End Time" FROM
(
 select 
 ROW_NUMBER() OVER(PARTITION BY s."ID" order by SECONDS_BETWEEN(e."timestamp",s."timestamp") DESC) AS RN,
  s."ID" as "ID1",
  e."ID" as "ID2",
  s."user",
  s."timestamp" as "Start Time",
  e."timestamp" as "End Time",
  SECONDS_BETWEEN(e."timestamp",s."timestamp") AS "Duration"
  from "TESTSCHEMA"."EVENTS" s
left join "TESTSCHEMA"."EVENTS" e on s."user" = e."user"
where s."action" = 'START'
  and e."action" = 'END'
  and s."timestamp" < e."timestamp"
)WHERE RN=1

相关问题