Postgres: delete records from a table, keeping the minimum and maximum

Posted by dly7yett on 2021-07-27 in Java

I have a Postgres table:

|scanID|scandatetime     |eventcode|state|
------------------------------------------
|12345 |2020-07-28 1:00  |123      |WA   |
|12345 |2020-07-28 2:00  |156      |WA   |
|12345 |2020-07-29 10:00 |200      |OR   |
|34678 |2020-07-20 4:00  |123      |TX   |
|34678 |2020-07-20 8:00  |156      |AR   |
|34678 |2020-07-22 1:00  |200      |MS   |

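Basically, I want to delete rows so that only 2 rows remain for each scan ID: the one with the earliest scandatetime and the one with the latest.
The current workflow aggregates data every day and writes it to this table, so after each write there may be a batch of new scan events, but I only want to keep the maximum and the minimum. How would I do that?
Edit: the desired result table looks like this: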

|scanID|scandatetime     |eventcode|state|
------------------------------------------
|12345 |2020-07-28 1:00  |123      |WA   |
|12345 |2020-07-29 10:00 |200      |OR   |
|34678 |2020-07-20 4:00  |123      |TX   |
|34678 |2020-07-22 1:00  |200      |MS   |

xzlaal3s  #1

You can use a USING clause:

delete from t
    using (select scanId, min(scandatetime) as min_sdt, max(scandatetime) as max_sdt
           from t
           group by scanid
          ) tt
    where tt.scanId = t.scanId and t.scandatetime not in (tt.min_sdt, tt.max_sdt);

You can also phrase it with correlated subqueries:

delete from t
    where scandatetime <> (select min(tt.scandatetime) from t tt where tt.scanid = t.scanid) and
          scandatetime <> (select max(tt.scandatetime) from t tt where tt.scanid = t.scanid) ;

62o28rlo  #2

You can use NOT IN with a sub-select:

delete from the_table t1
where (scanid, scandatetime) not in (select scanid, min(scandatetime)
                                     from the_table
                                     group by scanid
                                     union all
                                     select scanid, max(scandatetime)
                                     from the_table
                                     group by scanid);

But I would assume that a solution using EXISTS will be faster.


t9aqgxwy  #3

A record is in the middle if there is (at least) one record before it and (at least) one record after it for the same scanId:

DELETE FROM ztable d
WHERE EXISTS ( SELECT *         
        FROM  ztable x
        WHERE x.scanId = d.scanId
        AND x.scandatetime < d.scandatetime
        )
AND EXISTS ( SELECT *
        FROM  ztable x
        WHERE x.scanId = d.scanId
        AND x.scandatetime > d.scandatetime
        );
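
To try the EXISTS delete against the sample data from the question, a minimal sketch follows. The question does not show the table definition, so the column types here are assumptions, and a temporary table is used so nothing persistent is touched.

```sql
-- Assumed schema: the question gives no DDL, so these types are guesses.
CREATE TEMP TABLE ztable (
    scanid       integer,
    scandatetime timestamp,
    eventcode    integer,
    state        text
);

-- Sample rows copied from the question.
INSERT INTO ztable (scanid, scandatetime, eventcode, state) VALUES
    (12345, '2020-07-28 01:00', 123, 'WA'),
    (12345, '2020-07-28 02:00', 156, 'WA'),
    (12345, '2020-07-29 10:00', 200, 'OR'),
    (34678, '2020-07-20 04:00', 123, 'TX'),
    (34678, '2020-07-20 08:00', 156, 'AR'),
    (34678, '2020-07-22 01:00', 200, 'MS');

-- Delete every row that has both an earlier and a later row for the same scanid.
DELETE FROM ztable d
WHERE EXISTS ( SELECT * FROM ztable x
               WHERE x.scanid = d.scanid
               AND x.scandatetime < d.scandatetime )
AND EXISTS ( SELECT * FROM ztable x
             WHERE x.scanid = d.scanid
             AND x.scandatetime > d.scandatetime );

-- Only the earliest and latest row per scanid should remain.
SELECT * FROM ztable ORDER BY scanid, scandatetime;
```

With this data, the two middle rows (eventcode 156) are removed, which matches the desired result table in the question.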

A similar trick, using row_number():

DELETE FROM ztable d
USING ( SELECT scanId, scandatetime
        , row_number() OVER 
                (PARTITION BY scanId ORDER BY scandatetime ASC) rn
        , row_number() OVER
                (PARTITION BY scanId ORDER BY scandatetime DESC) rrn
        FROM  ztable 
        ) x
        WHERE x.scanId = d.scanId
        AND x.scandatetime = d.scandatetime
        AND x.rn <> 1 AND x.rrn <> 1
        ;
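
One caveat with the row_number() version, as far as I can tell: it joins back on (scanId, scandatetime), so if several rows of the same scanId share the earliest (or latest) timestamp, a tied row can also match the entry numbered 2 and be deleted along with the others. A possible workaround, sketched under the assumption that exactly one row per end should survive, is to join on Postgres's ctid system column instead. The alias row_ptr is just an illustrative name.

```sql
DELETE FROM ztable d
USING ( SELECT ctid AS row_ptr   -- physical row identifier, unique per row
        , row_number() OVER
                (PARTITION BY scanid ORDER BY scandatetime ASC, ctid ASC) rn
        , row_number() OVER
                (PARTITION BY scanid ORDER BY scandatetime DESC, ctid DESC) rrn
        FROM ztable
        ) x
WHERE x.row_ptr = d.ctid         -- each target row matches exactly one numbered row
AND x.rn <> 1 AND x.rrn <> 1;
```

Adding ctid to the ORDER BY only serves as a tie-breaker, so the two numberings are exact mirrors of each other. If scandatetime is already unique per scanId, this behaves the same as the version above.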
