在PostgreSQL中使用递归根据年龄、状态和相关行的存在性删除行

hm2xizp9  于 2023-11-18  发布在  PostgreSQL
关注(0)|答案(1)|浏览(124)
create table the_table (
  id integer, root_id integer, parent_id integer, status text, ts timestamp, comment text);
insert into the_table values
(1, null, null, 'COMPLETE', now()-'92d'::interval, '>90 days old, complete, standalone'),
(2, null, null, 'COMPLETE', now()-'92d'::interval, '>90 days old, complete, root of 3,4'),
(3, 2,    null, 'COMPLETE', now()-'92d'::interval, '>90 days old, complete, child of 2, parent of 4'),
(4, 2,    3,    'OPEN',     now()-'92d'::interval, '>90 days old, open, child of 2,3'),
(5, null, null, 'COMPLETE', now()-'92d'::interval, '>90 days old, complete, root of 6,7'),
(6, 5,    null, 'COMPLETE', now()-'92d'::interval, '>90 days old, complete, child of 5, parent of 4'),
(7, 5,    6,    'COMPLETE', now()-'10d'::interval, '<=90 days old, complete, child of 5,6' )
(8, null, null, 'COMPLETE', now()-'10d'::interval, '<=90 days old, complete, standalone'),
(9, null, null, 'COMPLETE', now()-'10d'::interval, '<=90 days old, complete, root of 10'),
(10,9,    null, 'COMPLETE', now()-'92d'::interval, '>90 days old, complete, child of 9' ),
(11,11,   null, 'COMPLETE', now()-'92d'::interval, '>90 days old, complete, parent/child of self'),
(12,null, 12,   'COMPLETE', now()-'92d'::interval, '>90 days old, complete, parent/child of self'),
(13,14,   null, 'COMPLETE', now()-'92d'::interval, '>90 days old, complete, cross-parent/child of 14'),
(14,13,   null, 'COMPLETE', now()-'92d'::interval, '>90 days old, complete, cross-parent/child of 13'),
(15,null, null, 'COMPLETE', now()-'92d'::interval, '>90 days old, complete, parent of 16,17'),
(16,null, 15,   'COMPLETE', now()-'92d'::interval, '>90 days old, complete, child of 15'),
(17,null, 15,   'OPEN',     now()-'10d'::interval, '<=90 days old, open, child of 15');

字符串
我想删除所有记录,

  • 超过90天
  • 与处于COMPLETE状态。
  • ANDnot与不超过90天且COMPLETE的行直接或间接相关

示例如下:

  • 应删除具有id=1的行。
  • id=2id=3COMPLETE并且超过90天,但是由于id=4OPEN并且具有root_id=2parent_id=3,所以这三行(id in (2,3,4))不应该被删除。
  • id=5id=6id=7都是COMPLETE,但id=7不超过90天,并且有root_id=5parent_id=6,因此这三行(id=5id=6id=7)不应删除。
  • id=15id=16都足够老,COMPLETEid=17仍然是OPEN,并且小于90天。1617没有直接关系,但它们共享15作为父行,因此不应删除这三行。

我尝试了这个answer,但有一个案例失败了:一个可移动的父节点有两个子节点,一个不可移动,一个可移动。(最后一个测试案例在这里fiddle
删除时,我想保留整个组,直到所有成员都可删除。我尝试使用递归CTE,但仍然无法实现。下面是我尝试的查询:

WITH recursive RecursiveHierarchy AS (
      SELECT id, root_id, parent_id, status, ts, comment
      FROM the_table
      WHERE status = 'OPEN'
      UNION ALL
      SELECT t.id, t.root_id, t.parent_id, t.status, t.ts, t.comment
      FROM the_table t
      JOIN RecursiveHierarchy r ON (  ( 'OPEN'=t.status
                             OR now()-t.ts <= '90d'::interval   ) 
                        AND (   t.id IN(r.root_id,r.parent_id)
                             OR r.id IN(t.root_id,t.parent_id) ) ) 
    )
    DELETE FROM the_table
    WHERE ts < NOW() - '90 days'::interval
    AND status = 'COMPLETE'
    AND id NOT IN (SELECT id FROM RecursiveHierarchy)
    RETURNING *;


我也试过用函数

CREATE OR REPLACE FUNCTION delete_old_records()
RETURNS VOID AS $$
DECLARE
  record_to_delete the_table;
BEGIN
  FOR record_to_delete IN SELECT * FROM the_table
  WHERE ts < NOW() - '90 days'::interval AND status = 'COMPLETE'
  LOOP
    IF NOT EXISTS (
      SELECT 1 FROM the_table
      WHERE id = record_to_delete.id
        OR root_id = record_to_delete.id
        OR parent_id = record_to_delete.id
    ) THEN
      DELETE FROM the_table WHERE id = record_to_delete.id;
    END IF;
  END LOOP;
END;
$$ LANGUAGE plpgsql;

e4yzc0pl

e4yzc0pl1#

1.您在基本CTE查询中忘记了90天的年龄
1.您正在收集所有具有 * 至少一个 * 不可移除行的行组,方法是找到所有这些行,然后查看它们的根/父/子关系,以获得要保护的行的完整列表。由于您已经检查了CTE中的所有条件,因此除了针对受保护的不可移除行列表的WHERE id NOT IN()之外,您不需要在最终DELETE中添加任何内容。
1.您的结构需要在模式级别防止循环/回路,或者您需要通过使用循环检测来防止CTE陷入其中。

WITH recursive RecursiveHierarchy AS (--base query finds all non-removables
  SELECT id, root_id, parent_id
  FROM the_table
  WHERE status = 'OPEN' OR now()-ts <= '90d'::interval
  UNION ALL             --recursion looks for all related to non-removables
  SELECT t.id, t.root_id, t.parent_id
  FROM the_table t
  JOIN RecursiveHierarchy r ON r.id in(t.parent_id,t.root_id)
                            OR t.id in(r.parent_id,r.root_id) 
) CYCLE id SET is_cycle USING path
DELETE FROM the_table
WHERE id NOT IN (SELECT id FROM RecursiveHierarchy
                 WHERE NOT is_cycle)
RETURNING *;

字符串
| ID|根ID|父ID|地位|ts|评论|
| --|--|--|--|--|--|
| 1 |* 空 || 完成|2023-08-13 11:26:19.770144|>90天,完整,独立|
| 11 | 11 |
| 完成|2023-08-13 11:26:19.785285|>90天,完整,自身的父母/子女|
| 12 |
| 12 |完成|2023-08-13 11:26:19.785285|>90天,完整,自身的父母/子女|
| 13 | 14 |
| 完成|2023-08-13 11:26:19.785285|>90天,完整,交叉父母/14岁的孩子|
| 14 | 13 |
空 *| 完成|2023-08-13 11:26:19.785285|>90天,完整,交叉父母/13岁的孩子|
Demo at db<>fiddle
您也可以采取相反的方法:找到所有立即删除并搜索它们的关系,以确保它们没有链接到不可删除的内容:demo2

WITH recursive RecursiveHierarchy AS (
  SELECT id as candidate_id, id, root_id, parent_id, true AS is_removable
  FROM the_table
  WHERE status = 'COMPLETE' AND now()-ts > '90d'::interval
  UNION ALL
  SELECT r.candidate_id, t.id, t.root_id, t.parent_id, 
         t.status = 'COMPLETE' AND now()-t.ts > '90d'::interval
  FROM the_table t
  JOIN RecursiveHierarchy r ON (   r.id in(t.parent_id,t.root_id)
                                OR t.id in(r.parent_id,r.root_id) ) 
                            AND t.id NOT IN (r.candidate_id,r.id)
  WHERE NOT is_cycle
) CYCLE id SET is_cycle USING path
,removables AS (
  SELECT candidate_id AS id FROM RecursiveHierarchy 
  GROUP BY candidate_id HAVING bool_and(is_removable)
)
DELETE FROM the_table
WHERE id IN (SELECT id FROM removables)
RETURNING *;


请注意,无论您采用何种方法,由于问题的递归性质,这注定是相当复杂的。在最坏的情况下,您可能有15 k条记录,每一条都与下一条相关,所有这些记录都是可移动的,只有最后一条是不可移动的。整个集合仍然需要逐个检查,这在这个结构中需要时间和内存。你可以做的是改变结构,引入一个group_id/family_id:用一个公共id标记一个家族中的所有行,并让一个触发器维护它。然后,搜索不可移除关系变得微不足道。
如果你真的必须在db上进行复杂的图遍历,考虑pgroutingexample)或apache-age

相关问题