-- For each row, collect the ids of all other rows that share the same key columns.
select tn1.id,
       array_agg(tn2.id) as duplicate_entries
from table_name tn1
join table_name tn2
  on tn1.year = tn2.year
 and tn1.sid = tn2.sid
 and tn1.user_id = tn2.user_id
 and tn1.cid = tn2.cid
 and tn1.id <> tn2.id
group by tn1.id;
with dupe_set as (
    select tn1.id,
           array_agg(tn2.id) as duplicate_entries
    from table_name tn1
    join table_name tn2
      on tn1.year = tn2.year
     and tn1.sid = tn2.sid
     and tn1.user_id = tn2.user_id
     and tn1.cid = tn2.cid
     and tn1.id <> tn2.id
    group by tn1.id
    order by tn1.id asc
)
-- Keep only the ids that have no smaller id among their duplicates.
select ds.id
from dupe_set ds
where not exists (
    select de from unnest(ds.duplicate_entries) as de where de < ds.id
);
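WITH ordered AS (
    -- Number the rows inside each group of identical key columns, lowest id first.
    SELECT id, year, user_id, sid, cid,
           rank() OVER (PARTITION BY year, user_id, sid, cid ORDER BY id) AS rnk
    FROM user_links
),
to_delete AS (
    -- Everything after the first row of a group is a duplicate.
    SELECT id
    FROM ordered
    WHERE rnk > 1
)
DELETE
FROM user_links
USING to_delete
WHERE user_links.id = to_delete.id;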
If you want to test it first, change it slightly:
WITH ordered AS (
    SELECT id, year, user_id, sid, cid,
           rank() OVER (PARTITION BY year, user_id, sid, cid ORDER BY id) AS rnk
    FROM user_links
),
to_delete AS (
    SELECT id, year, user_id, sid, cid
    FROM ordered
    WHERE rnk > 1
)
SELECT * FROM to_delete;
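WITH duplicated AS (
    -- 1. Find the ids that occur more than once.
    SELECT id,
           count(*)
    FROM products
    GROUP BY id
    HAVING count(*) > 1
),
ordered AS (
    -- 2. Order each group of duplicates by created_at, earliest first.
    SELECT p.id,
           p.created_at,
           rank() OVER (PARTITION BY p.id ORDER BY p.created_at) AS rnk
    FROM products p
    JOIN duplicated d ON d.id = p.id
),
products_to_delete AS (
    -- rnk > 1 (rather than rnk = 2) so that groups of three or more are also fully cleaned up.
    SELECT id,
           created_at
    FROM ordered
    WHERE rnk > 1
)
-- 3. Delete with USING so that only the matching rows are removed.
DELETE
FROM products
USING products_to_delete
WHERE products.id = products_to_delete.id
  AND products.created_at = products_to_delete.created_at;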
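select id, year, user_id, sid, cid from user_links
except
-- DISTINCT ON needs its own ORDER BY to pick the lowest id of each group deterministically.
(select distinct on (year, user_id, sid, cid) id, year, user_id, sid, cid
 from user_links
 order by year, user_id, sid, cid, id)
order by 1;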
except all works as well, because the serial id makes every row unique.
select id, year, user_id, sid, cid from user_links
except all
(select distinct on (year, user_id, sid, cid) id, year, user_id, sid, cid
 from user_links
 order by year, user_id, sid, cid, id)
order by 1;
So far this handles both null and non-null values.
Delete:
with a as (
    select id, year, user_id, sid, cid from user_links
    except all
    (select distinct on (year, user_id, sid, cid) id, year, user_id, sid, cid
     from user_links
     order by year, user_id, sid, cid, id)
)
delete from user_links using a where user_links.id = a.id returning *;
8 Answers
3ks5zfa01#
The basic idea is to use a nested query with a count aggregate:
You can adjust the where clause of the inner query to narrow the search.
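The answer's own query is not reproduced on this page. As a minimal sketch of the idea, and assuming the user_links table and the year, user_id, sid, cid columns used by the other answers here (all of these names are illustrative), a correlated count subquery could look like this:

```sql
-- Sketch only: table and column names are assumptions borrowed from the other answers.
-- Lists every row whose key combination occurs more than once.
SELECT ou.*
FROM user_links ou
WHERE (
    SELECT count(*)
    FROM user_links inr
    WHERE inr.year    = ou.year
      AND inr.user_id = ou.user_id
      AND inr.sid     = ou.sid
      AND inr.cid     = ou.cid
      -- add further conditions here to narrow the search
) > 1
ORDER BY ou.year, ou.user_id, ou.sid, ou.cid, ou.id;
```

Note that `=` never matches NULL, so rows with a NULL in one of the key columns are not reported by this sketch; comparing with `IS NOT DISTINCT FROM` instead would treat NULLs as equal.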
There is another good solution to the problem mentioned in the comments (but not everyone reads them):
Or shorter:
c6ubokkw2#
A clever solution can be found in "Find duplicate rows with PostgreSQL":
6gpjuf903#
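To keep things simple, I assume you only want to apply a unique constraint to the year column, and that the primary key is a column named id.
To find the duplicate values, you should run,
With the SQL statement above you get a table containing all the duplicated years. To delete all duplicate entries except the latest one, you should use the SQL statement above.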
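The statements this answer refers to are missing from the page. The following is only a sketch of what the two steps could look like, assuming a placeholder table called table_name with the columns year and id, and assuming that "latest" means the row with the highest id:

```sql
-- Sketch only: table_name is a placeholder, and "latest" is assumed to mean "highest id".

-- 1) Years that occur more than once.
SELECT year, count(*) AS occurrences
FROM table_name
GROUP BY year
HAVING count(*) > 1;

-- 2) Delete every row for which a newer row (higher id) with the same year exists,
--    so that only the latest entry per year survives.
DELETE FROM table_name AS older
USING table_name AS newer
WHERE older.year = newer.year
  AND older.id < newer.id;
```

Running the delete inside a transaction and checking the affected row count before committing is a cheap safeguard.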
guykilcj4#
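You can join the table to itself on the fields that would be duplicated, then anti-join on the id field. Select the id field from the first table alias (tn1), then use the array_agg function on the id field of the second table alias. Finally, for array_agg to work properly, group the results by the tn1.id field. This produces a result set containing each record's id and an array of all the ids that match the join conditions.
Obviously, the ids that appear in the duplicate_entries array of one id will also have their own entries in the result set. You will have to use this result set to decide which id should be the source of "truth", the one record that should not be deleted. Perhaps you could do something like this:
This selects the lowest-numbered id of each set of duplicates (assuming the id is an incrementing integer PK). These are the ids you would keep.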
lhcgjxsq5#
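Inspired by Sandro Wiggers, I did something similar
If you want to test it first, change it slightly:
This gives an overview of what is going to be deleted (keeping year, user_id, sid and cid in the to_delete query does no harm when running the delete, but they are not needed then)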
mtb9vblg6#
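In your case, because of the constraint, you need to delete the duplicated records.
1. Find the duplicated rows
2. Organize them by created_at date; in this case I keep the earliest one
3. Delete the records with USING to filter the right rows
ffx8fchx7#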
Use distinct and except for the set operation.
except all works as well, because the serial id makes every row unique.
So far this handles both null and non-null values.
Delete:
8fq7wneg8#
The following SQL syntax gives better performance when checking for duplicate rows.