Is it possible to find duplicates in a SQL table across multiple columns in a way that only requires a match by one of the columns?
For example, lets say I have a table with the following Schema: ID, C1, C2
My goal is to return a new table with a column called "Group ID" which is the ID
of the group the record lives. A given record should exist in the group if there is a match by at least one of C1
or C2
.
For example, consider this table
Create Table #Target
(ID Varchar(MAX),C1 Varchar(MAX),C2 Varchar(MAX))
Insert INTO #Target
Values
('1','p1','e1'),
('2','p2','e2'),
('3','p1','e2'),
('4','p3','e3'),
('5','p3','e4'),
('6','p4','e4')
ID | C1 | C2 |
---|---|---|
1 | p1 | e1 |
2 | p2 | e2 |
3 | p1 | e2 |
4 | p3 | e3 |
5 | p3 | e4 |
6 | p4 | e4 |
The desired output would look something like
| ID | C1 | C2 | GID |
| ------------ | ------------ | ------------ | ------------ |
| 1 | p1 | e1 | G1 |
| 2 | p2 | e2 | G1 |
| 3 | p1 | e2 | G1 |
| 4 | p3 | e3 | G2 |
| 5 | p3 | e4 | G2 |
| 6 | p4 | e4 | G2 |
When trying to come up with a solution I tried grouping by each column individually like this:
#1 Group By C1 and Assign Unique ID (C1GID) to each Group
#2 Group By C2 and Assign Unique ID (C2GID) to each Group
Now the problem I am having is how to group by C1GID
OR C2GID
. I don't know how to group rows that share at least one of these column values in common.
Update: I am getting closer. I am able to generate a set of IDs that have a match by at least one of the columns using this
SELECT *, CAST(NULL AS INT) AS ID_To INTO #t FROM ( VALUES ('1','p1','e1'), ('2','p2','e2'), ('3','p1','e2'), ('4','p3','e3'), ('5','p3','e4'), ('6','p4','e4') ) t (ID,C1,C2)
Select ID1, STRING_AGG(ID3, ', ') + ',' + STRING_AGG(ID2, ', ') as Groups
FROM (
select
t1.ID as ID1,
t1.C1 as t1C1,
t1.C2 as t1C2,
t2.ID as ID2,
t2.C1 as t2C1,
t2.C2 as t2C2,
t3.ID as ID3,
t3.C1 as t3C1,
t3.C2 as t3C2
from #t t1
LEFT JOIN (
Select * From #t
) t2 ON t1.C1 = t2.C1
LEFT JOIN (
Select * From #t
) t3 ON t1.C2 = t3.C2
WHERE t1.C1 = t2.C1 OR t1.C2 = t3.C2
) Groups
GROUP BY ID1
Image here: Output
I am trying to find a way to re-group these by finding an intersection between the values in the Groups column. For example its clear from the result that 1,2,3 belong together, and 4,5,6 belong together because their groups column has a common ID.
1条答案
按热度按时间kgsdhlau1#
Try this one:
https://dbfiddle.uk/-nIlBCer