I have customer data coming from several different systems, with three potential unique identifiers. We need to create a single new ID that can be used to link ANY match between the three identifiers.
The data is in the following structure:
| Primary Key | CustomerID1 | CustomerID2 | CustomerID3 |
| ------------ | ------------ | ------------ | ------------ |
| 1 | Alpha | Dog | Jeans |
| 2 | Alpha | Cat | Shirt |
| 3 | Beta | Dog | Dress |
| 4 | Gamma | Bear | Jeans |
| 5 | Alpha | Dog | Jeans |
| 6 | Epsilon | Bird | Boots |
In the end, keys 1-5 should all be linked based on the bolded matches while 6 should have its own ID. So the desired output would be a new ID generated for just 2 customers from this list and then a second table with all the IDs & IDTypes linked to that new ID (but open to changing the end result structure so long as we generate 1 ID for each customer):
New ID |
---|
Key1 |
Key2 |
New ID | ID | IDType |
---|---|---|
Key1 | Alpha | CustomerID1 |
Key1 | Beta | CustomerID1 |
Key1 | Gamma | CustomerID1 |
Key1 | Dog | CustomerID2 |
Key1 | Cat | CustomerID2 |
Key1 | Bear | CustomerID2 |
Key1 | Jeans | CustomerID3 |
Key1 | Shirt | CustomerID3 |
Key1 | Dress | CustomerID3 |
Key2 | Epsilon | CustomerID1 |
Key2 | Bird | CustomerID2 |
Key2 | Boots | CustomerID3 |
I've tried creating a list of all possible matches like the below but I'm not sure how to consolidate that down into a single ID for each customer.
DROP TABLE IF EXISTS #Test
CREATE TABLE #Test (PrimaryKey int, CustomerID1 varchar(15), CustomerID2 varchar(15), CustomerID3 varchar(15))
INSERT INTO #Test VALUES
(1,'Alpha','Dog','Jeans')
,(2,'Alpha','Cat','Shirt')
,(3,'Beta','Dog','Dress')
,(4,'Gamma','Bear','Jeans')
,(5,'Alpha','Dog','Jeans')
,(6,'Epsilon','Bird','Boots')
SELECT
t1.PrimaryKey
,t2.Primarykey
FROM #Test t1
JOIN #Test t2 on t2.PrimaryKey != t1.PrimaryKey and t1.CustomerID1 = t2.CustomerID1
UNION
SELECT
t1.PrimaryKey
,t2.Primarykey
FROM #Test t1
JOIN #Test t2 on t2.PrimaryKey != t1.PrimaryKey and t1.CustomerID2 = t2.CustomerID2
UNION
SELECT
t1.PrimaryKey
,t2.Primarykey
FROM #Test t1
JOIN #Test t2 on t2.PrimaryKey != t1.PrimaryKey and t1.CustomerID3 = t2.CustomerID3
I feel like the solution is obvious but I'm stuck so any help is appreciated! Thank you!
1条答案
按热度按时间vfh0ocws1#
It's a bit tricky to do in one select (at least for me). I usually do something like this:
To explain the code:
You can put the select inside the while loop to track what's going on with the #t-table
When the loop is done, every customer should either has ID_to = NULL, meaning he is the "master", or ID_to > 0 which means this customer is matched against another master customer