Creating a single ID for any combination of matches between three different identifiers

ttcibm8c 于 2023-02-28 发布在其他

关注(0)|答案(1)|浏览(91)

I have customer data coming from several different systems, with three potential unique identifiers. We need to create a single new ID that can be used to link ANY match between the three identifiers.

The data is in the following structure:
| Primary Key | CustomerID1 | CustomerID2 | CustomerID3 |
| ------------ | ------------ | ------------ | ------------ |
| 1 | Alpha | Dog | Jeans |
| 2 | Alpha | Cat | Shirt |
| 3 | Beta | Dog | Dress |
| 4 | Gamma | Bear | Jeans |
| 5 | Alpha | Dog | Jeans |
| 6 | Epsilon | Bird | Boots |

In the end, keys 1-5 should all be linked based on the bolded matches while 6 should have its own ID. So the desired output would be a new ID generated for just 2 customers from this list and then a second table with all the IDs & IDTypes linked to that new ID (but open to changing the end result structure so long as we generate 1 ID for each customer):

New ID
Key1
Key2

New ID	ID	IDType
Key1	Alpha	CustomerID1
Key1	Beta	CustomerID1
Key1	Gamma	CustomerID1
Key1	Dog	CustomerID2
Key1	Cat	CustomerID2
Key1	Bear	CustomerID2
Key1	Jeans	CustomerID3
Key1	Shirt	CustomerID3
Key1	Dress	CustomerID3
Key2	Epsilon	CustomerID1
Key2	Bird	CustomerID2
Key2	Boots	CustomerID3

I've tried creating a list of all possible matches like the below but I'm not sure how to consolidate that down into a single ID for each customer.

DROP TABLE IF EXISTS #Test
CREATE TABLE #Test (PrimaryKey int, CustomerID1 varchar(15), CustomerID2 varchar(15), CustomerID3 varchar(15))
INSERT INTO #Test VALUES
     (1,'Alpha','Dog','Jeans')
    ,(2,'Alpha','Cat','Shirt')
    ,(3,'Beta','Dog','Dress')
    ,(4,'Gamma','Bear','Jeans')
    ,(5,'Alpha','Dog','Jeans')
    ,(6,'Epsilon','Bird','Boots')

SELECT
     t1.PrimaryKey
    ,t2.Primarykey
FROM #Test t1
JOIN #Test t2 on t2.PrimaryKey != t1.PrimaryKey and t1.CustomerID1 = t2.CustomerID1
UNION
SELECT
     t1.PrimaryKey
    ,t2.Primarykey
FROM #Test t1
JOIN #Test t2 on t2.PrimaryKey != t1.PrimaryKey and t1.CustomerID2 = t2.CustomerID2
UNION
SELECT
     t1.PrimaryKey
    ,t2.Primarykey
FROM #Test t1
JOIN #Test t2 on t2.PrimaryKey != t1.PrimaryKey and t1.CustomerID3 = t2.CustomerID3

I feel like the solution is obvious but I'm stuck so any help is appreciated! Thank you!

sql-server

来源：https://stackoverflow.com/questions/75559657/creating-a-single-id-for-any-combination-of-matches-between-three-different-iden

1条答案

按热度按时间

vfh0ocws1#

It's a bit tricky to do in one select (at least for me). I usually do something like this:

SELECT  *, CAST(NULL AS INT) AS ID_To
INTO #t
FROM
(
    VALUES  (1, N'Alpha', N'Dog', N'Jeans')
    ,   (2, N'Alpha', N'Cat', N'Shirt')
    ,   (3, N'Beta', N'Dog', N'Dress')
    ,   (4, N'Gamma', N'Bear', N'Jeans')
    ,   (5, N'Alpha', N'Dog', N'Jeans')
    ,   (5, N'Alpha', N'Bonanza', N'Boots')
    ,   (6, N'Epsilon', N'Bird', N'Boots')
    ,   (7, N'zz', N'dog', N'Bird')
    ,   (8, N'zzz', N'bye', N'hi')
    ,   (9, N'zzzz', N'bear', N'hi ho silver')
) t (ID,CustomerID1,CustomerID2,CustomerID3)

WHILE @@rowcount > 0
BEGIN
    UPDATE  t2
    SET ID_to = ISNULL(t.ID_To, t.ID)
    FROM    #t t
    LEFT JOIN #t tTo
        ON  tTo.ID = t.ID_To
    CROSS APPLY (
        VALUES (t.Customerid1), (t.Customerid2), (t.Customerid3)
        ) v(externalId) 
    CROSS JOIN #t t2
    CROSS APPLY (
        VALUES (t2.Customerid1), (t2.Customerid2), (t2.Customerid3)
        ) v2(externalId) 
    WHERE   ISNULL(t.id_To, t.id) < ISNULL(t2.ID_to, t2.id)
    AND v.externalId = v2.externalId
    AND t.ID <> t2.ID
END

SELECT  *
FROM    #t

SELECT  ISNULL(ID_TO, ID) AS groups
FROM    #t
GROUP BY ISNULL(ID_TO, ID)

To explain the code:

I create two columns in your table containing row ID (ID) and eventual match ID (ID_to). This is so we can consolidate how every customer is matched against other customers
Then i do a loop. In the loop i try to match the keys between two different customers. What i want to do is to match every customer against another customerID1,2,3 (AND v.externalId = v2.externalId )
To make sure the loop finish i only want to match if the new id is lower than current one ISNULL(t.id_To, t.id) < ISNULL(t2.ID_to, t2.id). If the matching customer is already matched to someone else i take his ID_to. This ensure that if there's a long chain of matching IDs, we will follow the chain all the way.
The WHILE @@ROWCOUNT > 0 trick is quite useful to loop stuff until there's something to do. But it's important to actually have a breaking condition otherwise the loop runs forever.

You can put the select inside the while loop to track what's going on with the #t-table

When the loop is done, every customer should either has ID_to = NULL, meaning he is the "master", or ID_to > 0 which means this customer is matched against another master customer

赞(0）回复(0）举报 2023-02-28

我来回答

Creating a single ID for any combination of matches between three different identifiers

1条答案

相关问题

热门标签

最新问答