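So I have a fun little loop that looks for duplicate addresses. I finally tuned it to run fairly fast, but I can't quite get what I need out of it.
Sample data and expected results
The loop is as follows: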
    WHILE @cnt <= @max
    BEGIN
        SELECT
            @customer_id = [customer_id],
            @add1 = [add1],
            @add2 = [add2],
            @add3 = [add3],
            @zip = [zip]
        FROM #tmpCustomers
        WHERE [id] = @cnt

        -- Ensures customer_id is not already in the dupeCustomerAddress table
        IF NOT EXISTS ( SELECT [customer_id] FROM dupeCustomerAddress WHERE [customer_id] = @customer_id )
        BEGIN
            INSERT INTO dupeCustomerAddress
            SELECT
                [customer_id],
                [address1],
                [add1],
                [address2],
                [add2],
                [address3],
                [add3],
                [zip]
            FROM [working_customerAddress]
            WHERE
                -- Removes the Record used for comparison
                -- [customer_id] != @customer_id AND
                -- Don't need to include records already processed
                [customer_id] > @cnt AND
                (
                    ( -- Address Line 1 or 2 matches Comparison Line 1 AND Zip matches
                        (
                            ( [address1] IS NOT NULL AND [add1] = @add1 ) OR
                            ( [address2] IS NOT NULL AND [add2] = @add1 )
                        ) AND [zip] = @zip
                    )
                    OR
                    ( -- Address Line 1 or 2 matches Comparison Line 2 AND Zip matches
                        (
                            ( [address1] IS NOT NULL AND [add1] = @add2 ) OR
                            ( [address2] IS NOT NULL AND [add2] = @add2 )
                        ) AND [zip] = @zip
                    )
                )
            GROUP BY customer_id, add1, add2, add3, address1, address2, address3, zip
            HAVING COUNT(customer_id) > 1
        END

        SET @cnt = @cnt + 1
    END
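For comparison, the same search can be written as a single set-based self-join instead of a row-by-row `WHILE` loop. This is a sketch under several assumptions, not a drop-in replacement: it runs against SQLite through Python's standard `sqlite3` module so it is self-contained, it compares `working_customerAddress` with itself rather than with `#tmpCustomers`, and the sample rows are invented because the question's real sample data is not reproduced here. `ROWS`, `MATCH`, `build_db` and `find_dupe_pairs` are hypothetical names. One deliberate difference from the loop: the "already processed" filter compares `customer_id` with the comparison row's `customer_id`, whereas the loop compares it with `@cnt`, which is an `[id]` from `#tmpCustomers`; whether those two numbers line up in the real data is not shown in the question.

```python
import sqlite3

# Hypothetical names throughout (ROWS, MATCH, build_db, find_dupe_pairs); a sketch only.
# Hypothetical rows: (customer_id, address1, add1, address2, add2, address3, add3, zip).
# add1/add2/add3 are assumed to be normalized copies of address1/2/3.
ROWS = [
    (1, "12 Main St", "12 MAIN ST", None, None, None, None, "10001"),         # a
    (2, "12 Main Street", "12 MAIN ST", None, None, None, None, "10001"),     # b: same line 1
    (3, "Apt 4", "APT 4", "12 Main St.", "12 MAIN ST", None, None, "10001"),  # c: line 2 matches a
    (4, "99 Oak Ave", "99 OAK AVE", None, None, None, None, "20002"),         # d: unique
]

# The loop's WHERE clause: candidate w's line 1 or 2 equals comparison c's line 1 or 2,
# and the zips match. "x IN (a, b)" means "x = a OR x = b"; NULLs never compare equal.
MATCH = """
    w.zip = c.zip AND (
        (w.address1 IS NOT NULL AND w.add1 IN (c.add1, c.add2))
     OR (w.address2 IS NOT NULL AND w.add2 IN (c.add1, c.add2))
    )
"""


def build_db():
    """Create an in-memory stand-in for working_customerAddress (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE working_customerAddress (customer_id INTEGER, address1 TEXT, add1 TEXT,"
        " address2 TEXT, add2 TEXT, address3 TEXT, add3 TEXT, zip TEXT)"
    )
    conn.executemany(
        "INSERT INTO working_customerAddress VALUES (?, ?, ?, ?, ?, ?, ?, ?)", ROWS
    )
    return conn


def find_dupe_pairs(conn):
    """One self-join instead of the WHILE loop: c is the comparison row, w the candidate."""
    # w.customer_id > c.customer_id skips the row itself and the reverse of each pair,
    # comparing customer_id with customer_id rather than with a loop counter.
    sql = f"""
        SELECT c.customer_id AS source_id, w.customer_id AS dupe_id
        FROM working_customerAddress AS c
        JOIN working_customerAddress AS w
          ON w.customer_id > c.customer_id AND {MATCH}
        ORDER BY source_id, dupe_id
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    print(find_dupe_pairs(build_db()))  # [(1, 2), (1, 3), (2, 3)]
```

Each result row carries both ids, so the source record (1) is reported next to its duplicates (2 and 3). Like the loop's `>` filter, keeping only one direction of each pair assumes the match rule is effectively symmetric; that is not strictly guaranteed, because the `IS NOT NULL` checks only look at the candidate row's address columns. The SQL itself (self-join, `IN` list) is standard and should carry over to T-SQL, but that is an assumption rather than something tested here.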
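If I remove the `GROUP BY` and `HAVING`, it returns every row, because #tmpCustomers is just a subset of the fields in working_customerAddress, so each comparison record also matches itself. That makes sense.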
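The self-match described above can be reproduced in a few lines. This sketch reuses the same assumptions as any standalone illustration here: SQLite via Python's `sqlite3`, invented sample rows, and hypothetical names (`ROWS`, `MATCH`, `build_db`, `match_counts_without_exclusion`). It joins the table to itself with the loop's match condition and no exclusion, then counts matches per comparison row.

```python
import sqlite3

# Hypothetical names and rows: (customer_id, address1, add1, address2, add2, address3, add3, zip).
ROWS = [
    (1, "12 Main St", "12 MAIN ST", None, None, None, None, "10001"),
    (2, "12 Main Street", "12 MAIN ST", None, None, None, None, "10001"),
    (3, "Apt 4", "APT 4", "12 Main St.", "12 MAIN ST", None, None, "10001"),
    (4, "99 Oak Ave", "99 OAK AVE", None, None, None, None, "20002"),  # unique address
]

# Same test as the loop's WHERE clause (w = candidate, c = comparison record).
MATCH = """
    w.zip = c.zip AND (
        (w.address1 IS NOT NULL AND w.add1 IN (c.add1, c.add2))
     OR (w.address2 IS NOT NULL AND w.add2 IN (c.add1, c.add2))
    )
"""


def build_db():
    """In-memory stand-in for working_customerAddress (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE working_customerAddress (customer_id INTEGER, address1 TEXT, add1 TEXT,"
        " address2 TEXT, add2 TEXT, address3 TEXT, add3 TEXT, zip TEXT)"
    )
    conn.executemany(
        "INSERT INTO working_customerAddress VALUES (?, ?, ?, ?, ?, ?, ?, ?)", ROWS
    )
    return conn


def match_counts_without_exclusion(conn):
    """Count matches per comparison row when c and w are allowed to be the same row."""
    sql = f"""
        SELECT c.customer_id, COUNT(*) AS matches
        FROM working_customerAddress AS c
        JOIN working_customerAddress AS w ON {MATCH}
        GROUP BY c.customer_id
        ORDER BY c.customer_id
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    # Every row in this sample matches at least itself; unique row 4 matches only itself.
    print(match_counts_without_exclusion(build_db()))  # [(1, 3), (2, 3), (3, 3), (4, 1)]
```

Customer 4 has no real duplicate, yet it still produces one match (itself), which is why every row comes back when nothing excludes the comparison record.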
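If I uncomment `[customer_id] != @customer_id`, it only gives me the duplicate records' information, not the source record's. If b and c are duplicates of a, I only get b and c, but I want a, b and c.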
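One way to get a, b and c together is to change the question being asked: instead of "which rows match the comparison record?", ask "which rows match at least one *other* row?". A plausible reason a goes missing in the loop is that a only appears in its own pass, as the comparison record, while in later passes the `> @cnt` filter and the `NOT EXISTS` check skip it or its duplicates. The sketch below uses a correlated `EXISTS` so a, b and c each qualify on their own. As before, it is an assumption-laden illustration: SQLite via Python's `sqlite3`, invented rows, and hypothetical names (`ROWS`, `MATCH`, `build_db`, `rows_with_a_duplicate`).

```python
import sqlite3

# Hypothetical names and rows: (customer_id, address1, add1, address2, add2, address3, add3, zip).
ROWS = [
    (1, "12 Main St", "12 MAIN ST", None, None, None, None, "10001"),         # a
    (2, "12 Main Street", "12 MAIN ST", None, None, None, None, "10001"),     # b
    (3, "Apt 4", "APT 4", "12 Main St.", "12 MAIN ST", None, None, "10001"),  # c
    (4, "99 Oak Ave", "99 OAK AVE", None, None, None, None, "20002"),         # unique
]

# Same test as the loop's WHERE clause (w = row being checked, c = any other row).
MATCH = """
    w.zip = c.zip AND (
        (w.address1 IS NOT NULL AND w.add1 IN (c.add1, c.add2))
     OR (w.address2 IS NOT NULL AND w.add2 IN (c.add1, c.add2))
    )
"""


def build_db():
    """In-memory stand-in for working_customerAddress (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE working_customerAddress (customer_id INTEGER, address1 TEXT, add1 TEXT,"
        " address2 TEXT, add2 TEXT, address3 TEXT, add3 TEXT, zip TEXT)"
    )
    conn.executemany(
        "INSERT INTO working_customerAddress VALUES (?, ?, ?, ?, ?, ?, ?, ?)", ROWS
    )
    return conn


def rows_with_a_duplicate(conn):
    """Return every customer_id that matches at least one row with a different customer_id."""
    sql = f"""
        SELECT w.customer_id
        FROM working_customerAddress AS w
        WHERE EXISTS (
            SELECT 1
            FROM working_customerAddress AS c
            WHERE c.customer_id <> w.customer_id AND {MATCH}
        )
        ORDER BY w.customer_id
    """
    return [row[0] for row in conn.execute(sql)]


if __name__ == "__main__":
    print(rows_with_a_duplicate(build_db()))  # [1, 2, 3]
```

The `<>` inside the subquery plays the role of the commented-out `!=` filter, but because it sits in the `EXISTS` rather than in the outer `WHERE`, the source record is no longer dropped.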
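I tried adding the `GROUP BY` and `HAVING`, but then I get no results at all. All the records should be unique, since there is only one per `customer_id`, so the `GROUP BY` shouldn't be a problem. Maybe the `HAVING` is wrong.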
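The uniqueness described above is likely exactly why `HAVING` removes everything: the `GROUP BY` list includes `customer_id`, and if there is only one row per `customer_id`, every group contains one row, so `COUNT(customer_id)` is always 1 and `HAVING COUNT(customer_id) > 1` filters out every group. The sketch below shows this next to a grouping on an address key instead. It assumes SQLite via Python's `sqlite3`, invented rows, and hypothetical names (`ROWS`, `build_db`, `group_by_customer`, `group_by_address`).

```python
import sqlite3

# Hypothetical rows: (customer_id, address1, add1, address2, add2, address3, add3, zip).
ROWS = [
    (1, "12 Main St", "12 MAIN ST", None, None, None, None, "10001"),
    (2, "12 Main Street", "12 MAIN ST", None, None, None, None, "10001"),
    (3, "Apt 4", "APT 4", "12 Main St.", "12 MAIN ST", None, None, "10001"),
    (4, "99 Oak Ave", "99 OAK AVE", None, None, None, None, "20002"),
]


def build_db():
    """In-memory stand-in for working_customerAddress (hypothetical helper)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE working_customerAddress (customer_id INTEGER, address1 TEXT, add1 TEXT,"
        " address2 TEXT, add2 TEXT, address3 TEXT, add3 TEXT, zip TEXT)"
    )
    conn.executemany(
        "INSERT INTO working_customerAddress VALUES (?, ?, ?, ?, ?, ?, ?, ?)", ROWS
    )
    return conn


def group_by_customer(conn):
    """The question's GROUP BY: customer_id is unique, so every group holds one row."""
    sql = """
        SELECT customer_id, COUNT(customer_id)
        FROM working_customerAddress
        GROUP BY customer_id, add1, add2, add3, address1, address2, address3, zip
        HAVING COUNT(customer_id) > 1
    """
    return conn.execute(sql).fetchall()


def group_by_address(conn):
    """Group on the address key instead, so rows sharing it land in the same group."""
    sql = """
        SELECT add1, zip, COUNT(*)
        FROM working_customerAddress
        GROUP BY add1, zip
        HAVING COUNT(*) > 1
    """
    return conn.execute(sql).fetchall()


if __name__ == "__main__":
    conn = build_db()
    print(group_by_customer(conn))  # []
    print(group_by_address(conn))   # [('12 MAIN ST', '10001', 2)]
```

Grouping on `(add1, zip)` only finds exact line-1 matches, so it catches rows 1 and 2 but misses row 3, whose match is on address line 2. It illustrates how `HAVING` behaves rather than replacing the cross-line comparison the loop performs.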
What am I missing?
I'm running the latest versions of MS SQL Server and SSMS.