SQL Server How do I find the distinct values from 2 tables base on ID and indicator?

nhn9ugyo  于 2023-03-28  发布在  其他
关注(0)|答案(4)|浏览(150)

I want to find the record which has the same ID and indicator but with different value. Here is my sample data.

declare @table1 table (id int, indicator varchar(20),value int);
INSERT INTO @table1 VALUES
(11,'AC',80),
(11,'HE',90),
(12,'AC',10),
(12,'HE',80),
(13,'AC',10),
(13,'HE',10);

declare @table2 table(id int, indicator varchar(20),value int);
INSERT INTO @table2 VALUES
(11,'AC',80),
(11,'HE',90),
(12,'AC',11),
(12,'HE',80),
(13,'AC',10),
(14,'AC',10);

I create db<>fiddle

Base on the sample data above, there will 4 scenarios:

  1. 11 existing in both table, each ID with same indicator and value, so I will NOT log in record.

  2. 12 with indicator 'AC' has different value from table2 (10 and 11), this is the record I need.

  3. 13 existing in both table, but table 1 has 2 records, and both indicator 'AC' has same value. In this case, I only need the one which does not have match record in table 2.(In this case, its (13,'HE',10). Please notes, there could have same situation in table 2 as well. therefore, Result for table 1 will become NULL and list the result from table 2

  4. ID only existing in one of the table, for eg: 14. Then I don't need this one.

In another word, I need the record If 2 table has same ID and Indicator but with different value. If both table has same ID, but with different indicator and value, I also need to log this record.

Therefore, I do want my result to look like this.
| Table 1 ID | Table 1 Indicator | Table 1 Value | Table 2 ID | Table 2 Indicator | Table 2 Value |
| ------------ | ------------ | ------------ | ------------ | ------------ | ------------ |
| 12 | AC | 10 | 12 | AC | 11 |
| 13 | HE | 10 | NULL | NULL | NULL |

2ul0zpep

2ul0zpep1#

SELECT 
    L.id, L.indicator, L.[value], 
    R.id, R.indicator, R.[value] 
FROM 
(
    -- Rows from T1 with an id match in T2
    SELECT T1.*
    FROM @table1 AS T1
    JOIN @table2 AS T2
        ON T2.id = T1.id
    -- ...but not an exact match on all attributes
    EXCEPT
    SELECT T2.*
    FROM @table2 AS T2
) AS L
FULL JOIN 
(
    -- Rows from T2 with an id match in T1
    SELECT T2.*
    FROM @table2 AS T2
    JOIN @table1 AS T1 
        ON T1.id = T2.id
    -- ...but not an exact match on all attributes
    EXCEPT
    SELECT T1.*
    FROM @table1 AS T1
) AS R
    ON R.id = L.id
    AND R.indicator = L.indicator;
idindicatorvalueidindicatorvalue
12AC1012AC11
13HE10NULLNULLNULL

Or without a join:

SELECT 
    -- Pivot
    id1 = MAX(CASE WHEN Q1.src = 1 THEN Q1.id END),
    indicator1 = MAX(CASE WHEN Q1.src = 1 THEN Q1.indicator END),
    value1 = MAX(CASE WHEN Q1.src = 1 THEN Q1.[value] END),
    id2 = MAX(CASE WHEN Q1.src = 2 THEN Q1.id END),
    indicator2 = MAX(CASE WHEN Q1.src = 2 THEN Q1.indicator END),
    value2 = MAX(CASE WHEN Q1.src = 2 THEN Q1.[value] END)
FROM 
(
    -- Group identical source rows
    -- and add useful counts
    SELECT 
        U.id, 
        U.indicator, 
        U.[value], 
        src = MAX(U.src), 
        dupes = COUNT_BIG(*), 
        id_count = COUNT_BIG(*) OVER (
            PARTITION BY U.id
            ROWS BETWEEN UNBOUNDED PRECEDING
                AND UNBOUNDED FOLLOWING) 
    FROM 
    (
        -- Combine rows remembering where they came from
        SELECT src = 1, * FROM @table1
        UNION ALL
        SELECT src = 2, * FROM @table2
    ) AS U
    GROUP BY 
        U.id, 
        U.indicator, 
        U.[value]
) AS Q1
WHERE
    -- Only records without a match
    Q1.dupes = 1
    -- Only where the id appears in both table sources
    AND Q1.id_count > 1
GROUP BY 
    Q1.id, 
    Q1.indicator
ORDER BY
    Q1.id,
    Q1.indicator;
id1indicator1value1id2indicator2value2
12AC1012AC11
13HE10NULLNULLNULL

db<>fiddle for both.

x33g5p2x

x33g5p2x2#

This is a variation on the XOR join: you need to FULL JOIN the two sides, but only take the results if there is a difference. The tweak here is that if the ID does not exist on one side then you also want to ignore those rows (similar to an INNER JOIN ).

The best I can come up with is to use a FULL JOIN combined wth conditional windowed counts of both sides

SELECT *
FROM (
    SELECT
      Table1ID = t1.id,
      Table1Indicator = t1.indicator,
      Table1Value = t1.value,
      Table2ID = t2.id,
      Table2Indicator = t2.indicator,
      Table2Value = t2.value,
      count1 = CASE WHEN t1.id IS NOT NULL THEN COUNT(t2.id) OVER (PARTITION BY t1.id) END,
      count2 = CASE WHEN t2.id IS NOT NULL THEN COUNT(t1.id) OVER (PARTITION BY t2.id) END
    FROM @table1 t1
    FULL JOIN @table2 t2
      ON t2.id = t1.id
     AND t2.indicator = t1.indicator
) t
WHERE (Table1Value <> Table2Value)
   OR (Table1Value IS NULL AND count2 > 0)
   OR (Table2Value IS NULL AND count1 > 0);

db<>fiddle

Note that the fiddle also works for your extra case of (13,'LE',10) in table 2.

The logic here is as follows:

  • First full-join everything together, this will get us all rows from both sides, but matched by id and indicator .

  • Calculate window counts based on: group by id1 and count the number of matches for id2 , and the same in reverse. Do not calculate the count for any null group.

  • Then filter as follows:

  • If the two value s are different (but neither are null) then we have a match, so include.

  • If one side is null but there exist that id from the other side then include.

  • And the same the other way

w6lpcovy

w6lpcovy3#

I have tried to add more records to @table1 and @table2 as you said:

declare @table1 table (id int, indicator varchar(20),value int);
INSERT INTO @table1 VALUES
(11,'AC',80),
(11,'HE',90),
(12,'AC',10),
(12,'HE',80),
(13,'AC',10),
(13,'HE',10),
(null,null,null);

declare @table2 table(id int, indicator varchar(20),value int);
INSERT INTO @table2 VALUES
(11,'AC',80),
(11,'HE',90),
(12,'AC',11),
(12,'HE',80),
(13,'AC',10),
(14,'AC',10),
(13,'LE',10);

Please check if this fulfills your requirements.

SELECT COALESCE(t1.id, t2.id) AS [ID table1],
       COALESCE(t1.indicator, t2.indicator) AS [Indicator table1],
       t1.[value] AS [Value table1],
       t2.[value] AS [Value table2],
       t2.id AS [ID table2],
       t2.indicator AS [Indicator table2]
FROM @table1 t1
FULL JOIN @table2 t2
    ON t1.id = t2.id AND t1.indicator = t2.indicator
WHERE t1.[value] IS NULL OR t2.[value] IS NULL OR t1.[value] <> t2.[value]
ORDER BY [ID table1], [Indicator table1]
eufgjt7s

eufgjt7s4#

SELECT t1.id, t1.indicator,t1.value,t2.id,t2.indicator,t2.value
FROM @table1 t1
FULL JOIN @table2 t2 ON t1.id = t2.id AND t1.indicator = t2.indicator
WHERE t1.value <> t2.value 
  OR t1.id IS NULL AND t2.id IN (select id from @table1)
  OR t2.id IS NULL AND t1.id IN (select id from @table2)

相关问题