SQL Server Gather the max of a set of data by 2 columns in SQL?

pxy2qtax  于 2023-05-28  发布在  其他
关注(0)|答案(2)|浏览(109)

I am trying to get the latest of a set of columns by a PersonID out of a set of YearIDs.

If I have a table like this:
| DataID | PersonID | YearID | Data1A | Data1B | Data2A | Data2B |
| ------------ | ------------ | ------------ | ------------ | ------------ | ------------ | ------------ |
| 1 | 888 | A100 | d | 0.00 | a | 1.00 |
| 2 | 888 | A101 | NULL | NULL | b | 2.00 |
| 3 | 888 | A102 | c | 3.00 | NULL | NULL |
| 4 | 333 | A100 | a | 3.40 | e | 4.00 |
| 5 | 333 | A101 | d | 0.00 | NULL | NULL |
| 6 | 333 | A102 | NULL | NULL | NULL | NULL |

How do I get the latest of column sets Data1A, Data1B and Data2A, Data2B sorted by YearID per PersonID? This is given that Data1A and Data1B are related and Data2A and Data2B are related and can not be separated, and most recent year is A102. DataID is just an incremental PK column.

My resulting table should look like this, with Year being removed as it's no longer necessary. It should ignore NULLs but not 0's:
| DataID | PersonID | Data1A | Data1B | Data2A | Data2B |
| ------------ | ------------ | ------------ | ------------ | ------------ | ------------ |
| 1 | 888 | c | 3.00 | b | 2.00 |
| 2 | 333 | d | 0.00 | e | 4.00 |

This is what I have so far, but I don't know how to take into account the fact that I want the 'max'/latest of a set of Years by PersonID. Right now it gets the max of each column but I want the most recent valid data by latest year, and it also has Data1 and Data2 not being related at all but I need them to be.

SELECT DISTINCT 
    T1.SID, 
    GroupedT1.Data1,
    GroupedT1.Data2,
    FROM #Table1 T1
    INNER JOIN
        (SELECT SID, 
        MAX(Data1) AS Data1,
        MAX(Data2) AS Data2,
        FROM #Table1
        GROUP BY PersonID) GroupedT1
    ON T1.PersonID = GroupedT1.PersonID

Editing thanks to Gordon for the previous answer, this is how I tried to fix my new problem:

With this solution I'm trying to get the latest for Data1 and Data2, ignoring as many NULL columns as there is, and picking data from any YearID as long as it's the latest. So if in the year A102, Data1A is NULL then it should pick year A101's Data1A, and if Data2A is null for many years, it should pick the latest (in this case, year A100). At the moment it's close but it only picks by row, and needs to pick by year and with any number of NULL data.

select t1.PersonID, t1.Data1A, t1.Data1B, t1.Data2A, t1.Data2B
    from (select t1.*,
                 row_number() over (partition by SID order by
                 (case when Data1A is not null then 1 else 2 end),
                 (case when Data2A is not null then 1 else 2 end),
                 YearID desc) as seqnum
          from #Table1 t1
         ) t1
    where seqnum = 1
tkclm6bt

tkclm6bt1#

This answers the original question.

I think you want a simple filtering before applying logic such as row_number() :

select t1.*
from (select t1.*,
             row_number() over (partition by personid order by yearid desc) as seqnum
      from #table1 t1
      where data1 is not null and data2 is not null
     ) t1
where seqnum = 1;

EDIT:

To answer the revised question, you need to handle each columns separately. You can do this using outer apply :

select p.personid, d1.data1, d2.data2, . . .
from (select distinct personid from #table1) p outer apply
     (select top (1) t1.data1
      from #table1 t1
      where t1.personid = p.personid and t1.data1 is not null
      order by t1.yearid desc
     ) d1 outer apply
     (select top (1) t1.data2
      from #table1 t1
      where t1.personid = p.personid and t1.data2 is not null
      order by t1.yearid desc
     ) d2 . . .
5rgfhyps

5rgfhyps2#

You can use the not exists keywork

SELECT DataID, PersonID, Data1, Data2
    FROM #Table1 T1
where not exists(select 1 from #Table1 T2 
                 where T1.DataID = T2.DataID and T2.YearID > T1.YearID)

相关问题