SQL Server How can I create an intelligent grouping in T-SQL to group a dataset based on an ascending DATE value?

b0zn9rqh  于 11个月前  发布在  其他
关注(0)|答案(2)|浏览(157)

I have an example dataset in my SQL table, and I need to group the data in a specific way.

+-------+----------+----------------+
| ID    | SUM      |  DATE          |
+-------+----------+----------------+
|     8 |        0 |     2023-01-01 |
|     8 |        0 |     2023-01-02 |
|     8 |       10 |     2023-01-03 |
|     8 |        0 |     2023-01-04 |
|     8 |      200 |     2023-01-05 |
|     8 |      200 |     2023-01-06 |
|     8 |      200 |     2023-01-07 |
|     8 |      200 |     2023-01-08 |
|     8 |      200 |     2023-01-09 |
|   778 |      200 |     2023-10-25 |
|   778 |      200 |     2023-10-26 |
+-------+----------+----------------+

I would like to have this as the final outcome: grouping the amounts that are the same by ascending days and taking the minimum date from them. Additionally, grouping should be done based on the different IDs. As seen in the image below, you can see that the ID 8 with DATE as 2023-01-1 & 2023-01-02 are grouped together. The next record has a mutation of SUM = 10, so it is also seen as a separate row. Then, the SUM = 10 is mutated back to SUM = 0, which should also be treated as a new row.

+-------+----------+----------------+
| ID    | SUM      |  DATE          |
+-------+----------+----------------+
|     8 |        0 |     2023-01-01 |
|     8 |       10 |     2023-01-03 |
|     8 |        0 |     2023-01-04 |
|     8 |      200 |     2023-01-05 |
|   778 |      200 |     2023-10-25 |
+-------+----------+----------------+

I have tried the SQL query below, but I have realized that the row with ID = 8, SUM = 0, DATE 2023-01-04 is being grouped together with the rows from 2023-01-01 & 2023-01-02 and is missing

SELECT
      [ID],
      [SUM],
      MIN([DATE]) AS [Date]
 FROM [dbo].[test] 
 GROUP BY [ID], [SUM]
+-------+----------+----------------+
| ID    | SUM      |  DATE          |
+-------+----------+----------------+
|     8 |        0 |     2023-01-01 |
|     8 |       10 |     2023-01-03 |
|     8 |      200 |     2023-01-05 |
|   778 |      200 |     2023-10-25 |
+-------+----------+----------------+

Can someone help me come up with a clever solution that takes into account the ascending DATE column while grouping, or else create a new row with the mutation? 

ahy6op9u

ahy6op9u1#

This is a gaps and islands problem, To solve it, use the LAG() function to get the previous date for each id and sum, and then select only the entries where date_diff <> 1.

WITH cte as (
  SELECT *,
        COALESCE(DATEDIFF(DAY, 
                          LAG([date]) OVER (PARTITION BY id, sum ORDER BY [date]),
                          [date]
                          ), 0) AS date_diff
  FROM mytable
)
SELECT id, sum, [date]
FROM cte
WHERE date_diff <> 1
ORDER BY [date]

Demo here

mrzz3bfm

mrzz3bfm2#

If you just want the first row from each group you can do:

select id, sum, [date]
from (
  select t.*, lag([date]) over (partition by id, sum order by [date]) as prev_date
  from mytable t
) x
where prev_date is null
order by [date]

Result:

id   sum  date       
 ---- ---- ---------- 
 8    0    2023-01-01 
 8    10   2023-01-03 
 8    200  2023-01-05 
 778  200  2023-10-25

See running example at db<>fiddle .

相关问题