I have an example dataset in my SQL table, and I need to group the data in a specific way.
+-------+----------+----------------+
| ID | SUM | DATE |
+-------+----------+----------------+
| 8 | 0 | 2023-01-01 |
| 8 | 0 | 2023-01-02 |
| 8 | 10 | 2023-01-03 |
| 8 | 0 | 2023-01-04 |
| 8 | 200 | 2023-01-05 |
| 8 | 200 | 2023-01-06 |
| 8 | 200 | 2023-01-07 |
| 8 | 200 | 2023-01-08 |
| 8 | 200 | 2023-01-09 |
| 778 | 200 | 2023-10-25 |
| 778 | 200 | 2023-10-26 |
+-------+----------+----------------+
I would like to have this as the final outcome: grouping the amounts that are the same by ascending days and taking the minimum date from them. Additionally, grouping should be done based on the different IDs. As seen in the image below, you can see that the ID 8 with DATE as 2023-01-1 & 2023-01-02 are grouped together. The next record has a mutation of SUM = 10, so it is also seen as a separate row. Then, the SUM = 10 is mutated back to SUM = 0, which should also be treated as a new row.
+-------+----------+----------------+
| ID | SUM | DATE |
+-------+----------+----------------+
| 8 | 0 | 2023-01-01 |
| 8 | 10 | 2023-01-03 |
| 8 | 0 | 2023-01-04 |
| 8 | 200 | 2023-01-05 |
| 778 | 200 | 2023-10-25 |
+-------+----------+----------------+
I have tried the SQL query below, but I have realized that the row with ID = 8, SUM = 0, DATE 2023-01-04 is being grouped together with the rows from 2023-01-01 & 2023-01-02 and is missing
SELECT
[ID],
[SUM],
MIN([DATE]) AS [Date]
FROM [dbo].[test]
GROUP BY [ID], [SUM]
+-------+----------+----------------+
| ID | SUM | DATE |
+-------+----------+----------------+
| 8 | 0 | 2023-01-01 |
| 8 | 10 | 2023-01-03 |
| 8 | 200 | 2023-01-05 |
| 778 | 200 | 2023-10-25 |
+-------+----------+----------------+
Can someone help me come up with a clever solution that takes into account the ascending DATE column while grouping, or else create a new row with the mutation?
2条答案
按热度按时间ahy6op9u1#
This is a gaps and islands problem, To solve it, use the
LAG()
function to get the previous date for each id and sum, and then select only the entries where date_diff <> 1.Demo here
mrzz3bfm2#
If you just want the first row from each group you can do:
Result:
See running example at db<>fiddle .