We have a table containing the current state and a history table tracking all changes to the state. Both are normal tables and records in the history table are inserted based on a trigger on the current table. Only the current table is modified.
The history table contains all data, so the latest record for a specific id in the history table has the same information as the corresponding record in the current table.
We want to query the time when the value of the "state" column changed to the current state (if the current state is not null). The history table keeps records of changes not just to the "state" column, but also to other state columns.
Here are the tables:
Current state table:
| id | state | other_state |
| ------------ | ------------ | ------------ |
| 1 | green | aaa |
| 2 | red | aaa |
| 3 | null | aaa |

History table:

history_id | id | state | other_state | valid_from | valid_to |
---|---|---|---|---|---|
8 | 1 | green | aaa | 2023-06-18 | 9999-12-31 |
3 | 1 | null | aaa | 2023-06-01 | 2023-06-18 |
7 | 2 | red | aaa | 2023-06-16 | 9999-12-31 |
6 | 2 | red | bbb | 2023-06-10 | 2023-06-16 |
2 | 2 | null | aaa | 2023-06-01 | 2023-06-10 |
5 | 3 | null | aaa | 2023-06-14 | 9999-12-31 |
4 | 3 | blue | aaa | 2023-06-10 | 2023-06-14 |
1 | 3 | null | aaa | 2023-06-01 | 2023-06-10 |

We want the output to look like this:

id | state | state_valid_since | other_state |
---|---|---|---|
1 | green | 2023-06-18 | aaa |
2 | red | 2023-06-10 | aaa |
3 | null | null | aaa |

Since these history tables can be rather large, we are looking for an efficient way to query this. Our first attempts calculated a row number partitioned over id and state to find the changed values, but we never got the correct result. We would appreciate any hints!
Here is a minimal example (but the query is not yet giving the expected results):

    DECLARE @current_table AS TABLE (
        id          int,
        state       varchar(10),
        other_state varchar(10)
    );

    INSERT INTO @current_table
    VALUES
        (1, 'green', 'aaa'),
        (2, 'red',   'aaa'),
        (3, NULL,    'aaa');

    DECLARE @history_table AS TABLE (
        history_id  int,
        id          int,
        state       varchar(10),
        other_state varchar(10),
        valid_from  date,
        valid_to    date
    );

    INSERT INTO @history_table
    VALUES
        (8, 1, 'green', 'aaa', '2023-06-18', '9999-12-31'),
        (3, 1, NULL,    'aaa', '2023-06-01', '2023-06-18'),
        (7, 2, 'red',   'aaa', '2023-06-16', '9999-12-31'),
        (6, 2, 'red',   'bbb', '2023-06-10', '2023-06-16'),
        (2, 2, NULL,    'aaa', '2023-06-01', '2023-06-10'),
        (5, 3, NULL,    'aaa', '2023-06-14', '9999-12-31'),
        (4, 3, 'blue',  'aaa', '2023-06-10', '2023-06-14'),
        (1, 3, NULL,    'aaa', '2023-06-01', '2023-06-10');

    WITH changes AS (
        SELECT ROW_NUMBER() OVER (PARTITION BY curr.id, curr.state
                                  ORDER BY curr.valid_from DESC) AS rn,
               curr.id,
               curr.state,
               curr.valid_from,
               curr.valid_to,
               prev.state    AS prev_state,
               prev.valid_to AS prev_valid_to
        FROM @history_table curr
        LEFT JOIN @history_table prev
               ON curr.id = prev.id AND curr.valid_from > prev.valid_from
    )
    SELECT *
    FROM changes
    WHERE rn = 1
    ORDER BY id, valid_from DESC;
3 Answers

Answer #1 (bwleehnv)
It's not clear from your question if your history table is a normal table that you maintain or is a system-versioned temporal table - if not then perhaps it should be, where you can use `FOR SYSTEM_TIME` to query it?

To get the desired result above you can simply use a correlated subquery:
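A minimal sketch of such a correlated subquery, written against the question's sample data. It is an illustration, not necessarily the exact query this answer had in mind; the aliases `c`, `h` and `d` are made up for the sketch. For each current row it takes the earliest `valid_from` among history rows that carry the current state and have no later row with a different state.

```sql
-- Sample data copied from the question (table variables only live for one batch).
DECLARE @current_table TABLE (id int, state varchar(10), other_state varchar(10));
INSERT INTO @current_table VALUES
    (1, 'green', 'aaa'), (2, 'red', 'aaa'), (3, NULL, 'aaa');

DECLARE @history_table TABLE (history_id int, id int, state varchar(10),
                              other_state varchar(10), valid_from date, valid_to date);
INSERT INTO @history_table VALUES
    (8, 1, 'green', 'aaa', '2023-06-18', '9999-12-31'),
    (3, 1, NULL,    'aaa', '2023-06-01', '2023-06-18'),
    (7, 2, 'red',   'aaa', '2023-06-16', '9999-12-31'),
    (6, 2, 'red',   'bbb', '2023-06-10', '2023-06-16'),
    (2, 2, NULL,    'aaa', '2023-06-01', '2023-06-10'),
    (5, 3, NULL,    'aaa', '2023-06-14', '9999-12-31'),
    (4, 3, 'blue',  'aaa', '2023-06-10', '2023-06-14'),
    (1, 3, NULL,    'aaa', '2023-06-01', '2023-06-10');

SELECT c.id,
       c.state,
       (SELECT MIN(h.valid_from)
        FROM @history_table h
        WHERE h.id = c.id
          AND h.state = c.state   -- never true for a NULL state, so the result stays NULL
          -- keep only rows that are not followed by a row with a different state
          AND NOT EXISTS (SELECT 1
                          FROM @history_table d
                          WHERE d.id = h.id
                            AND d.valid_from > h.valid_from
                            AND (d.state <> h.state OR d.state IS NULL))
       ) AS state_valid_since,
       c.other_state
FROM @current_table c
ORDER BY c.id;
```

On the sample rows this gives 2023-06-18 for id 1, 2023-06-10 for id 2 and NULL for id 3, which matches the `state_valid_since` column the question asks for.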
For performance you'd want the `history_table` to have an index on `id, valid_from`.

If you did want to also get the values of other columns you would instead implement in an `apply` - although it's not really clear from your data or your description if the newest row in the `history_table` is the previous row of the `current_table`
or not (it should be).

Edit

With some more info about your data I think a different approach is called for. I still think you should be implementing system-versioning here as having a history table that also contains the current row makes things harder.

Another approach would be to use the earliest valid_from date from the most recent block of matching states. A view on the history table can provide a grouped sequence, then your query can make use of the view instead:
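A minimal sketch of that idea on the question's sample data, offered as an illustration rather than the answer's original code. A CTE named `grouped` (a made-up name) stands in for the view, because a view cannot reference table variables; on permanent tables the same SELECT could serve as the view body. The difference of two `ROW_NUMBER()` sequences is the usual gaps-and-islands trick: it stays constant within each consecutive run of the same state for an id.

```sql
-- Sample data copied from the question (table variables only live for one batch).
DECLARE @current_table TABLE (id int, state varchar(10), other_state varchar(10));
INSERT INTO @current_table VALUES
    (1, 'green', 'aaa'), (2, 'red', 'aaa'), (3, NULL, 'aaa');

DECLARE @history_table TABLE (history_id int, id int, state varchar(10),
                              other_state varchar(10), valid_from date, valid_to date);
INSERT INTO @history_table VALUES
    (8, 1, 'green', 'aaa', '2023-06-18', '9999-12-31'),
    (3, 1, NULL,    'aaa', '2023-06-01', '2023-06-18'),
    (7, 2, 'red',   'aaa', '2023-06-16', '9999-12-31'),
    (6, 2, 'red',   'bbb', '2023-06-10', '2023-06-16'),
    (2, 2, NULL,    'aaa', '2023-06-01', '2023-06-10'),
    (5, 3, NULL,    'aaa', '2023-06-14', '9999-12-31'),
    (4, 3, 'blue',  'aaa', '2023-06-10', '2023-06-14'),
    (1, 3, NULL,    'aaa', '2023-06-01', '2023-06-10');

WITH grouped AS (
    SELECT h.id,
           h.state,
           h.valid_from,
           -- constant within each consecutive block of the same state per id
           ROW_NUMBER() OVER (PARTITION BY h.id ORDER BY h.valid_from)
         - ROW_NUMBER() OVER (PARTITION BY h.id, h.state ORDER BY h.valid_from) AS grp
    FROM @history_table h
)
SELECT c.id,
       c.state,
       b.block_start AS state_valid_since,
       c.other_state
FROM @current_table c
OUTER APPLY (
    -- most recent block whose state matches the current state;
    -- finds nothing (NULL) when the current state is NULL
    SELECT TOP (1) MIN(g.valid_from) AS block_start
    FROM grouped g
    WHERE g.id = c.id
      AND g.state = c.state
    GROUP BY g.grp
    ORDER BY MIN(g.valid_from) DESC
) b
ORDER BY c.id;
```

On the sample rows this gives 2023-06-18 for id 1, 2023-06-10 for id 2 and NULL for id 3. The `OUTER APPLY` also makes it easy to return further columns from the matching block if they are needed.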
Answer #2 (2eafrhcq)

You should use the aggregate function `MAX` inside a window function, together with `ROW_NUMBER` in a CTE, to distinguish your values, and use a `JOIN` to bring back the missing NULL values.

dbfiddle
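One possible reading of this description, sketched on the question's sample data. The CTE names `flagged` and `ranked` and the column `last_other_from` are made up here, and the query is an assumption about what was meant, not the code from the dbfiddle. A windowed `MAX` marks the last time each id had a different state, `ROW_NUMBER` picks the first row of the current state after that point, and a `LEFT JOIN` keeps ids whose current state is NULL.

```sql
-- Sample data copied from the question (table variables only live for one batch).
DECLARE @current_table TABLE (id int, state varchar(10), other_state varchar(10));
INSERT INTO @current_table VALUES
    (1, 'green', 'aaa'), (2, 'red', 'aaa'), (3, NULL, 'aaa');

DECLARE @history_table TABLE (history_id int, id int, state varchar(10),
                              other_state varchar(10), valid_from date, valid_to date);
INSERT INTO @history_table VALUES
    (8, 1, 'green', 'aaa', '2023-06-18', '9999-12-31'),
    (3, 1, NULL,    'aaa', '2023-06-01', '2023-06-18'),
    (7, 2, 'red',   'aaa', '2023-06-16', '9999-12-31'),
    (6, 2, 'red',   'bbb', '2023-06-10', '2023-06-16'),
    (2, 2, NULL,    'aaa', '2023-06-01', '2023-06-10'),
    (5, 3, NULL,    'aaa', '2023-06-14', '9999-12-31'),
    (4, 3, 'blue',  'aaa', '2023-06-10', '2023-06-14'),
    (1, 3, NULL,    'aaa', '2023-06-01', '2023-06-10');

WITH flagged AS (
    SELECT h.id,
           h.state,
           h.valid_from,
           c.state AS current_state,
           -- latest valid_from of any row whose state differs from the current one
           MAX(CASE WHEN h.state = c.state THEN NULL ELSE h.valid_from END)
               OVER (PARTITION BY h.id) AS last_other_from
    FROM @history_table h
    JOIN @current_table c ON c.id = h.id
),
ranked AS (
    SELECT f.id,
           f.valid_from,
           ROW_NUMBER() OVER (PARTITION BY f.id ORDER BY f.valid_from) AS rn
    FROM flagged f
    WHERE f.state = f.current_state   -- drops every row when the current state is NULL
      AND (f.last_other_from IS NULL OR f.valid_from > f.last_other_from)
)
SELECT c.id,
       c.state,
       r.valid_from AS state_valid_since,
       c.other_state
FROM @current_table c
LEFT JOIN ranked r          -- LEFT JOIN restores ids with a NULL current state
       ON r.id = c.id AND r.rn = 1
ORDER BY c.id;
```

On the sample rows this gives 2023-06-18 for id 1, 2023-06-10 for id 2 and NULL for id 3.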
Answer #3 (wgxvkvu9)

I adapted the solution of @stu and @MatBailie with a GROUP BY approach:
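A sketch of what such a GROUP BY query can look like on the question's sample data. It is an assumption about the shape of the query, not necessarily the one behind the dbfiddle. It takes the latest `valid_to` among history rows whose state differs from the current state, and falls back to the earliest `valid_from` when the state never changed.

```sql
-- Sample data copied from the question (table variables only live for one batch).
DECLARE @current_table TABLE (id int, state varchar(10), other_state varchar(10));
INSERT INTO @current_table VALUES
    (1, 'green', 'aaa'), (2, 'red', 'aaa'), (3, NULL, 'aaa');

DECLARE @history_table TABLE (history_id int, id int, state varchar(10),
                              other_state varchar(10), valid_from date, valid_to date);
INSERT INTO @history_table VALUES
    (8, 1, 'green', 'aaa', '2023-06-18', '9999-12-31'),
    (3, 1, NULL,    'aaa', '2023-06-01', '2023-06-18'),
    (7, 2, 'red',   'aaa', '2023-06-16', '9999-12-31'),
    (6, 2, 'red',   'bbb', '2023-06-10', '2023-06-16'),
    (2, 2, NULL,    'aaa', '2023-06-01', '2023-06-10'),
    (5, 3, NULL,    'aaa', '2023-06-14', '9999-12-31'),
    (4, 3, 'blue',  'aaa', '2023-06-10', '2023-06-14'),
    (1, 3, NULL,    'aaa', '2023-06-01', '2023-06-10');

SELECT c.id,
       c.state,
       CASE WHEN c.state IS NOT NULL
            THEN COALESCE(
                     -- end of the last row that had a different (or NULL) state
                     MAX(CASE WHEN h.state = c.state THEN NULL ELSE h.valid_to END),
                     -- the state never changed: use the first recorded date
                     MIN(h.valid_from))
       END AS state_valid_since,
       c.other_state
FROM @current_table c
LEFT JOIN @history_table h ON h.id = c.id
GROUP BY c.id, c.state, c.other_state
ORDER BY c.id;
```

On the sample rows this gives 2023-06-18 for id 1, 2023-06-10 for id 2 and NULL for id 3. The history table is read once per id with no self-join or nested lookup.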
dbfiddle
This approach does not consider the case where an entry in the current table is inserted, deleted, and inserted again, so that the valid_to of one row is not the same as the valid_from of the next row in the history table.
However, on our table with 5k ids and history table with approx. 200k entries, this query is significantly faster.