I have longitudinal data that is stored in a separate table for each year. I want to check whether an event
'T' is already present or has newly occurred, and derive the event date.
Example data:
CREATE TABLE Table_2010 (
ID INT,
EventDate DATE,
Event CHAR(1)
);
CREATE TABLE Table_2011 (
ID INT,
EventDate DATE,
Event CHAR(1)
);
CREATE TABLE Table_2012 (
ID INT,
EventDate DATE,
Event CHAR(1)
);
CREATE TABLE Table_2013 (
ID INT,
EventDate DATE,
Event CHAR(1)
);
CREATE TABLE Table_2014 (
ID INT,
EventDate DATE,
Event CHAR(1)
);
INSERT INTO Table_2010 (ID, EventDate, Event) VALUES
(1, '2010-01-01', 'U'),
(1, '2010-02-01', 'U'),
(2, '2010-01-15', 'T'),
(2, '2010-02-15', 'V');
INSERT INTO Table_2011 (ID, EventDate, Event) VALUES
(1, '2011-01-01', 'T'),
(1, '2011-02-01', 'V'),
(2, '2011-01-15', 'X'),
(2, '2011-02-15', 'Z'),
(2, '2011-03-01', 'T'),
(3, '2011-02-20', 'T'),
(3, '2011-03-30', 'Z');
INSERT INTO Table_2012 (ID, EventDate, Event) VALUES
(1, '2012-01-01', 'U'),
(1, '2012-02-01', 'T'),
(2, '2012-01-15', 'T'),
(2, '2012-02-15', 'Z'),
(2, '2012-03-01', 'Z');
INSERT INTO Table_2013 (ID, EventDate, Event) VALUES
(1, '2013-01-01', 'T'),
(1, '2013-02-01', 'Z'),
(2, '2013-01-15', 'T'),
(2, '2013-02-15', 'Y');
INSERT INTO Table_2014 (ID, EventDate, Event) VALUES
(1, '2014-01-01', 'Z'),
(1, '2014-02-01', 'T'),
(2, '2014-01-15', 'T'),
(2, '2014-02-15', 'X'),
(2, '2014-03-01', 'Z');
So my go-to approach would be as follows:
SELECT ID,
       MIN(CASE WHEN Event = 'T' THEN EventDate END) AS T_StartDate
FROM (
    SELECT ID, EventDate, Event
    FROM Table_2010
    WHERE Event IN ('T')
    UNION ALL
    SELECT ID, EventDate, Event
    FROM Table_2011
    WHERE Event IN ('T')
    UNION ALL
    SELECT ID, EventDate, Event
    FROM Table_2012
    WHERE Event IN ('T')
    UNION ALL
    SELECT ID, EventDate, Event
    FROM Table_2013
    WHERE Event IN ('T')
    UNION ALL
    SELECT ID, EventDate, Event
    FROM Table_2014
    WHERE Event IN ('T')
) AS AllEvents
GROUP BY ID;
The problem is that I do not know what happened before the first entry of an ID.
Consider IDs 1 and 2, which are both present in Table_2010: ID 1 does not have the event in 2010 but does in 2011. Because the event did not happen before 2011, I would classify it as newly occurred. ID 2, on the other hand, already had the event in 2010 and needs to be considered as already present.
In my opinion, I need to verify whether there is an event-free period before the first occurrence.
How can I implement that?
Expected output (dummy coded 1: Yes; 0: No)
| ID | Eventdate | Newly occurred |
| ------------ | ------------ | ------------ |
| 1 | 2011-01-01 | 1 |
| 2 | 2010-01-15 | 0 |
| 3 | 2011-02-20 | 0 |
2 answers
c9qzyr3d1#
This might still need some work, as I still don't understand how later years affect the math, but based on the explanations so far, maybe something in this direction can work?
I added one mid step to better explain:
Fiddle with and without 3-X
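A minimal sketch of one way to express the "event-free period" check, not necessarily the query from the fiddle. It assumes that "newly occurred" means the ID has at least one earlier record of any kind before its first 'T', and that an ID whose very first record is already 'T' counts as already present. The CTE names `AllEvents` and `PerID` and the column names `FirstSeen` and `NewlyOccurred` are chosen here for illustration.

```sql
-- Assumption: an event is "newly occurred" when the ID was observed
-- (with any event) strictly before its first 'T'.
WITH AllEvents AS (
    -- Stack all yearly tables WITHOUT filtering on Event,
    -- so that the earliest record per ID is still visible.
    SELECT ID, EventDate, Event FROM Table_2010
    UNION ALL
    SELECT ID, EventDate, Event FROM Table_2011
    UNION ALL
    SELECT ID, EventDate, Event FROM Table_2012
    UNION ALL
    SELECT ID, EventDate, Event FROM Table_2013
    UNION ALL
    SELECT ID, EventDate, Event FROM Table_2014
),
PerID AS (
    SELECT ID,
           MIN(EventDate) AS FirstSeen,                                -- first record of any event
           MIN(CASE WHEN Event = 'T' THEN EventDate END) AS T_StartDate -- first 'T'
    FROM AllEvents
    GROUP BY ID
)
SELECT ID,
       T_StartDate AS Eventdate,
       CASE WHEN FirstSeen < T_StartDate THEN 1 ELSE 0 END AS NewlyOccurred
FROM PerID
WHERE T_StartDate IS NOT NULL   -- drop IDs that never had 'T'
ORDER BY ID;
```

With the example data from the question, ID 1 is first seen on 2010-01-01 and first has 'T' on 2011-01-01, so it is flagged 1; IDs 2 and 3 have 'T' on their very first record, so they are flagged 0. If the event-free period must have a minimum length (for example a full year), the comparison in the final CASE would need to be adapted accordingly.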
8gsdolmq2#
The WITH clause is needed in this case because I also wanted to have the minimum ID, to determine whether it is the event.
However, having multiple tables of identical design is not recommended unless there are really good reasons for it.
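If the yearly tables cannot be merged physically, a view is one way to get a single logical table. The following is only a sketch under that assumption; the view name `AllEventsView` is hypothetical, and the table names are the ones from the question.

```sql
-- Hypothetical view that presents the five yearly tables as one.
CREATE VIEW AllEventsView AS
SELECT ID, EventDate, Event FROM Table_2010
UNION ALL
SELECT ID, EventDate, Event FROM Table_2011
UNION ALL
SELECT ID, EventDate, Event FROM Table_2012
UNION ALL
SELECT ID, EventDate, Event FROM Table_2013
UNION ALL
SELECT ID, EventDate, Event FROM Table_2014;
```

Queries can then read from the view instead of repeating the UNION ALL, for example to get the first 'T' per ID:

```sql
SELECT ID, MIN(EventDate) AS T_StartDate
FROM AllEventsView
WHERE Event = 'T'
GROUP BY ID;
```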
For your new requirement, you also need to add AllEvents.