I am given the following table with the following problem:
Create a Slowly Changing Dimension Type 2 from the dataset. EMPLOYEE table has daily records for each employee. Type 2 - Will have effective data and expire date.
| Employee ID | Date | Name | Manager ID |
| ------------ | ------------ | ------------ | ------------ |
| 123 | 1-Mar | John Smith | 1 |
| 123 | 2-Mar | John Smith | 1 |
| 123 | 3-Mar | John Smith | 2 |
| 123 | 4-Mar | John Smith | 3 |
| 123 | 5-Mar | John Smith | 3 |
I believe my target table is supposed to look like this:
Employee ID | Name | Manager ID | Effective Date | Expiration Date |
---|---|---|---|---|
123 | John Smith | 1 | 1-Mar | 3-Mar |
123 | John Smith | 2 | 3-Mar | 4-Mar |
123 | John Smith | 3 | 4-Mar | Null |
I attempted the following query:
SELECT employee_id, name, manager_id,
CASE
WHEN LAG(manager_id) OVER() != manager_id THEN e.date
WHEN e.date = FIRST_VALUE(e.date) OVER() THEN e.date
ELSE NULL
END as "Effective Date",
CASE
WHEN LEAD(manager_id) OVER() != manager_id THEN LEAD(e.date) OVER()
ELSE NULL
END as "Expiration Date"
FROM employee e
My resulting table is as follows:
Employee ID | Name | Manager ID | Effective Date | Expiration Date |
---|---|---|---|---|
123 | John Smith | 1 | 1-Mar | Null |
123 | John Smith | 1 | Null | 3-Mar |
123 | John Smith | 2 | 3-Mar | 4-Mar |
123 | John Smith | 3 | 4-Mar | Null |
123 | John Smith | 3 | Null | Null |
Does anyone know of any way that I can alter my query to achieve my target table, based on what I've achieved thus far? I somehow need to only result in the 3 Manager ID's but distinct will not work. Also, I need to find a way to combine the effective date and expiration date for each manager ID. Any help at all would be greatly appreciated.
6条答案
按热度按时间nszi6y051#
The following does what you require, and shows how to add DDL+DML as well. Its probably a bit convoluted but I can't see an obvious way to simplify it.
This solution takes into account the possibility that the manager could repeat. And it doesn't assume that ever day will exist, so if a day is missing it will still work.
Returns:
vaqhlq812#
Use this, this will be simpler.
Explanation: The nested query will give the rows where there is change in managers, filter out rest of the rows as it is redundant information.
Once the data is filtered, find the next date when the manager got changed, mark that data as end date
rdlzhqv93#
rggaifut4#
5cnsuln75#
Set up schema
Use the following query to setup an employees table in sql server (or RDBMS of your choice with some syntax mods)
Walkthrough of Solution
Step 1: Ascertain the Start Dates
Transform the Employees table to show the start date for each manager an employee has worked under.
This returns:
Step 2: Adding end_date
Use this table to calculate a column that displays the end date for the given employee - manager pair. Using the concept that the start_date of an employee working with one manager, is the end date of the employees tenure with their previous manager.
Use the LEAD function and create partitions over Employee, making sure to order by start_date to create a column that represents the end_date of an employee - manager pair.
See below how we use the first query to generate a CTE called t1, then build the end_date column in a query carried out against t1.
Final rowset matches desired ouput:
Further Test
To be safe, test the case where there is more than one employee in input dataset.
The output shows our solution passed the test:
6rvt4ljy6#