SQL Server SQL to Build Type 2 Dimension

ndh0cuux  于 2023-04-28  发布在  其他
关注(0)|答案(2)|浏览(129)

I need to build a type 2 dimension table to store price changes for various products. In the source data, there are two tables I can pull from. One has the current price of each product, and one has the price change history for each product. Prices for some products change more than others, and if a product's price has never changed, it won't have a record in the price change table at all.

Given the following current price table:

PRODUCT_ID  CURRENT_PRICE
----------  -------------
  ABC123        250
  DEF456        200
  GHI789        325

And product price history table:

PRODUCT_ID  OLD_PRICE   NEW_PRICE   CHANGE_DATE
----------  ---------   ---------   -----------
  ABC123        275       250        1/1/2016
  DEF456        250       225        6/1/2015
  DEF456        225       200        1/1/2016

What SQL could I run to populate a type 2 dimension as follows:

PRODUCT_ID  PRODUCT_PRICE   VALID_FROM  VALID_TO    CURRENT_PRICE_INDICATOR
----------  -------------   ----------  --------    ----------------------
  ABC123        275          1/1/1900   12/31/2015      N       
  ABC123        250          1/1/2016   12/31/9999      Y
  DEF456        250          1/1/1900   5/31/2015       N
  DEF456        225          6/1/2015   12/31/2015      N
  DEF456        200          1/1/2016   12/31/9999      Y
  GHI789        325          1/1/1900   12/31/9999      Y
lx0bsm1f

lx0bsm1f1#

I think it is something like this:

DECLARE @price TABLE(PRODUCT_ID VARCHAR(100),CURRENT_PRICE DECIMAL(8,4));
INSERT INTO @price VALUES
 ('ABC123',250)
,('DEF456',200)
,('GHI789',325);

DECLARE @priceHist TABLE(PRODUCT_ID VARCHAR(100),OLD_PRICE DECIMAL(8,4),NEW_PRICE DECIMAL(8,4),CHANGE_DATE DATE);
INSERT INTO @priceHist VALUES
 ('ABC123',275,250,{d'2016-01-01'})
,('DEF456',250,225,{d'2015-06-01'})
,('DEF456',225,200,{d'2016-01-01'});

WITH AllData AS
(
    SELECT ROW_NUMBER() OVER(PARTITION BY Combined.PRODUCT_ID ORDER BY ISNULL(Combined.CHANGE_DATE,{d'9999-12-31'}) ASC) AS Inx
          ,*
          ,CASE WHEN CHANGE_DATE IS NULL THEN 'Y' ELSE 'N' END AS CURRENT_PRICE_INDICATOR
    FROM
    (
        SELECT p.PRODUCT_ID AS PRODUCT_ID
              ,p.CURRENT_PRICE AS PRODUCT_PRICE
              ,NULL AS CHANGE_DATE
        FROM @price AS p
        UNION ALL
        SELECT ph.PRODUCT_ID
              ,ph.OLD_PRICE
              ,ph.CHANGE_DATE
        FROM @priceHist AS ph
    ) AS Combined
)
SELECT ad.PRODUCT_ID
      ,ad.PRODUCT_PRICE
      --Version with LAG (SQL Server 2012 and higher)
      --,CASE WHEN ad.Inx=1 THEN {d'1900-01-01'} ELSE LAG(ad.CHANGE_DATE,1) OVER(PARTITION BY ad.PRODUCT_ID ORDER BY ISNULL(ad.CHANGE_DATE,{d'9999-12-31'}) ASC) END AS VALID_FROM
      ,CASE WHEN ad.Inx=1 THEN {d'1900-01-01'} ELSE LAG_Replace_For_SQLServer2008.CHANGE_DATE END AS VALID_FROM
      ,CASE WHEN ad.CURRENT_PRICE_INDICATOR='Y' THEN {d'9999-12-31'} ELSE DATEADD(DAY,-1,ad.CHANGE_DATE) END AS VALID_TO
      ,ad.CURRENT_PRICE_INDICATOR
FROM AllData AS ad
OUTER APPLY
(
    SELECT x.CHANGE_DATE 
    FROM AllData AS x
    WHERE x.PRODUCT_ID=ad.PRODUCT_ID
      AND x.Inx=ad.Inx-1
) LAG_Replace_For_SQLServer2008

The result:

ABC123  275.0000    1900-01-01  2015-12-31  N
ABC123  250.0000    2016-01-01  9999-12-31  Y
DEF456  250.0000    1900-01-01  2015-05-31  N
DEF456  225.0000    2015-06-01  2015-12-31  N
DEF456  200.0000    2016-01-01  9999-12-31  Y
GHI789  325.0000    1900-01-01  9999-12-31  Y
6vl6ewon

6vl6ewon2#

Your end-state is a typical type 2 Slowly Changing Dimension. The Price history table is a little bit of a red herring here in that the NEW_PRICE should be ignored. Simply write that data in an initial load into your dimension table with something like:

CREATE TABLE Dim_Price (
  Price_Key INT IDENTITY,
  Product_ID NVARCHAR(10) NOT NULL,
  Price INT NOT NULL,
  Row_Effective_Date DATETIME NOT NULL,
  Row_Expiry_Date DATETIME NOT NULL,
  Row_Current_Flag INT NOT NULL)

 INSERT INTO Dim_Price VALUES
 ('ABC123',275,'1 Jan 1900','31 Dec 9999',1),
 ('DEF456',250,'1 Jan 1900','31 Dec 9999',1)

From then on you can use something akin to the merge statement below (Can't validate syntax at this moment) to merge from the source table into the destination table.

More information on slowly changing dimensions and how to handle them can be found on the Kimball Group website . After all, Ralph Kimball did invent them :)

INSERT INTO Dim_Price
SELECT
  Product_ID
 ,Price
 ,Row_Effective_Date
 ,Row_Expiry_Date
 ,Row_Current_Flag
FROM (
MERGE Dim_Price TGT
USING STG_Price SRC ON SRC.Product_ID = TGT.Product_ID
WHEN NOT MATCHED THEN 
INSERT VALUES(
  SRC.Product_ID
 ,SRC.Price
 ,'1 Jan 1900'
 ,'31 Dec 9999'
 ,1)
 WHEN MATCHED AND TGT.Row_Current_Flag = 1 AND EXITS(
   SELECT SRC.Price
   EXCEPT
   SELECT TGT.Price)
 THEN UPDATE SET TGT.Row_Current_Flag = 0
                ,TGT.Row_Expiry_Date = DATEADD(SECOND,86399,DATEADD(DAY,-1,SECOND,CAST(GETDATE() AS DATE)))
 OUTPUT $action AS Action
        ,SRC.Product_ID
        ,SRC.Price
        ,GETDATE()
        ,'31 Dec 9999'
        ,1
) AS MERGE_OUT
WHERE MERGE_OUT.Action_Out = 'UPDATE';

相关问题