SSIS comes with an out-of-box SCD Wizard to handle Type 1 and Type 2 Slowly Changing Dimensions (SCD) which is a fundamental ETL requirement. However the SCD wizard component has some serious drawbacks – both from operational and functional perspectives that make it unusable for practical purposes. A good summary on the shortcomings of SCD Wizard component can be found here
Several workarounds have evolved over time and in this post I would like explore the different alternative options to handle Type 1 and Type 2 SCD without using the out-of-box SCD Wizard Component.
One of the best open source components available out there is SSIS Dimension Merge SCD Component ( formerly known as Kimball Method SSIS Slowly Changing Dimension Component) . This component can be downloaded from CodePlex.
T-SQL MERGE statement first introduced in SQL Server 2008 does a pretty good job in handling Type 1 and Type 2 SCD. Oracle also has a MERGE equivalent. A good post on usage of MERGE statement for SCD can be found in Kimball Group.
If your organization policy prevents you from using open source products and if you are using earlier versions of SQL Server (SQL Server 2005 and below) then Options 1 &2 is not for you. In that case you can use available SSIS components to handle your SCD logic as illustrated by the following examples.
Type 1 SCD can be addressed by using a simple combination of Lookup and Conditional Split components in the Dataflow. Screen Capture 1 shows a simple implementation of Type 1 SCD.
Here you can see
1. The source is a query from Product table of AdventureWorks. The query has a checksum column on the changing attributes which will be used downstream to determine whether the Dim record needs to be updated with this transaction record.
SELECT ProductNumber,ModifiedDate, Name,StandardCost, Color,ListPrice,Size, Weight, HASHBYTES('SHA1',Name+ CAST(StandardCost AS VARCHAR(20))+ ISNULL(Color,'') +CAST(ListPrice AS VARCHAR(20))+ISNULL(Size,'')+ CAST(ISNULL(Weight,0) AS VARCHAR(20))) As SRC_ChangingAttributeChecksum FROM SalesLT.Product AS src
2. The Lookup component queries the DW DimProduct table using a SQL query. Similar to the source, a checksum column is created on the changing attributes of the Dim table that corresponds to the source columns.
SELECT ProductAlternateKey, HASHBYTES('SHA1',EnglishProductName+ CAST(ISNULL(StandardCost,0) AS VARCHAR(20)) + ISNULL(Color,'') +CAST(ISNULL(ListPrice,0) AS VARCHAR(20))+ISNULL(Size,'')+ CAST(ISNULL(Weight,0) AS VARCHAR(20))) As LKUP_ChangingAttributeChecksum FROM dbo.DimProduct
3. The source records that do not match the Dim records on the unique key are new records and inserted to the Dim table.
4. For the source records that do match the Dim records, only the source records that are modified must be identified. This is done by the Conditional Split column that checks the checksum of the source record with checksum of the lookup record. If the checksums match, then it means the source and Dim record is the same and hence the source record can be discarded.
5. If the checksums differ, then the dim record gets updated by the OLEDB SQL component with the current value from source.
An approach similar to Type 1 SCD can be used to address Type 2 SCD using a combination of Lookup and Conditional Split components by making simple adjustments to handle historical attributes (in addition to changing attributes) and ageing. Screen Capture 2 shows a simple implementation of Type 2 SCD.
Here you can see
1. Both the source and lookup query contain checksums on the both the Changing AND Historical attributes.
2. The source records that do not match the Dim records on the unique key are inserted to the Dim table as before.
3. The Conditional Split component determines whether there are changes to checksum of Historical and Changing attributes of existing Dim records from the source.
4. The source records that differ on the Changing Attributes are updated by the OLEDB SQL Component.
5. The source records that differ on the Historical attributes are first aged and End Dated by the OLEDB SQL Component and then the current record is inserted as a new Dim record.
Of course this option is a middle ground between Options 1 or 2 and SCD Wizard Component, but is definitely more robust than the SCD Wizard Component and does the trick especially with small projects when you are dealing with few specialized packages.
@BennyAustin In this context: thank you for your excellent post on SSIS SCD alternatives.—
Niels Nuyttens (@nnuytten) July 25, 2013