Benny Austin

Alternatives to SSIS SCD Wizard Component

SSIS comes with an out-of-box SCD Wizard to handle Type 1 and Type 2 Slowly Changing Dimensions (SCD) which is a fundamental ETL requirement. However the SCD wizard component has some serious drawbacks – both from operational and functional perspectives that make it unusable for practical purposes. A good summary on the shortcomings of SCD Wizard component can be found here

Several workarounds have evolved over time and in this post I would like explore the different alternative options to handle Type 1 and Type 2 SCD without using the out-of-box SCD Wizard Component.  

Option 1: Open Source Components

One of the best open source components available out there is SSIS Dimension Merge SCD Component ( formerly known as Kimball Method SSIS Slowly Changing Dimension Component) . This component can be downloaded from CodePlex.

Option 2: SQL Server MERGE Statement

T-SQL MERGE statement first introduced in SQL Server 2008 does a pretty good job in handling Type 1 and Type 2 SCD. Oracle also has a MERGE equivalent. A good post on usage of MERGE statement for SCD can be found in Kimball Group.

Option 3: Use Available SSIS Components

If your organization policy prevents you from using open source products and if you are using earlier versions of SQL Server (SQL Server 2005 and below) then Options 1 &2 is not for you. In that case you can use available SSIS components to handle your SCD logic as illustrated by the following examples.

Type 1 SCD

Type 1 SCD can be addressed by using a simple combination of Lookup and Conditional Split components in the Dataflow. Screen Capture 1 shows a simple implementation of Type 1 SCD.

Here you can see

1. The source is a query from Product table of AdventureWorks. The query has a checksum column on the changing attributes which will be used downstream to determine whether the Dim record needs to be updated with this transaction record.

SELECT ProductNumber,ModifiedDate, Name,StandardCost, Color,ListPrice,Size, Weight,
HASHBYTES('SHA1',Name+ CAST(StandardCost AS VARCHAR(20))+ ISNULL(Color,'') +CAST(ListPrice AS VARCHAR(20))+ISNULL(Size,'')+ CAST(ISNULL(Weight,0) AS VARCHAR(20))) As SRC_ChangingAttributeChecksum
FROM SalesLT.Product AS src

2. The Lookup component queries the DW DimProduct table using a SQL query. Similar to the source, a checksum column is created on the changing attributes of the Dim table that corresponds to the source columns.

SELECT ProductAlternateKey,
HASHBYTES('SHA1',EnglishProductName+ CAST(ISNULL(StandardCost,0) AS VARCHAR(20)) + ISNULL(Color,'') +CAST(ISNULL(ListPrice,0) AS VARCHAR(20))+ISNULL(Size,'')+ CAST(ISNULL(Weight,0) AS VARCHAR(20))) As LKUP_ChangingAttributeChecksum
FROM dbo.DimProduct
Screen Capture 1 - Type 1 SCD

Screen Capture 1 – Type 1 SCD

3. The source records that do not match the Dim records on the unique key are new records and inserted to the Dim table.

4. For the source records that do match the Dim records, only the source records that are modified must be identified. This is done by the Conditional Split column that checks the checksum of the source record with checksum of the lookup record. If the checksums match, then it means the source and Dim record is the same and hence the source record can be discarded.

5. If the checksums differ, then the dim record gets updated by the OLEDB SQL component with the current value from source.

Type 2 SCD

An approach similar to Type 1 SCD can be used to address Type 2 SCD using a combination of Lookup and Conditional Split components by making simple adjustments to handle historical attributes (in addition to changing attributes) and ageing. Screen Capture 2 shows a simple implementation of Type 2 SCD.

Screen Capture 2 - Type 2 SCD

Screen Capture 2 – Type 2 SCD

Here you can see

1. Both the source and lookup query contain checksums on the both the Changing AND Historical attributes.

2. The source records that do not match the Dim records on the unique key are inserted to the Dim table as before.

3. The Conditional Split component determines whether there are changes to checksum of Historical and Changing attributes of existing Dim records from the source.

Screen Capture 3 – Historical and Changing Attributes Checksum Comparison

4. The source records that differ on the Changing Attributes are updated by the OLEDB SQL Component.

5. The source records that differ on the Historical attributes are first aged and End Dated by the OLEDB SQL Component and then the current record is inserted as a new Dim record.

Of course this option is a middle ground between Options 1 or 2 and SCD Wizard Component, but is definitely more robust than the SCD Wizard Component and does the trick especially with small projects when you are dealing with few specialized packages.

Download SSIS Package – Alternative to SCD Wizard Component

About these ads

26 comments on “Alternatives to SSIS SCD Wizard Component

  1. Pingback: Using Merge T-SQL instead of the SCD (Slowly Changing Dimension) to improve performance | BI Monkey

  2. Mechelle Herr
    June 11, 2010

    [...] How to create type 1 & 2 SCD’s using standard SSIS components (other than the SCD) (at the bottom of the post) – Benny Austin [...]
    +1

    • Massimo
      July 5, 2012

      It does not work. Look at this sample:

      DECLARE @STR AS VARCHAR(1024)
      DECLARE @TEST1 AS VARCHAR(50)
      DECLARE @TEST2 AS VARCHAR(50)
      DECLARE @TEST3 AS VARCHAR(50)
      SET @TEST1 = ”
      SET @TEST2 = ‘MASSIMO’
      SET @TEST3 = ”

      SET @STR = CONVERT(VARCHAR(1024), ISNULL(@TEST1, ”)) +
      CONVERT(VARCHAR(1024), ISNULL(@TEST2, ”)) +
      CONVERT(VARCHAR(1024), ISNULL(@TEST3, ”))

      SELECT HASHBYTES(‘SHA1′, @STR)

      SET @TEST1 = ‘MASSIMO’
      SET @TEST2 = ”
      SET @TEST3 = ”

      SET @STR = CONVERT(VARCHAR(1024), ISNULL(@TEST1, ”)) +
      CONVERT(VARCHAR(1024), ISNULL(@TEST2, ”)) +
      CONVERT(VARCHAR(1024), ISNULL(@TEST3, ”))

      SELECT HASHBYTES(‘SHA1′, @STR)

      Checksum value is the same but @TEST1 and @TEST2 have different value.

      • Benny Austin
        July 5, 2012

        Massimo, it does work if the NULL values are handled appropriately. In your example, the string input to checksum function is evaluating to the same value and that’s why the checksums are same. If the NULL values are handled as shown below you will get a different checksum

        DECLARE @STR AS VARCHAR(1024)
        DECLARE @TEST1 AS VARCHAR(50)
        DECLARE @TEST2 AS VARCHAR(50)
        DECLARE @TEST3 AS VARCHAR(50)
        SET @TEST1 = NULL
        SET @TEST2 = ‘MASSIMO’
        SET @TEST3 = NULL

        SET @STR = CONVERT(VARCHAR(1024), ISNULL(@TEST1, ‘N.A’)) +
        CONVERT(VARCHAR(1024), ISNULL(@TEST2, ‘N.A’)) +
        CONVERT(VARCHAR(1024), ISNULL(@TEST3, ‘N.A’))

        SELECT HASHBYTES(‘SHA1′, @STR) –0x87244FEA54DBEFBAFA312E212D16669DD52C429B

        SET @TEST1 = ‘MASSIMO’
        SET @TEST2 = NULL
        SET @TEST3 = NULL

        SET @STR = CONVERT(VARCHAR(1024), ISNULL(@TEST1, ‘N.A’)) +
        CONVERT(VARCHAR(1024), ISNULL(@TEST2, ‘N.A’)) +
        CONVERT(VARCHAR(1024), ISNULL(@TEST3, ‘N.A’))

        SELECT HASHBYTES(‘SHA1′, @STR) –0xD9E3F10DCACAB31186FC80D56AFEAFE86F5B832F

  3. Paul
    June 12, 2010

    Great post, Benny. One question… The order of the 2 expressions in the conditional split looks like it would cause a record with different (lookup vs source) changing and historical attributes to be routed to the changing attribute path. Should this be reversed?
    Thanks,
    Paul

    • Benny Austin
      June 14, 2010

      Paul, you are absolutely right. Historical Attributes take precedance over Changing Attributes. Thanks for bringing that to my attention. The post now has the correct screen capture. Cheers

  4. Tinku
    October 7, 2010

    Benny ,

    Its a wonderful post.
    But how do i get these values
    HistoricalAttributeCheckSum, LKUP.HistoricalAttributeCheckSum, ChangingAttributeCheckSum,
    LKUP.ChangingAttributeCheckSum.

    I am new to ssis and DataWarehouse.
    Please let me know thanks.

    Tinku

  5. Anup
    November 5, 2010

    Can you provide a link to download the package ?

  6. richard_bailey
    December 29, 2010

    pardon me I am a starter, my question is why are we even creating a checksum, can we simply add the columns?

    Regards,

    richard.

    • Benny Austin
      December 30, 2010

      Richard, you could do that if there are few columns. Checksum is always of fixed length, so you wouldn’t have truncation issues when dealing with large number of columns. And if you materialize checksum as columns they could also serve as hash index (skinny index performs better)

  7. ebonato
    July 21, 2011

    Nice post!!. I’d recommend Option 1, as Kimball SCD Codeplex component is fast enough as it manages multiple threads automatically.

  8. Jack
    November 5, 2011

    Can we used this alternative without using hashbytes function?

    • Benny Austin
      November 5, 2011

      If your dim has few attributes you could compare the columns directly without using hashbytes. Since hashbytes always evaluate to a fixed length, they are extremely useful when your dimension has a number of attributes that must be checked for changes. And if you decide to materialize checksum as columns hashbytes could also serve as hash index and since they are skinny index they performs better.

  9. Tinku
    May 3, 2012

    Hi Austin, You saved my day I was using the same approach as “Use Available SSIS Components” but got struck. I was using muticast and to update the historical records and Inserting at the same time.
    Thanks a lot
    Tinku

  10. Miljan Radovic
    December 19, 2012

    Hi everyone!

    You can check this free option: http://scdmergewizard.codeplex.com/.

    It uses TSQL MERGE statement and application helps you manage that MERGE statement very easily.

    Cheers
    Miljan

  11. BorkaS
    January 10, 2013

    Hi Benny
    First, thanks for your posts and for this one, in particular!
    I have some comments regarding Massimo’s post:
    Actually, I believe his thought was somewhat correct. Replacing NULL by a value is quite dangerous in that case as our lookup opponent can have the replacing value but not NULL in the same position. As a result, we are going to get the same hash whereas different members have been provided. If the column which values we are going to compare, has no check constraints on it then it could theoretically lead to a mistake. In case it has any check constraints, we could simply use the value that is out of the domain to replace NULL?
    What do you think of it?

    Thanks,
    BorkaS

    • Benny Austin
      January 11, 2013

      BorkaS, you are right. You should choose a default value for NULL value dimension attributes that is unlikely to occur in a valid transaction. Some systems may supply blank strings, spaces or special characters instead of NULL and they should be treated as NULLS in DW. Here is a recommendation from Kimball group – http://www.kimballgroup.com/2003/02/06/design-tip-43-dealing-with-nulls-in-the-dimensional-model/

      • BorkaS
        January 11, 2013

        Thanks Benny for your valuable responses!
        I really appreciate your effort to help us!

        Good luck,
        BorkaS

  12. danieladeniji
    December 17, 2013

    Wow! Nice write-up.
    Thanks,
    Daniel

  13. Zed
    August 28, 2014

    Nice article! However, could you please refresh the download link on skydrive (the link above is not working)..

    • Benny Austin
      August 28, 2014

      Zed, there is link at the end of the post (not the one in comments section) that should work. Did you try that ? If it doesn’t work let me know and I’ll email the package

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

Copyright

© Benny Austin.
Follow

Get every new post delivered to your Inbox.

Join 197 other followers

%d bloggers like this: