Slowly Changing Dimension (SCD) is great for tracking historical changes to dimension attributes. SCDs have evolved over the years and besides the conventional type 1 (update), type 2 (add row) and type 3 (add column), now there are extensions up to type 7 including type 0. Almost every DW/BI project has at least few type 2 dimensions where a change to an attribute causes the current dimension record to be end dated and creates a new record with the new value.
While type 2 dimensions are great to track attribute history, you should also consider the following in your design decisions:
- ETL for type 2 dimensions is relatively complex. At the time of writing, out of box SSIS SCD component has performance issues and you have to rely on alternative workarounds.
- Date range SSIS lookup for type 2 SCD uses partial cache instead of full cache
- Type 2 SCD requires more disk space.
In some instances type 2 SCD may not be appropriate. In this post, I would like to mention few common anti-patterns I have encountered around type 2 SCD.
Anti-pattern 1: Modeling almost every dimension as type 2 SCD
This is a usually a consequence of inadequate business analysis and/or over engineering. Not every dimension is type 2 SCD. Do not over engineer to cater to a future requirement that may never occur. If there is no business need to capture history, neither should you. Do explain to business what it means to capture attribute history and ask the question before you design.
Anti-pattern 2: No Business Effective Date, only DW Effective Date
If tracking changes is important to business, most source systems would already have mechanism to capture effective date of change either directly or indirectly. Dimensions that do not have business effective date and have only DW effective date instead is often an indication that the dimension is not a type 2 candidate. It is important to know when the business change happened is as opposed to when the DW knew about the change. The DW effective date is an artificially manufactured date that will be lost if the dimension is re-initialized.
Anti-pattern 3: Tracking every attribute’s history
Even if your dimension is a legit type 2 SCD, not every attribute needs to be tracked for historical changes.
- Data changes to some attributes do not represent a change in state. For e.g. an update to date of birth is mostly likely in response to incorrect data entry and hence an in-place update is most appropriate rather than end date the old record and create a new record with correct DOB.
- Some of the attributes are there for information purposes only. If an attribute is used as drill-through field rather than as slicer, only current value is relevant. For e.g. phone number or email id does not need history tracking.
- Some attributes exist in source for system use only. These attributes need not be promoted into the dimension in first place.
Anti-pattern 4: Track history, but only use current dimension record
After all the hard work, some implementations only ever use the current dimension record to link up to fact tables which contradicts the very reason to track attribute history as type 2 SCD.
Solid advise on Type 2 SCD Anti-Patterns twitter.com/bennyaustin/st…—
Jens Vestergaard (@vestergaardj) October 11, 2015