While SSAS allows you to build a cube straight of the fact and dimension tables, however there are benefits in creating the cube from database views that sits on top of the underlying fact and dimension tables. Here’s why. Continue reading
It’s quite common to analyse a measure by categorizing it into different buckets or bands. For example to list the customers who are due for payment in the next week or fortnight, to decide which income group to target for a particular product campaign, which age group of students are most likely to buy a smartphone and so on. These bands are usually not supplied by source systems but derived from the underlying measure and they exist only for analytic purposes. While this banding could be achieved in reports or dashboards, there is significant value to model them as dimensions especially to promote self-service BI.
In this example, fact_admission represents the fact data collected from hospital admissions. AGE_YRS is the client’s age at the time of admission. DIM_AGE is banding dimension for age group with the granularity being AGE in yrs.
Screen Capture 1- Age Group Bands
- DIM_AGE is type 1 SCD that has all ages say from 0 to 130 yrs.
- AGE_SK is the surrogate key. It’s not an identity rather it takes same value of AGE except for Unknown SKs, in which case AGE is NULL. This means the ETL does not have to look-up this dimension table.
- DIM_AGE has an out of bound surrogate key with value 999 to cater for age greater than 130 yrs.
The three age group bands are defined as follows
|Infant||AGE Below 1 yr|
|Toddler||AGE between 1-3 yrs|
|Pre-schooler||AGE between 3-5 yrs|
|Young Kids||AGE between 5-7 yrs|
|Big Kids||AGE between 7-12 yrs|
|Teens||AGE between 12-18 yrs|
|Young Adult||AGE between 18-39 yrs|
|Middle Age||AGE between 40-59 yrs|
|Seniors||AGE 60 and over|
|Unlicensed Drivers||AGE Below 17 yrs|
|Young Drivers||AGE between 17-25 yrs|
|Silver License||AGE between 25-30 yrs|
|Gold License||AGE between 30-75 yrs|
|Older Drivers||AGE over 75 yrs|
|Pre-schooler||AGE below 5 yrs|
|School||AGE between 5-18 yrs|
|Uni||AGE between 18-25 yrs|
|Salaried||AGE between 25-60 yrs|
|Pensioner||AGE over 60yrs|
- When building the dimension, it is recommended to build dimension using a view on DIM_AGE. This allows to include columns that could act as key columns to the age bands and more importantly to sort the different captions in the required order.
- The materialized AGE_SK column on fact table is optional. The fact table can be linked to DIM_AGE using a view on fact table because of the fact the surrogate key is directly derived from age measure.
- A new age group can be easily created just by adding a new column to the dimension table without the need to re-key fact tables.
Incorporating banding as dimension simplifies reporting, self-service BI platform and also provides a design that is highly scalable. Banding can be extended to any measure like income, duration, count , miles and percentile and so on.
In a self-service BI platform, every member of a dimension attribute is available for browsing irrespective of whether the member has an associated fact or not. Though the unused members disappear when browsed alongside facts, they are still undesirable when used as filters. This is especially true with role playing dimensions which cover a broad range of data to cover every scenario. No one wants a date filter starting at 1900-01-01 and going all the way to 9999-12-31.
There are different options to exclude the unused dimension attribute members
Option 1: Use a View on Dimension Table
It’s always a good practise to build dimension using view on dimension table because of the agility it offers. A simple view like this would filter out any member that’s never been used in any fact table.
CREATE VIEW vdimDate AS SELECT dateAttributes FROM dimDate WHERE EXISTS (SELECT 1 FROM factA WHERE dateSK=dimDate.dateSK) OR EXISTS (SELECT 1 FROM factB WHERE dateSK=dimDate.dateSK) OR EXISTS (SELECT 1 FROM factC WHERE dateSK=dimDate.dateSK) GO
Option 2: Use a flag on Dimension Table
Option 1 works like a charm on small dimension tables. With large data volumes, the SSAS processing time on these dimensions tend to increase because the query gets executed for each attribute. This can be overcome by modelling the dimension to include a materialized flag to indicate whether the dimension member is ever referenced by any fact. This flag is set to true by the ETL before the SSAS cube is processed using a simple update statement
UPDATE dimDate SET DATE_IN_USE_FLAG=1 WHERE EXISTS (SELECT 1 FROM factA WHERE factA.dateSK = dateSK) OR EXISTS (SELECT 1 FROM factB WHERE factB.dateSK = dateSK) OR EXISTS (SELECT 1 FROM factC WHERE factC.dateSK = dateSK)
The update statement would execute much faster when compared to SSAS processing time especially if a view is created on the dimension table to include only the dimension members in use.
CREATE VIEW vdimDate AS SELECT dateAttributes FROM dimDate WHERE DATE_IN_USE_FLAG=1 GO
Option 3: Customize frequently used Role Playing Dimensions
Role playing dimensions need special attention. Sometimes in addition to option 1 or 2 you will also have to convert a frequently used role playing dimension to non-role playing dimension. This will exclude any unused member from a specific fact table (instead of any fact table). The key point to remember while taking this hybrid approach is regardless of whether the dimension is role playing or not, as long as it is derived from same source it must be a Conformed Dimension. This means the name and number of the attributes, key column, name column and value column, attribute relationships, sort order and hierarchies must be the same between the role playing and non-playing dimensions derived from same data source. The only difference is the number of members in a non-role playing dimension is fewer than the role playing dimension
Option 4: Work some MDX Magic in Presentation Layer
Self-service BI platforms are inevitably augmented by few out-of-box SSRS reports and dashboards. By using MDX expressions and data sets you could customizes the filter to display only the data relevant in that context. Here are some MDX examples for SSRS and PerformancePoint
By combining the above options you could ensure that your self-service BI platform is agile, practical and user friendly.
Dimension Tables usually have a member with Surrogate Key (SK) value of -1 to handle unknown members. Unknown SK is usually set when SSIS look-up on a dimension table fails to return a match based on a business/natural key. Continue Reading
Date Dimension is the most commonly used dimension in any BI implementation. Often used as role playing dimension, there is practically no BI Implementation without date dimension. Adding current indicator attributes with Y/N values to date dimension like current year, current quarter, current month, current week and today enhances dashboarding capabilities and provide better navigation for business users. Continue Reading