Pivoting is typically used in DW solutions to load multi-valued dimensions. In my earlier post, I demonstrated how SSIS Pivot component works and the advantages of using SSIS Pivot component over its equivalent T-SQL Pivot statement. In this post, I explain a technique to Pivot data when Pivot Column is not explicitly available.
Consider this scenario. Let’s say, your credit card issuer faces a serious threat of customers churning over to their competitors. So you are requested by your credit card issuer to participate in a survey where you are presented with a list of preferences to choose from, that if provided will prevent you from churning over to their competitors. Let’s say you are given the option to choose a max of 10 most important preferences to you from the list shown in Screen Capture 1
Let’s say you can also rate each of your preference and give a weightage with the most favourable being 100 and least favourable being 0. The credit card issuer will end up collecting data from the survey as shown below.
Now let’s say the credit card issuer wants to build a [Customer Preference Dim] to analyse the survey data. The requirement is to build the [Customer Preference Dim] in such a way that it captures the 3 most important preference of each customer based on the weightage as shown in Screen Capture 3.“]
Screen Capture 3 implies we are dealing with multi-valued dimension that requires pivoting the source data. However the Pivot Column in this scenario is not well defined. [PrefCode] cannot be used as Pivot Column because it can move between First, Second and Third Preference columns depending on the Weightage. The solution to pivot data when an explicit Pivot Column is not available lies in preparing the source data to “fabricate” a Pivot Column. In this instance I’am using ROW_NUMBER () OVER ( [ ] ) statement. The following sql statement first partitions/groups data by [CustomerID] and then sorts the grouped data by higher Weightage. It returns the data in correct sequential row number always starting from 1 (Screen Capture 4). This sequence number can be used as Pivot Column.
SELECT CustPref.CustomerID, Pref.Preference, CustPref.Weightage, ROW_NUMBER() OVER(PARTITION BY CustPref.CustomerID ORDER BY CustPref.Weightage DESC, CustPref.PrefCode ASC) AS PivotKey FROM dbo.[Customer Credit Card Preference] AS CustPref INNER JOIN dbo.[Credit Card Preferences] AS Pref ON CustPref.PrefCode = Pref.PrefCode
With the source data now ready with Pivot Column, it is now a matter of passing it through the Pivot Transform in the SSIS Package as shown below.
Since the Pivoted columns are fewer than the possible Pivot Column values, set the ErrorRowDisposition property on the Pivot Column to ignore failure (RD_IgnoreFailure). The output of from the package will be as shown below