Pivot transformation converts multiple rows of data into a single record by using an input column that has a finite set of distinct values. Pivoting is typically used in DW solutions to handle multi-valued dimensions. In SQL Server, Pivoting is done using either
- T-SQL Pivot statement or
- SSIS Pivot Transformation
There are two significant benefits that stand-out in favour of using SSIS Pivot Transformation for ETL scenarios. The remainder of this post explains just that.
Aggregate functions – not a requirement for SSIS Pivot Component
Consider this example from AdventureWorks. The [VendorContact] table lists the contact types and phone numbers of each vendor. There are multiple contacts for a vendor as shown below (Screen Capture 1):
Say we pivot these records and build a multi-valued dimension called [DimVendorContact] as shown in Screen Capture 2.
First let’s build the dimension using T-SQL Pivot statement. The T-SQL Pivot statement will be something like this.
SELECT VendorID,  as [Accounting Manager], as [Assistant Sales Agent],  as [Assistant Sales Representative],  as [Coordinator Foreign Markets],  as [Export Administrator],  as [International Marketing Manager], as [Marketing Assistant],  as [Marketing Manager],  as [Marketing Representative],  as [Order Administrator],  as [Owner],  as [Owner/Marketing Assistant],  as [Product Manager],  as [Purchasing Agent],  as [Purchasing Manager],  as [Regional Account Representative],  as [Sales Agent],  as [Sales Associate],  as [Sales Manager],  as [Sales Representative] FROM ( SELECT VContact.VendorID,ContactType.ContactTypeID, Contact.phone FROM Purchasing.VendorContact as VContact inner join Person.Contact as Contact on VContact.ContactID =Contact.ContactID inner join Person.ContactType as ContactType on VContact.ContactTypeID = ContactType.ContactTypeID ) AS SourceQuery PIVOT (MIN(SourceQuery.phone)--AGGREGATE FUNCTION!!! FOR SourceQuery.ContactTypeID IN (,,,,,,,,,, ,,,,,,,,,)) AS AliasTable ORDER by VendorID
Notice that an aggregate function is required on the Phone Number column for this T-SQL statement to work (although it makes no sense to aggregate phone number). That’s a serious implication. T-SQL aggregate functions are the least efficient when it comes to performance especially with high data volumes and is a potential performance bottle-neck. Compare this with a package that uses SSIS Pivot Transform which achieves the same result. Notice that in Screen Capture 3, by using appropriate Pivot Usage values for the input columns and by defining the output columns we can achieve the same result without using Aggregate functions.
Clear distinction of Set Key and Pass through Columns in SSIS Pivot Component
Set Key Columns is the combination of one or input columns that determine the uniqueness of the Pivoted output. In this example VendorID column is the one and only Set Key column. This means there is one VendorContact pivot record for every Vendor. Both T-SQL and SSIS Pivot Component support Set Key Columns although handled differently. In T-SQL Pivot Statement Set Key columns are the non-pivoted columns whereas in SSIS Pivot Component it is defined by setting PivotUsage=1 on the input column. Now, let’s say we add Modified Date of the contact to the Set Key in addition to VendorID. That means we are expecting a pivoted record for every combination of Vendor and Modified Date.
The T-SQL Pivot Statement would have to be modified as shown below to achieve this.
SELECT VendorID,ModifiedDate, --Non-Pivoted,Set Key Columns  as [Sales Manager],  as [Sales Associate],  as [Sales Agent],  as [Assistant Sales Agent] FROM ( SELECT VContact.VendorID, Vcontact.ModifiedDate,ContactType.ContactTypeID, Contact.phone FROM Purchasing.VendorContact as VContact inner join Person.Contact as Contact on VContact.ContactID =Contact.ContactID inner join Person.ContactType as ContactType on VContact.ContactTypeID = ContactType.ContactTypeID WHERE VContact.ContactTypeID in (2,17,18,19) ) AS SourceQuery PIVOT (MIN(SourceQuery.phone) FOR SourceQuery.ContactTypeID IN (,,,,,,,,,, ,,,,,,,,,)) AS AliasTable ORDER by VendorID
Similarly the package using SSIS Pivot Transformation component would be modified and now looks like the Screen Capture shown below
The output from either would be as shown below in Screen Capture 5. Notice from the screen capture, that the Pivoted Output has a new row for VendorID=47 because of a different Modified Date in one of the Phone Number record.
In essence both T-SQL Pivot Statement and SSIS Pivot Transform support multiple Set Keys. However they differ significantly on the Pass through Columns. Pass through Columns are input columns that the transform passes through from input to output untouched. Pass through columns cannot be specified in T-SQL Pivot statement. Suppose in the previous example if we want to treat Modified Date as Pass through Column instead of Set Key Column, there isn’t a way to specify on the T-SQL statement. Columns are defined as non-pivoted columns automatically become Set Key Columns.
However, this can be easily achieved in SSIS Pivot Transform by setting PivotUsage=0 on Modified Date input column. This is a very useful feature to pass through flag fields, audit columns and bits of static information without defining them as Set Key Columns. If T-SQL Pivot statement, this can be achieved only by using a separate sql statement. Notice in Screen Capture 6, VendorID=47 does not break into a new record even though it has multiple Modified Dates.
Be mindful of …
Few pointers to be aware of when using SSIS Pivot Transformation
- Interface is inexplicably non-intuitive, at least at the moment.
- Sort the data source on set key columns before pivoting to avoid unpredictable results. Notice that in the examples above I used a SSIS Sort transform, you could as well sort the data source query.
- If pass through columns have multiple values, the pivot transform component uses the first value and discards the rest. You can manipulate which value to choose by sorting the pass through columns in desired order.
- There is a known issue with pivot transform when using fields of data type datetime2 as Set Keys. However a simple workaround is available.
In conclusion, I believe the purpose of T-SQL and SSIS Pivot Transform is completely different although they perform the same pivoting function. T-SQL Pivot statement is great for reporting purposes that involves pivoting with aggregation. SSIS Pivot transform is the way to go for ETL purposes – it gets the job done even if the interface is not so desirable (which I hope will improve).