Benny Austin

SSIS Pivot Transform vs T-SQL Pivot


A pivot transformation converts multiple rows of data into a single record, using an input column that has a finite set of distinct values to define the output columns. Pivoting is typically used in DW solutions to handle multi-valued dimensions. In SQL Server, pivoting is done using either

  1.  T-SQL Pivot statement or
  2.  SSIS Pivot Transformation

  
Two significant benefits stand out in favour of using the SSIS Pivot Transformation for ETL scenarios. The remainder of this post explains them.

Aggregate functions – not a requirement for SSIS Pivot Component

 
Consider this example from AdventureWorks. The [VendorContact] table lists the contact types and phone numbers of each vendor. There are multiple contacts for a vendor as shown below (Screen Capture 1):      

Screen Capture 1- Vendor Contacts (Pivot Input)


Say we pivot these records and build a multi-valued dimension called [DimVendorContact] as shown in Screen Capture 2.      

Screen Capture 2 - DimVendorContacts (Pivot Output)


First, let's build the dimension using the T-SQL Pivot statement, which looks something like this.

SELECT VendorID,
    [1] AS [Accounting Manager], [2] AS [Assistant Sales Agent],
    [3] AS [Assistant Sales Representative], [4] AS [Coordinator Foreign Markets],
    [5] AS [Export Administrator], [6] AS [International Marketing Manager],
    [7] AS [Marketing Assistant], [8] AS [Marketing Manager],
    [9] AS [Marketing Representative], [10] AS [Order Administrator],
    [11] AS [Owner], [12] AS [Owner/Marketing Assistant],
    [13] AS [Product Manager], [14] AS [Purchasing Agent],
    [15] AS [Purchasing Manager], [16] AS [Regional Account Representative],
    [17] AS [Sales Agent], [18] AS [Sales Associate],
    [19] AS [Sales Manager], [20] AS [Sales Representative]
FROM
(
    SELECT VContact.VendorID, ContactType.ContactTypeID, Contact.Phone
    FROM Purchasing.VendorContact AS VContact
    INNER JOIN Person.Contact AS Contact
        ON VContact.ContactID = Contact.ContactID
    INNER JOIN Person.ContactType AS ContactType
        ON VContact.ContactTypeID = ContactType.ContactTypeID
) AS SourceQuery
PIVOT
(
    MIN(SourceQuery.Phone) -- AGGREGATE FUNCTION!!!
    FOR SourceQuery.ContactTypeID IN
    ([1],[2],[3],[4],[5],[6],[7],[8],[9],[10],
     [11],[12],[13],[14],[15],[16],[17],[18],[19],[20])
) AS AliasTable
ORDER BY VendorID

 
Notice that an aggregate function is required on the phone number column for this T-SQL statement to work, even though it makes no sense to aggregate a phone number. That has a serious implication: the aggregation adds grouping work to the query and, especially with high data volumes, is a potential performance bottleneck.

Compare this with a package that uses the SSIS Pivot Transform to achieve the same result. Notice in Screen Capture 3 that, by using appropriate PivotUsage values for the input columns and by defining the output columns, we get the same result without using aggregate functions.

 
Screen Capture 3 - SSIS Pivot Transformation does not require Aggregation Function
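
To make the aggregate requirement on the T-SQL side concrete, here is a minimal sketch. It assumes a hypothetical table variable, @VendorPhone, with made-up phone numbers rather than AdventureWorks data. The vendor has two Sales Agent contacts, and the mandatory aggregate quietly keeps only one of them.

```sql
-- Hypothetical sample data for illustration only (not from AdventureWorks).
DECLARE @VendorPhone TABLE
(
    VendorID      int         NOT NULL,
    ContactTypeID int         NOT NULL,
    Phone         varchar(25) NOT NULL
);

INSERT INTO @VendorPhone (VendorID, ContactTypeID, Phone)
VALUES (1, 17, '555-0100'),
       (1, 17, '555-0199'),  -- a second Sales Agent for the same vendor
       (1, 19, '555-0142');

SELECT VendorID,
       [17] AS [Sales Agent],
       [19] AS [Sales Manager]
FROM @VendorPhone AS SourceQuery
PIVOT
(
    MIN(Phone)               -- PIVOT does not compile without an aggregate here
    FOR ContactTypeID IN ([17], [19])
) AS AliasTable;
-- Returns a single row for VendorID 1. [Sales Agent] holds '555-0100' only,
-- because MIN keeps the lowest string and drops '555-0199'.
```

Swapping MIN for MAX would keep the other number instead; either way, the choice is made by the aggregate rather than by any rule about the contacts themselves.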
 

 

Clear distinction of Set Key and Pass through Columns in SSIS Pivot Component

 
Set Key columns are the combination of one or more input columns that determines the uniqueness of the pivoted output. In this example, VendorID is the one and only Set Key column, which means there is one VendorContact pivot record for every vendor. Both T-SQL and the SSIS Pivot Component support Set Key columns, although they handle them differently. In the T-SQL Pivot statement, the Set Key columns are the non-pivoted columns, whereas in the SSIS Pivot Component they are defined by setting PivotUsage=1 on the input column. Now, let's say we add the Modified Date of the contact to the Set Key in addition to VendorID. That means we expect a pivoted record for every combination of vendor and Modified Date.

The T-SQL Pivot Statement would have to be modified as shown below to achieve this.      

SELECT VendorID, ModifiedDate, -- Non-pivoted, Set Key columns
	[19] AS [Sales Manager],
	[18] AS [Sales Associate],
	[17] AS [Sales Agent],
	[2] AS [Assistant Sales Agent]
FROM
(
	SELECT VContact.VendorID, VContact.ModifiedDate, ContactType.ContactTypeID, Contact.Phone
	FROM Purchasing.VendorContact AS VContact
	INNER JOIN Person.Contact AS Contact
		ON VContact.ContactID = Contact.ContactID
	INNER JOIN Person.ContactType AS ContactType
		ON VContact.ContactTypeID = ContactType.ContactTypeID
	WHERE VContact.ContactTypeID IN (2, 17, 18, 19)
) AS SourceQuery
PIVOT
(
	MIN(SourceQuery.Phone)
	-- Only the four contact types kept by the WHERE clause need to be listed
	FOR SourceQuery.ContactTypeID IN ([2], [17], [18], [19])
) AS AliasTable
ORDER BY VendorID

Similarly, the package using the SSIS Pivot Transformation component would be modified and now looks like the screen capture shown below.

Screen Capture 4

The output from either would be as shown below in Screen Capture 5. Notice that the pivoted output has a new row for VendorID=47, because one of its phone number records has a different Modified Date.

Screen Capture 5 - Pivot Output with Multiple Set Keys

In essence, both the T-SQL Pivot statement and the SSIS Pivot Transform support multiple Set Keys. However, they differ significantly on Pass through columns. Pass through columns are input columns that the transform passes from input to output untouched. Pass through columns cannot be specified in a T-SQL Pivot statement. Suppose that, in the previous example, we want to treat Modified Date as a Pass through column instead of a Set Key column: there isn't a way to specify this in the T-SQL statement. Columns that are defined as non-pivoted columns automatically become Set Key columns.

However, this can be easily achieved in the SSIS Pivot Transform by setting PivotUsage=0 on the Modified Date input column. This is a very useful feature for passing through flag fields, audit columns and bits of static information without defining them as Set Key columns. With the T-SQL Pivot statement, this can be achieved only by using a separate SQL statement. Notice in Screen Capture 6 that VendorID=47 does not break into a new record even though it has multiple Modified Dates.

Screen Capture 6 - Pivot Output with Pass Through Columns
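
For comparison, the "separate SQL statement" approach mentioned above might look something like the following sketch. It rests on a few assumptions: the table variable @VendorPhone and its sample rows are hypothetical, and MIN(ModifiedDate) stands in for "the first value" that the SSIS component would pass through. The pivot itself uses VendorID as its only Set Key, and the date is fetched by a second query and joined back.

```sql
-- Hypothetical sample data for illustration only (not from AdventureWorks).
DECLARE @VendorPhone TABLE
(
    VendorID      int         NOT NULL,
    ModifiedDate  datetime    NOT NULL,
    ContactTypeID int         NOT NULL,
    Phone         varchar(25) NOT NULL
);

INSERT INTO @VendorPhone (VendorID, ModifiedDate, ContactTypeID, Phone)
VALUES (47, '20020101', 17, '555-0100'),
       (47, '20020315', 19, '555-0142');  -- same vendor, different date

WITH Pivoted AS
(
    -- ModifiedDate is left out of the source, so VendorID is the only Set Key
    SELECT VendorID, [17] AS [Sales Agent], [19] AS [Sales Manager]
    FROM (SELECT VendorID, ContactTypeID, Phone FROM @VendorPhone) AS SourceQuery
    PIVOT (MIN(Phone) FOR ContactTypeID IN ([17], [19])) AS AliasTable
),
PassThrough AS
(
    -- Assumption: the earliest date plays the role of the passed-through value
    SELECT VendorID, MIN(ModifiedDate) AS ModifiedDate
    FROM @VendorPhone
    GROUP BY VendorID
)
SELECT p.VendorID, pt.ModifiedDate, p.[Sales Agent], p.[Sales Manager]
FROM Pivoted AS p
INNER JOIN PassThrough AS pt
    ON pt.VendorID = p.VendorID
ORDER BY p.VendorID;
```

This returns one row for VendorID=47 carrying both phone numbers, at the cost of a second pass over the data and yet another aggregate.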

Be mindful of …

A few pointers to be aware of when using the SSIS Pivot Transformation:

  1. The interface is inexplicably non-intuitive, at least at the moment.
  2. Sort the data source on the Set Key columns before pivoting to avoid unpredictable results. In the examples above I used an SSIS Sort transform, but you could just as well sort in the data source query.
  3. If a Pass through column has multiple values, the Pivot transform uses the first value and discards the rest. You can control which value is chosen by sorting the Pass through columns in the desired order.
  4. There is a known issue with the Pivot transform when using fields of data type datetime2 as Set Keys. However, a simple workaround is available.
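
As a sketch of pointers 2 and 3, the source query for the package could do the sorting itself. The query below reuses the AdventureWorks tables from the earlier examples. Ordering by ModifiedDate after the Set Key is an assumption about which value you would want passed through (here, the earliest).

```sql
SELECT VContact.VendorID,
       VContact.ModifiedDate,
       VContact.ContactTypeID,
       Contact.Phone
FROM Purchasing.VendorContact AS VContact
INNER JOIN Person.Contact AS Contact
    ON VContact.ContactID = Contact.ContactID
WHERE VContact.ContactTypeID IN (2, 17, 18, 19)
ORDER BY VContact.VendorID,      -- Set Key column first, so each vendor's rows arrive together
         VContact.ModifiedDate;  -- then the Pass through column, so the earliest date comes first
```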

In conclusion, I believe the purposes of the T-SQL Pivot statement and the SSIS Pivot Transform are completely different, although they perform the same pivoting function. The T-SQL Pivot statement is great for reporting purposes that involve pivoting with aggregation. The SSIS Pivot Transform is the way to go for ETL purposes: it gets the job done even if the interface is not so desirable (which I hope will improve).

Benny Austin


5 comments on “SSIS Pivot Transform vs T-SQL Pivot”

  1. Pingback: SSIS Pivot Transform vs T-SQL Pivot « Benny Austin Get Pivot

  2. Pingback: Pivoting in absence of Pivot Column « Benny Austin

  3. Ådne Hovda
    December 8, 2010

    Great post. Do you happen to have an example of how to accomplish pass through columns with PIVOT using T-SQL only?

    • Benny Austin
      December 9, 2010

      Thanks for your feedback. As far as I’m aware of, pass through columns is not supported by T-SQL Pivot statement.

  4. neeth
    November 22, 2012

    Really great post…. Thanks a lot for this…. Sort key position is the thing I missed out, which led me to see unpredictable results… After using the sort key everything is proper… Really, thanks for those points at the bottom of the post..
