Row Insert from SSIS package Vs Transact-SQL Statements

SQL Server writes transaction logs sequentially. The size of transaction logs when using sql INSERT statement depends on the Recovery Model of the database and the type Transact-SQL statements used. Database recovery model can be – Simple (min), Bulk-Logged (intermediate) or Full (max). While this is true for Transact-SQL INSERT statements, records inserted through an SSIS Data Flow Destination (like SQL Server Destination) exhibit a different logging behaviour. This is because the INSERT statements issues from Integration Services engine is always minimally logged and it bypasses the recovery model of the database engine. This results in significant performance improvement by reducing the sequential disk writes, especially when dealing with huge recordsets which is typical in a data warehouse ETL scenario. That’s another reason why SSIS packages perform better over Transact-SQL Stored Procedures for ETL purposes.

Here is Proof

I performed a simple test by inserting 19,815 records from AdventureWorks Customer table to a copy table using a Transact-SQL Stored Procedure (shown below) and an SSIS Package (Screen Capture 1). The database was set to Simple recovery mode.

CREATE PROCEDURE spMinimal
AS
BEGIN

DECLARE  curCustomer  CURSOR FOR
SELECT [TerritoryID]
       ,[CustomerType]
      ,[rowguid]
      ,[ModifiedDate]
  FROM [AdventureWorks].[Sales].[Customer]

DECLARE @intTerritoryID AS int
DECLARE @charCustomerType AS nchar(1)
DECLARE @rowguid AS sysname
DECLARE @ModifiedDate as datetime

truncate table Sales.Customer_TSQL1

OPEN curCustomer
FETCH curCustomer INTO @intTerritoryID,@charCustomerType,@rowguid,@ModifiedDate

WHILE @@FETCH_STATUS=0
	BEGIN
			INSERT INTO [AdventureWorks].[Sales].[Customer_TSQL1]
			   ([TerritoryID]
			   ,[CustomerType]
			   ,[rowguid]
			   ,[ModifiedDate])
			VALUES
				(@intTerritoryID,
				@charCustomerType,
				@rowguid,
				@ModifiedDate
				)
		FETCH curCustomer INTO @intTerritoryID,@charCustomerType,@rowguid,@ModifiedDate

	END

	CLOSE curCustomer
	DEALLOCATE curCustomer
END

Screen Capture 1 - SSIS Package
Screen Capture 1 - SSIS Package

 The log information was captured after execution of the Transact-SQL Stored Procedure and SSIS Package using fn_dblog() as shown below:

SELECT DISTINCT operation,[log record length],count(1) [log count]
FROM fn_dblog(null, null)
WHERE allocunitname like '%Sales.Customer_TSQL1%'
GROUP BY operation,[log record length]
ORDER BY count(1) DESC, [log record length] DESC,operation

The results showed the transaction log size and the number of transaction log entries were significantly less for the inserts from SSIS packages

Screen Capture 2 -  Comparison SSIS Vs Transact-SQL Logging
Screen Capture 2 - Comparison SSIS Vs Transact-SQL Logging

 Benny Austin

8 thoughts on “Row Insert from SSIS package Vs Transact-SQL Statements

  1. This provides an insight on the reason behind performance gain using SSIS, thanks. However I have two questions related to it,

    – in your comparison you have used a cursor, what if you replace it by joins? In a more practical approach, you’d be playing with at least one temp table (if not more) where you’d perform some cleanings etc.

    – If the SSIS engine bypasses those logging mechanism, it implies that records inserted from SSIS won’t be recoverable or say won’t be traceable through transaction log? Please comment.

    Thanks

    1. Kazim, thanks for your comment. The objective of this post was to demonstrate the minimal logging capability of SSIS which is not available to ETL frameworks designed using T-SQL only. In most DW implementations the recovery mode is set to Simple mode to improve the speed of ETL. This is an important step considering the volume of transactions and the need to reduce latency for the availability of latest data for business analysis and reporting. Yes the transaction log would be minimal in this way. Having said that a well designed ETL framework should be designed to have the capability to reload from scratch or from any given date from the underlying source systems. It is for this reason the Datawarehouses have a simple backup and recovery model compared to transaction systems.

    1. For minimal logging to work in SQL Server 2008, the target table must be TABLOCK’ed and recovery mode has to be simple or bulk-logged, which most DW would satisfy.

      INSERT INTO target_table WITH (TABLOCK) SELECT * FROM source_table

      This is a very good feature, considering in SQL Server 2005 only SELECT INTO and Bulk Insert statements are minimally logged. I can see one drawback though, TABLOCK in insert statements cannot use multiple threads (Bulk Inserts are an exception). This would be sufficient for simple table extractions. It can however be bottle-neck when the ETL layer is designed for parallel runs where data from different sources are inserted into a partioned table. Also the tranformation layer in most ETL solutions are not simple inserts.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s