SSIS is a fantastic component of SQL Server used for ETL. There are often performance challenges with SSIS packages, and many of the challenges can be solved by moving logic to the database system if possible and avoiding row by row processing.
Inside the data flow, you can easily and efficiently insert data into database tables. What is more challenging, however, is doing updates or deletions. The native transform to perform these operations is called the OLE DB Command. This transform performs a command for each row that makes it to the component. The OLE DB Command is fine for updating a handful of rows, but, since it works on just one row at a time, it scales miserably.
To demonstrate how slow the OLE DB Command is, I have created a test database and copied in the Sales.SalesOrderDetail table from AdventureWorks, and created a package to update one of the columns. The data flow in the SSIS package has an OLEDB Source pointing to the table and a derived column transform which creates a new column called NewSpecialID. This column is the OrderQty times 10. Finally, there is an OLE DB Command that updates the table, changing the SpecialOfferID to the value of NewSpecialID.
To configure the OLE DB Command, you must provide a connection manager, a command and then map parameters to the command. The command looks like this:
And the parameter mapping looks like this:
The entire data flow looks like this:
When I run the data flow, unfortunately, it takes so long to run, I finally just give up and stop the package after a few minutes.
Of course, this task could be accomplished by just running a T-SQL UPDATE statement. There are, however, cases where it makes sense to do the update in SSIS. Maybe the data is coming from multiple systems. How can you make this better? One way is by using a transform outside the native transforms that come with SSIS such as Pragmatic Works’ Task Factory Update Batch Transform. This transform is even easier to use than the OLE DB Command and works much better.
I created another data flow and this time replaced the OLE DB Command with the Update Batch Transform.
To configure this transform, I must specify the table that needs updating and connect the keys to the source of data flowing in. Then I just map the columns that need to be updated.
In this example, I am matching on SalesOrderID and SalesOrderDetailID. I am updating the SpecialOfferID with the NewSpecialID. This time, the update is complete in under 2 seconds!
Task Factory also has a Delete Batch Transform to make that delete operation more efficient as well.
To learn more about the cool tasks and components found in Task Factory and about our other SSIS products that will make your development easier and your packages run faster check out http://www.pragmaticworks.com/Products.aspx