An SSIS data flow begins and ends with a source and destination. Of course, there are a multitude of transforms available in between, but ultimately it comes down to the source and destination. Data flow destinations are designed exclusively for “new” data. Updating existing data is more difficult to achieve.
Updating data has commonly been achieved by one of two methods. The first is to use an OLE DB Command transform. This technique is the easiest to develop, but it has MAJOR performance issues. The reason is simple. The OLE DB Command transforms processes the inbound data one row at a time. This is OK for handling a few rows, but more than a handful of rows will make this method impractical.
The second method involves using a staging table and some T-SQL scripts or procedures. Using this technique requires that a staging table be created which contains the columns needed for an update statement. This solution would begin with an Execute SQL Task which truncates the staging table. The next step would be a data flow which populates the staging table. The final step would be another Execute SQL Task which runs an update statement or stored procedure.
I recently tried another method which is similar to the second method, but requires fewer SSIS objects. The technique still involves a staging table, but with a new twist. The twist is that the staging table contains an Instead Of Insert trigger. This turned to work well, and has some surprising aspects.
First, we need to prep the AdventureWorks database for a demonstration. This begins by creating a “source” table that will be used for updating the Person.Address table. This will serve as a surrogate for a flat file or other such source. Next, we create the staging table, Person._InsteadOfAddress_Trigger. This table has all the columns used for updating the Person.Address table. Following this table is the creation of the InsteadOf Insert trigger. Finally, a statement for populating the “source” table with values used for updating is created.
-- ******** Create temporary "Source" table *********************************** IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[Person].[_InsteadOfAddress_Source]') AND type in (N'U')) DROP TABLE [Person]._InsteadOfAddress_Source GO CREATE TABLE Person._InsteadOfAddress_Source (AddressID INT, AddressLine2 NVARCHAR(60)) GO -- ******** Create temporary "Source" table *********************************** -- ******** Create staging InsteadOf table ************************************ IF EXISTS (SELECT * FROM sys.objects WHERE object_id = OBJECT_ID(N'[Person].[_InsteadOfAddress_Trigger]') AND type in (N'U')) DROP TABLE [Person]._InsteadOfAddress_Trigger GO CREATE TABLE Person._InsteadOfAddress_Trigger (AddressID INT, AddressLine2 NVARCHAR(60)) GO -- ******** Create staging InsteadOf table ************************************ -- ******** Create InsteadOf trigger on staging table ************************* CREATE TRIGGER Person.[ti_UpdateInsteadOfInsert] ON Person._InsteadOfAddress_Trigger INSTEAD OF INSERT AS BEGIN SET NOCOUNT ON UPDATE t SET AddressLine2 = i.AddressLine2 FROM inserted i INNER JOIN Person.Address t ON i.AddressID = t.AddressID END GO -- ******** Create InsteadOf trigger on staging table ************************ -- ******** Populate temporary "Source" table ******************************** TRUNCATE TABLE Person._InsteadOfAddress_Source INSERT INTO Person._InsteadOfAddress_Source (AddressID, AddressLine2) SELECT [AddressID], CAST('Test Update Via InsteadOf Trigger' AS NVARCHAR(60)) AS AddressLine2 FROM [AdventureWorks].[Person].[Address] WHERE AddressLine2 IS NULL -- ******** Populate temporary "Source" table ******************************** -- ******** Reset "Target" table ********************************************* -- Run this anytime to reset the Person.Address table to its original state. /* UPDATE a SET AddressLine2 = NULL FROM [AdventureWorks].[Person].[Address] a WHERE a.AddressLine2 = 'Test Update Via InsteadOf Trigger' */ -- ******** Reset "Target" table ********************************************* -- Show the Results SELECT * FROM Person.Address WHERE AddressLine2 = 'Test Update Via InsteadOf Trigger'
The SSIS package consists of a data flow with an OLE DB Source and OLE DB Destination. The source component contains the query:
select AddressID, AddressLine2 FROM Person._InsteadOfAddress_Source
Data Flow contents
OLE DB Source setting
OLE DB Destination
OLE DB Destination Advanced Editor
NOTE: You MUST use the advanced editor on the OLE DB Destination AND type in “FIRE_TRIGGERS” as an option for the FastLoadOptions property. If you don’t do this the trigger WILL NOT fire.
Looking at the execution results of this package the elapsed time was: 00:00:02.340
Just for comparison I created another package which used an OLE DB Command transform. The elapsed time for that package was: 00:00:19.750
To me this was a little easier package to construct than one which would use SQL Tasks with a Data Flow. However, it’s a little harder to maintain and troubleshoot. One handy result of this is that because I’m using an InsteadOf Insert trigger, the staging table never actually contains any data. Another suprising aspect turns out to be very useful for me. Since I’m using the OLE DB Destination I can use the Commit Size and Batch Size to further refine my insert. In my current job I often have to write updates to run in bite-size batches in order to avoid blocking my 500+ users. Using this technique I can easily control that. Contrast that with using SQL Tasks, which would require me to build looping structures inside my T-SQL procedures.
Another useful application of this technique might be in the area of dimension maintenance. Imagine using a MERGE statement inside the InsteadOf trigger, which would both insert and /or update your dimension table(s). Hmmm....