posted 5/24/2012 by MichaelSimon - Views: [902]
Change data capture is an important consideration when you need to perform an update and an insert in SSIS. Do you need to recognize the change in your ETL or has it been handled previously? Plus doing both insert and update often requires several transforms in addition to your source and destination. Task Factory is an add-in for SQL Server Integration Services that expands the number of sources, destinations, and transforms at your disposal without requiring a script task. One such destination is the Upsert Destination.
The Upsert Destination has a Bulk Update feature. This is ideal when you have a source that provides a date or column that tells you if it has changed. This allows you to filter your source table at the source task and bring in only the rows that are new (since the last time you did ETL for instance.) When utilizing the insert and bulk update method in the Upsert Destination it acts much like this TSQL code would:
MERGE dbo.05upsert AS Target USING ( SELECT[CustomerKey]
,[FirstName]
,[LastName]
,[Phone]
FROM [TF-Demo].[dbo].[05SalesContactInfo]) AS Source
ON (Target.CustomerKey = Source.CustomerKey) WHEN MATCHED THEN UPDATE SET Target.FirstName = Source.FirstName
, Target.LastName = Source.LastName
, Target.Phone = Source.Phone
, Target.LastModifiedDate = @ad_CurrentDT WHEN NOT MATCHED BY TARGET THEN INSERT (CustomerKey, FirstName, LastName, Phone) VALUES ( Source.CustomerKey, Source.FirstName, Source.LastName, Source.Phone);
As you can tell from the code it is checking only the CustomerKey field to determine if a row already exists. If that row exists it updates that row. If that row does not exist it performs an insert.
Within the Upsert Destination it would look like this:
What if, however, as a condition for update you want to not only make sure the CustomerKey exists but you also want to see if a particular field in that row has changed, such as the Phone field. This is the case where the change has to be handled in your ETL. To do this you would use the Column Compare Update Method that acts much like this TSQL code would:
MERGE dbo.05upsert AS Target
USING ( SELECT[CustomerKey]
ON (Target.CustomerKey = Source.CustomerKey) WHEN MATCHED AND ( Target.Phone <> Source.Phone ) THEN UPDATE SET Target.FirstName = Source.FirstName
To set this up in the Upsert Destination Task you would keep the same settings on the General and Advanced tabs as previously shown but would adjust the Update Method task to Column Compare and check the Phone and CustomerKey fields as in:
When you use the Column Compare approach you add considerable overhead because you are making the determination within the component about data change.
If you can make the determination prior to bringing data into the package you can really take advantage of the speed of the bulk update. The Upsert Destination also saves you considerable development time because it handles the merge, split, and destination in one task.