posted 11/21/2009 by MikeMollenhour - Views: [3457]
The Slowly Changing dimension task works great for small loads on update/inserts but if you have a large dataset that you want to use what can you do? One new answer ( sql 2008) to this question is to use the merge statement. In exploring this option I found the merge statement to be very powerful yet very verbose requiring a large amount of development time. To get around this I wrote a stored procedure to generate my stored procedure automatically. First I created a database called Stage and added two Schemas named source and target. After creating the source table definition, I scripted the table and changed it to create a target schema table with the same columns while adding a Change_Flag column to hold my change type (Insert or update). With the two tables created I pass the table name (without schema) to this proc.
CREATE function [merge].[iStringtoTable] (@p_str varchar(1000)) returns @tbl table (VAL varchar(500)) as begin set @p_str = @p_str + ',' declare @p_counter int declare @p_len int declare @p_curr_char varchar(1) declare @p_char varchar(500) declare @p_num int set @p_len = len(@p_str) set @p_counter = 1 set @p_curr_char = '' set @p_char = ''
-- loop while @p_counter <= @p_len begin set @p_curr_char = substring(@p_str, @p_counter, 1) if (@p_curr_char <> ',') begin set @p_char = @p_char + @p_curr_char end else begin --set @p_num = @p_char insert into @tbl values(@p_char) set @p_char = '' end set @p_counter = @p_counter + 1 end return end GO
Create Proc [dbo].[usp_CreateMergeStatement] (@Table varchar(1000) ,@BusinessKeyList varchar(5000) =-1) AS
Declare @curColumn varchar(500) IF @BusinessKeyList ='-1' BEGIN select @curColumn ='',@BusinessKeyList='' Select @curColumn= [COLUMN_NAME] ,@BusinessKeyList = @BusinessKeyList + ',' +@curColumn FROM [INFORMATION_SCHEMA].[COLUMNS] where TABLE_NAME=@Table and Table_Schema='source' Print right(@BusinessKeyList,len(@BusinessKeyList)-1) END ELSE BEGIN
Declare @BusKeyColumnList varchar(8000) ,@BusKeyColumnList2 varchar(8000) ,@UpdateColumnList varchar(8000) ,@INSColumnList varchar(8000) ,@INSColumnList2 varchar(8000) ,@MergeStatement varchar(300) ,@MergeColumn varchar(200) ,@UpdateColumnList2 varchar(1000) ,@UpdateColumnList3 varchar(1000) --select * from [merge].iStringtoTable (@BusinessKeyList)
set @MergeStatement='' select top 1 @MergeStatement= 'CREATE Proc [merge].[usp_' + [TABLE_NAME]+ '] as Begin
MERGE TARGET.'+ [TABLE_NAME]+ ' AS TARGET USING SOURCE.'+ [TABLE_NAME]+ ' AS SOURCE ON' --+ [COLUMN_NAME] FROM [INFORMATION_SCHEMA].[COLUMNS] where TABLE_NAME=@Table and Table_Schema='source' set @BusKeyColumnList2='' set @MergeColumn='' set @BusKeyColumnList ='' set @UpdateColumnList ='' set @UpdateColumnList2 ='' set @UpdateColumnList3 ='' select @INSColumnList='' ,@INSColumnList2 =''
Select @BusKeyColumnList = 'isnull(TARGET.' + [COLUMN_NAME] + ',-1) = isnull(SOURCE.' +[COLUMN_NAME] + ',-1)' ,@BusKeyColumnList2 =@BusKeyColumnList2 + ' AND ' + @BusKeyColumnList ,@INSColumnList =[COLUMN_NAME] ,@INSColumnList2 =@INSColumnList2 + ' ,' + @INSColumnList FROM [INFORMATION_SCHEMA].[COLUMNS] col join [merge].iStringtoTable (@BusinessKeyList) f on f.val=col.COLUMN_NAME where TABLE_NAME=@Table and Table_Schema='source' and column_Name<> 'Change_Flag' Select @UpdateColumnList = 'TARGET.' + [COLUMN_NAME] + ' <> SOURCE.' +[COLUMN_NAME] ,@UpdateColumnList2 =@UpdateColumnList2 + ' AND ' + @UpdateColumnList ,@INSColumnList =[COLUMN_NAME] ,@INSColumnList2 =@INSColumnList2 + ' ,' + @INSColumnList FROM [INFORMATION_SCHEMA].[COLUMNS] col Left join [merge].iStringtoTable (@BusinessKeyList) f on f.val=col.COLUMN_NAME where TABLE_NAME=@Table and Table_Schema='source' and column_Name<> 'Change_Flag' and f.val is null -- print @BusKeyColumnList print @MergeStatement + '(' + Right(@BusKeyColumnList2, len(@BusKeyColumnList2)-6) +' ) WHEN MATCHED AND ' + replace(replace(@UpdateColumnList,'=','<>'),' AND ',' OR ') + ' THEN UPDATE SET ' + replace (@UpdateColumnList, ' <> ', ' = ')+ ', TARGET.Change_Flag=''U''' +' WHEN NOT MATCHED THEN
INSERT (' + Right(@INSColumnList2,len(@INSColumnList2)-15) + ' ,Change_Flag ) VALUES ('+ replace (Right(@INSColumnList2,len(@INSColumnList2)-15),',',',source.') + ' ,''I'' ) ; END ' END
For the purpose of this demo I will use the table named DimCurrency. Here is the execution syntax:
USE [Stage] GO
DECLARE @return_value int
EXEC @return_value = [dbo].[usp_CreateMergeStatement] @Table = N'DimCurrency' SELECT 'Return Value' = @return_value
GO
This will return me a comma delimited list of the fields for this table. In this case my return data is
CurrencyKey,CurrencyAlternateKey,CurrencyName
I now decide which fields I would like to use as a business key from this list and remove the rest. In this case I will use “CurrencyKey,CurrencyAlternateKey” as the business key so know I just add this parameter and re-execute the proc
EXEC @return_value = [dbo].[usp_CreateMergeStatement] @Table = N'DimCurrency' ,@BusinessKeyList='CurrencyKey,CurrencyAlternateKey'
SELECT 'Return Value' = @return_value
As seen below I now receive the syntax to run for creating my proc:
CREATE Proc [merge].[usp_DimCurrency] as Begin
MERGE TARGET.DimCurrency AS TARGET USING SOURCE.DimCurrency AS SOURCE ON( isnull(TARGET.CurrencyKey,-1) = isnull(SOURCE.CurrencyKey,-1) AND isnull(TARGET.CurrencyAlternateKey,-1) = isnull(SOURCE.CurrencyAlternateKey,-1) ) WHEN MATCHED AND TARGET.CurrencyName <> SOURCE.CurrencyName THEN UPDATE SET TARGET.CurrencyName = SOURCE.CurrencyName, TARGET.Change_Flag='U' WHEN NOT MATCHED THEN
INSERT (CurrencyKey ,CurrencyAlternateKey ,CurrencyName ,Change_Flag ) VALUES (CurrencyKey ,source.CurrencyAlternateKey ,source.CurrencyName ,'I' ) ; END
As you may notice the proc has code to set the change_flag for each different type of action.
To stitch this into SSIS now I just have my source data in an ssis datasource and insert it into the destination of my source.dimcurrency table. After this is done I run the proc above and the target.dimcurrency will now have my data that I need to insert or update into production. Note if you like you could bypass this step and have your target table as your final table.