Using the new TSQL MERGE Statement with SQL Server 2008The MERGE statement that was added with SQL Server 2008 provides a better way to manage and update/delete/insert into our dimension tables for our data warehouses. It is very common, when loading data, to insert new records or update existing records to our Dimension tables right before adding new data to our fact tables. There are many ways and many methods to do this without the Merge script. The purpose of this post is to demonstrate how you can manage dimension tables using MERGE.Let’s say that you are updating your data warehouse tables and you have a PRODUCT dimension (DimProduct) and a Sales Fact table (factSales). If you are loading sales data every day, you will need to update the DimProduct table with any new Products , and quite possibly you will need to update your existing products with new data values. For simplicity let’s assume that we have 4 fields in our DimProduct table; ProductID, ProductName, ProductColor, ProductCategory and that the last 3 field might get updated at any time (for the purposes of this post, we will not get into slowly changing dimensions). Assume your table had 4 rows of data as shown below:
Let’s also assume that you have your data that is being loaded in a table called StagingProduct. Now let’s say your daily loading process determines that the Product Name for Shirt need to change to Polo Shirt and that there is a new product called Jeans. Your loading process also contains the other products unchanged. The screen below shows what is in your StagingProduct table.
Instead of doing an INSERT for the fields that you have in your loading process that do not exist in the DimProduct table and then doing a separate update by looking at data that is in both your loading process and in the DimProduct table (where the data is different), you can execute the following MERGE statement.
MERGE dbo.DimProduct AS TargetUSING (SELECT ProductID, ProductName, ProductColor, ProductCategory FROM dbo.StagingProduct) AS SourceON (Target.ProductID = Source.ProductID)WHEN MATCHED THEN UPDATE SET Target.ProductName = Source.ProductNameWHEN NOT MATCHED BY TARGET THEN INSERT (ProductID, ProductName, ProductColor, ProductCategory) VALUES (Source.ProductID, Source.ProductName, Source.ProductColor, Source.ProductCategory)OUTPUT $action, Inserted.*, Deleted.*;
Once you execute this the results of your query will look like this:
You may want to use the results of the MERGE statement to log the process or possibly to do some 2nd process based on the results. You can also choose to ignore or leave out the OUTPUT ;-)
Your DimProduct table will look like this once the MERGE is completed:
Note: The purpose of this post is to demonstrate (in a simple example) how to use the MERGE statement and NOT to try to present best ETL practices.
For questions please contact me at BretUpdegraff@Yahoo.com or
Follow me on Twitter www.Twitter.com/bretupdegraff