SSIS - Using the Merge Join Transform

Who is online?  0 guests and 1 members
Home  »  Articles  »  SSIS - Using the Merge Join Transform

SSIS - Using the Merge Join Transform

change text size: A A A
Published: 8/30/2010 by  MikeDavis  - Views:  [2909]  

The Merge Join Transform in SSIS is a great way to load a Data warehouse quickly and an easy way to join two data sources together. There are a few requirements to join these two data sources. The data sources must be sorted and there must be a key that you can join them with. This can all be done in the data flow of the SSIS package.

Here are some examples of a Merge Join with some different situations so you can see the outputs. The two images below show two tables. The first table contains the first names of people and the second table contains there last names. Each table contains an ID column that will be used in the join.


Table one:

SSIS - Using the Merge Join Transform
 
Table two:

SSIS - Using the Merge Join Transform
 
These are OLEDB sources. The queries in the sources are calling the data and sorting it with an order by clause. The source does not know the data is sorted so you have to manually tell the source it is sorted. You do this by right clicking on the source and selecting Show Advanced Editor.

SSIS - Using the Merge Join Transform
 
Then click on the Input Output Properties Tab. Click on the OLE DB Source Output and Set the IsSorted Property to true.

SSIS - Using the Merge Join Transform
 
Then click on the column that is the used to sort the data, in this case the ID column, then set the SortKeyPosition to 1. If you are sorting by more than one column, place a 2 on the next and so on.

SSIS - Using the Merge Join Transform

Then you connect both sources to the Merge Join. When it asks which input you set the First name table as the left input. You can always swap the inputs later by clicking the swap input button in the Merge Join Transform.

SSIS - Using the Merge Join Transform

In the Merge Join, Map the Id together and select the columns you want to pass through to the output. In this example the Join type is set to Left Outer Join. The Results will look like the below image.

SSIS - Using the Merge Join Transform
 
Notice the Null in ID 5 under last name. There was no match for ID 5, but since it was set to Left Outer Join we still get to keep the ID 5. If you set the Merge Join to Inner Join the ID 5 row would be dropped.


One of the other scenarios you will encounter is when the right side of the join has more than one match for some of the ID columns. For example, let’s add another row to the Last name column.

 SSIS - Using the Merge Join Transform

This new table has an extra row with the ID 1 and the last name Davis 2. If you run this through the Merge Join with the Join set to Left Outer Join the results will match the image below.
 SSIS - Using the Merge Join Transform

If you change the Join to Inner Join the Results will be in the below image. Notice the ID 1 is still on the output twice and ID 5 is still in the output.
 
SSIS - Using the Merge Join Transform
Now let’s add another row to the Last name table.
 SSIS - Using the Merge Join Transform

When you run this table through the Merge Join with the Join set to Full Outer Join the results will match the figure below.


SSIS - Using the Merge Join Transform 
Here you can see the Null Id. That is because we passed the ID through from the First name table. The Last name table ID was not passed through. If you want to ensure you get an Id then pass through each Id and use a derived column to check for nulls.
I hope this clarifies how to use the Merge Join Transform. This is a very powerful transform and can make joining data very fast.

 
0
/5
Avg: 0/5: (0 votes)

Comments (5)

Daniel
Daniel said:

I am glad you show sorting at the source, and how to indicate that the source is sorted.  It is vastly superior and vastly faster than the SSIS sort transform.  Nice article.

9/1/2010
 · 
 
by
Bhavesh
Bhavesh said:

Hi Mike 

 

I have tried using this transform, but still find using a lookup much more faster when doing a dataflow. What, in your experience can make Merge Join faster than a lookup? Also, if one has two tables in a database, is it not easier to join in the DB and move using SSIS.

9/8/2010
 · 
 
by
MikeDavis
MikeDavis said:

The lookup is caching data. If you have a large data set then the caching can cause performance issues. But smaller data set work great with a lookup.

9/8/2010
 · 
 
by
jskwarek
jskwarek said:

Hey Mike, new to the site but I have the same question as Bhavesh had, why not just do this join in T-SQL instead and bypass the Merge Join all together.

 

Jeff

9/30/2010
 · 
 
by
MikeDavis
MikeDavis said:

I would do this join in SQL if the data was in a position to do that. But if one of the sources was in Oracle or DB2 then SSIS would be your best choice. The best practice is to use SQL if the it is possible. If not possibel then SSIS will bethe fastest way to transfrom the data.

10/1/2010
 · 
 
by
  • Name:*
  • Email:*
  • Website:
Type the characters you see in the image: *

Most Recent Articles