SSIS 2008 Using Excel in a Lookup Tansform

Who is online?  0 guests and 0 members
Home  »  Articles  »  SSIS 2008 Using Excel in a Lookup Tansform

SSIS 2008 Using Excel in a Lookup Tansform

change text size: A A A
Published: 12/6/2010 by  MikeDavis  - Views:  [2425]  

The lookup transform in SSIS 2008 gives you the ability to join data and eliminate non matching rows. One of the limitations of the lookup is the fact it requires an OLEDB connection. However, with the cache connection manager you can use just about any other data source as your lookup table. If you have an excel file that contains your lookup information, it would be nice to be able to lookup the information in the excel file without having to load the excel file into a table.


To accomplish this we will load the excel file into a cache connection manager and use this in the data flow. Below is my SQL Table with first names and my excel file with last names. They both have matching ID columns. I will match up the id in the lookup transform to combine the names.


Excel File with First Names

SSIS 2008 Using Excel in a Lookup Tansform

 

SQL Table with last names

SSIS 2008 Using Excel in a Lookup Tansform

The SSIS package will require two data flows. The first will load the excel file into cache. The second will have the table source and the lookup transform. Here is the Control Flow of the package.

SSIS 2008 Using Excel in a Lookup Tansform

 

In the first data flow we will have the excel source and a cache transform.

SSIS 2008 Using Excel in a Lookup Tansform

The Excel source is just a simple source with the ID column and the First Name Column. The only change you may have to make is to the data types in the excel source. I had to change the data type of the ID field to integer. That is because my table has an ID column with a data type of integer. To change the data type, right click on the excel source and select Advanced Editor. Click on the Input and Output Properties Tab, expand the output columns folder, select the ID column, and change the data type property as needed.

SSIS 2008 Using Excel in a Lookup Tansform

 

After you have the source ready, drag the green data flow line to the cache transform. Click the New button and give the cache connection manager a name. Then click on the columns tab and set the ID column’s index property to 1. You must have at least columns with an index property value greater than zero.

SSIS 2008 Using Excel in a Lookup Tansform

 

Click on the mappings node in the cache transform and map the columns from excel to the cache connection. Notice the magnifying glass next to the ID, this indicates the index column.

SSIS 2008 Using Excel in a Lookup Tansform

 

In the next data flow we will have and OLEDB source from the SQL table and a lookup transform using the cache connection. I also have a terminator destination from task factory, just so we have a place to put a data viewer to see if the first and last names are joined.

SSIS 2008 Using Excel in a Lookup Tansform

 

The OLEDB Source is just a select from a table pulling the ID and the Last Names. In the lookup transform, set the connection type to the cache connection manager.

SSIS 2008 Using Excel in a Lookup Tansform

 

Click on the connection node and set the connection to the cache connection manager.

SSIS 2008 Using Excel in a Lookup Tansform

 

Click on the columns node and map the ID column to the ID column in the cache. Place a check next to the first name field.

SSIS 2008 Using Excel in a Lookup Tansform

 

Here are the results from the data viewer after the lookup.

SSIS 2008 Using Excel in a Lookup Tansform

 

You can use this same method to load just about anything into cache and use it in the lookup transform.

The cache connection manager has some major Pros and Cons. The Pro is the speed, once the data is loaded into cache the lookup performs by reading the rows saved in memory instead of having to perform table reads. The Con would be the limited amount of memory. If you have a large excel file it could cause buffer overruns and start spooling to disk. You would have to watch your PerfMon to see if this happens. A general rule I follow is about 50,000 rows. This varies with the number of columns and the size of the data. If you have a few columns with just integers then you can fit a lot more rows in cache. If you have a bunch of columns that are varchar(5000) then you will not be able to fit as many rows in cache. Some testing when setting this up.

 
0
/5
Avg: 0/5: (0 votes)

Comments (2)

Bebel
Bebel said:

Great article Mike! 

What are the pro and cons with using a Cache Connection Manager to performs Lookups? Would you recommend it for huge table lookups? For example, when loading a Fact and looking up the surrogate key from multiple huge dimension tables, would pre-loading the data in Cache connection managers help?

Thanks!

12/8/2010
 · 
 
by
anibaldelucia
anibaldelucia said:

Bebel,

Using a cache connection manager is recommended when your look up data is static, it is good for performance but if your reference data changes rapidely you have to use the OLEDB connection.

 

3/1/2011
 · 
 
by
  • Name:*
  • Email:*
  • Website:
Type the characters you see in the image: *

Most Recent Articles