SSIS Lookup transformation is used to find/validate existing reference record. This can be implemented in 3 modes.
- Full Cache – Referring dataset will be placed in memory before validation
- Partial Cache – Referring dataset will be accessed on run time and accessed records will be cached
- No Cache - Referring dataset will be accessed on run time for each record
Let’s check out what difference in performance Full Cache and Partial Cache make.
I have a case study where I am transferring some user responses from one DB source to other DB destination. If the source record does not exist in the destination it should copy it otherwise it should update it.
Partial Cache :
Below screen depicts above case. Data is taken from OLE DB Source and lookup transformation is used before inserting or updating the record in destination. Look up cache mode is set to Partial.
There are 192500 records in Source DB. I have stopped the transfer after about 4 minutes. You can see it has processed less than half of data.
Below is the screenshot of Sql profiler while above data flow was running. Nearly For every record lookup is hitting sql server to find reference record. You can see high no. of rows in the status bar of window.
Below screenshot is for performance counters recorded by Perfmon for Partial lookup process. If you notice Batch requests / sec is quite high @ 135 while Disk read and write are as low as 3.4 and 12.6 per second respectively. Processor queue time length is also non zero which is not good.
Full Cache :
Let’s change the mode of above lookup component to Full Cache and run it. Below screenshot is taken after setting the lookup to full cache mode. It completed the process in just 15 seconds while partial cache did not complete half of it in 4 minutes.
As full cache fetch entire reference data before starting cross validation, there are no high no. of hits to sql server. It brings all data in on go.
Let’s take a look of performance counters recorded by Perfmon for Full cache lookup process. Batch requests / sec is low @ 7 vs 135 in partial cache mode. Disk read and write are high @ 65.6 and 96 per second respectively which is far better than earlier. Processor queue time length is perfect zero.
Available Mbytes is 4211 in full cache vs 4215 in partial mode. It is because we have kept all reference data in memory.
Conclusion: Full cache mode is better in performance in comparison to partial cache mode. Full cache can be considered when
- The size of reference dataset is small which can be easily adjusted in server memory
- The source data does not have repetitive data for reference check
- The reference dataset is not dynamically getting updated.