When SSIS caches data in memory as it’s going through the ETL process, it uses something called buffers to control how much data in can pick up and transform/modify at once. Buffers are what SSIS uses to cache each set of data as it transforms it through the pipeline. For example, let’s take a look at the below scenario:
What’s going on right now in this package? This package is set to the defaults right now, which is 10,000 rows or 10 MB, whichever comes first. We can see that SSIS has picked up 5 buffers of ~10K rows each, and the first buffer of 9,947 rows waiting for me. In this case, it’s because I just have a data viewer on the data flow so I can see what is passing through. As this is a 20 million row fact table, my issue here is that 20,000,000 divided by 10,000 rows per buffer is, roughly, 2,000 buffers. That’s quite a bit. The last task at the end of my package is just a Row Count component. Using 2,000 buffers at 10,000 rows per buffer this package takes about 1 minute and 8 seconds. What I can do is adjust the size of the buffer that SSIS uses to process each set. Changing the value to 873,000 rows per buffer and 100 MB default buffer max size moves my performance to about 50 seconds. A boost of about 30%, and it could have been a lot more if I was using actual server resources and not my little laptop here.
The problem we face as ETL developers is that without changing these from the defaults, we’re allowing SSIS to run at a much smaller capacity than what it’s capable of. It would be like buying an Aston Martin DB7 (my favorite car ) and never moving it out of 2nd gear. Painful. To help with this, I’ve created a very simple excel spreadsheet to calculate how many rows should be put into a buffer at one time in order to maximize SSIS’ efficiency. The buffer size is customizable based on how much memory you want SSIS to use for each pass it uses through the data. The maximum amount SSIS can use at once is 100 MB, or 104857600 bytes), and the spreadsheet will take what you input, up to the maximum SSIS allowed limit. Any field highlighted in yellow is where you input your values in, and each step to the left of the value denotes either step by step instructions or what the step is doing.
The complete file is available for download here.