Business Intelligence Blogs

View blogs by industry experts on topics such as SSAS, SSIS, SSRS, Power BI, Performance Tuning, Azure, Big Data and much more! You can also sign up to post your own business intelligence blog.

SQL Server Integration Services SSIS Performance Best Practices - Part 2

  • 26 September 2013
  • Author: abhi_MSBI
  • Number of views: 14290
  • 0 Comments

2. Avoid SELECT *

The Data Flow Task (DFT) of SSIS uses a buffer (a chunk of memory) oriented architecture for data transfer and transformation. When data travels from the source to the destination, the data first comes into the buffer, required transformations are done in the buffer itself and then written to the destination.

The size of the buffer is dependant on several factors, one of them is the estimated row size. The estimated row size is determined by summing the maximum size of all the columns in the row. So the more columns in a row means less number of rows in a buffer and with more buffer requirements the result is performance degradation. Hence it is recommended to select only those columns which are required at destination.

Even if you need all the columns from the source, you should use the column name specifically in the SELECT statement otherwise it takes another round for the source to gather meta-data about the columns when you are using SELECT *.

If you pull columns which are not required at destination (or for which no mapping exists) SSIS will emit warnings like this.

[SSIS.Pipeline] Warning: The output column "SalariedFlag" (64) on output "OLE DB Source Output" (11) and component "OLE DB Source" (1) is not subsequently used in the Data Flow task. Removing this unused output column can increase Data Flow task performance.
[SSIS.Pipeline] Warning: The output column "CurrentFlag" (73) on output "OLE DB Source Output" (11) and component "OLE DB Source" (1) is not subsequently used in the Data Flow task. Removing this unused output column can increase Data Flow task performance.

Beware when you are using "Table or view" or "Table name or view name from variable" data access mode in OLEDB source. It behaves like SELECT * and pulls all the columns, use this access mode only if you need all the columns of the table or view from the source to the destination.

Tip : Try to fit as many rows into the buffer which will eventually reduce the number of buffers passing through the dataflow pipeline engine and improve performance.

 

Upcoming SQL Server Integration Services (SSIS) - Best Practices - Part 3 tip for more best practices around SSIS.

Awaitng you valuable comments..

Print
Categories: Analysis Services
Tags:
Rate this article:
3.0

abhi_MSBIabhi_MSBI

Other posts by abhi_MSBI

Please login or register to post comments.