When you need to transform data in an SSIS data flow task, the Derived Column transform is very fast because it works on rows in batches. But if hundreds of columns need the same transform, creating each derived column expression by hand can be very time consuming. Instead, you can use a Script Component (the script transform used inside a data flow, often loosely called a script task) to loop through the columns and perform the same operation on every input column you select. Keep in mind that this will degrade performance, but it makes development faster.
Here are the first few rows of the million-row table I am going to use to demonstrate both approaches. Let’s say we need to get just the first 3 characters of each of these fields.
First I will show how to do this with a Derived Column. It is just a simple SUBSTRING expression on each column. This is a small example, but imagine having hundreds of columns that need this transform. Building all of those expressions would be very time consuming. The Derived Column itself works and performs very fast.
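As a rough sketch, the rows in the Derived Column editor could look like the following. The column names FirstName and LastName are hypothetical, since the actual table columns are not listed here. Note that SUBSTRING in the SSIS expression language counts positions from 1, unlike the zero-based Substring method in .NET.

```
Derived Column Name   Derived Column          Expression
FirstName             Replace 'FirstName'     SUBSTRING([FirstName], 1, 3)
LastName              Replace 'LastName'      SUBSTRING([LastName], 1, 3)
```

Each column that needs the transform gets its own row like these, which is why this approach becomes tedious on very wide tables.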
With the script approach, you need to import the System.Reflection namespace, which lets you refer to the columns in the data flow by name. Then we create a For Each loop to go through each column. Before writing the code, place a check next to each input column you want the script to substring and set its usage type to ReadWrite so the script can update it.
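Here is the code that performs the substring. It goes inside the component's row-processing method (Input0_ProcessInputRow by default).
' Requires: Imports System.Reflection and Imports Microsoft.SqlServer.Dts.Pipeline.Wrapper
Dim column As IDTSInputColumn100
Dim rowType As Type = Row.GetType()
Dim columnValue As PropertyInfo
' Loop through every input column selected in the component
For Each column In Me.ComponentMetaData.InputCollection(0).InputColumnCollection
    ' Find the buffer property with the same name as the column
    columnValue = rowType.GetProperty(column.Name)
    Dim strCol As String = columnValue.GetValue(Row, Nothing).ToString()
    ' Keep the first 3 characters; shorter values are left unchanged
    If strCol.Length > 3 Then strCol = strCol.Substring(0, 3)
    columnValue.SetValue(Row, strCol, Nothing)
Next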
Both of the above transforms produce the output below.
The Derived Column performed this operation on one million rows in 6 seconds. The script took 406 seconds, over 6 minutes, on the same one million rows, which makes it roughly 68 times slower. This is a massive performance loss, so I would not suggest the script method in general. That said, if you know the number of rows will always be low and the table is very wide, with many columns that need the same function, it can be used with the understanding that it will not scale.