Using SSIS to load a flat file into a database is a common use of the tool. This works great in SSIS and is very fast due to the dataflow doing batch updates. This is easy to set up when the flat file has no headers and footers. I am not referring to the column headers, those can be skipped. I am referring to the headers that are not a part of the data. Below is an example of data from a comma delimited flat file. There are several columns of data in the flat file separated by commas. The headers and footers that appear around the data make it difficult to get to the data.
One of the options you have in this situation is to use the error output from the source in the data flow. The problem with this approach is data could still be written from the header if the header does not cause an error.
Here is a data flow example of the flat file loading into the table with the error rows being sent to another file that can be checked later manually for data.
After running the dataflow with the above file we can see that the headers were moved to the bad file, but the first data row was moved also. Here is the bad rows file:
The rest of the rows were written to the table.
If the header is very small then you may get some of the header in the database. Here is an example file with a small header.
Loading this using the above data flow causes no rows to be written to the bad rows file and places the header into the table as seen below. This occurred because the header and the first row in the data did not exceed the column width on the table and therefore was treated as the data.
You can see that this bad row option does not always work for loading data. You may end up with missing rows or junk data. We need away to remove the header rows. Fortunately this can be done with a script task in SSIS.
The first thing we need to do is create some package variables to be used by our script task. This way if the file format changes we can update the variables and the script task will still run successfully. This script is made to work with delimited files only. If the file is fixed width or ragged right, this approach will not work.
intDelimCount = number of delimiters that should be on each line that contains datastrAppend = Appended to the new file name created from the script taskstrDelimiter = delimiter used in the flat filestrDestFolder = destination folder for the new files created by the scrip taskstrFileExt = extension of the flat filestrFileName = name of the flat filestrSourceFolder = folder where the flat file exist
Now for the script task, this script task will open the file and parse through the rows looking for any rows that have the right amount of delimiters. When it finds a row that has the right amount of delimiters it writes this row to a new file in the destination folder. It adds the append variable to the file name also. If the row does not contain the correct number of variables it is skipped over. Once the script task is complete you will have the original flat file unchanged, and you will have a new file with only the data rows. Below is the code in VB, you can convert it to C# with websites like this one: http://www.developerfusion.com/tools/convert/csharp-to-vb/.
Public Sub Main()
'save the package variables as script variables
Dim strDestFolder As String = Dts.Variables("strDestFolder").Value
Dim strAppend As String = Dts.Variables("strAppend").Value
Dim delim As String = Dts.Variables("strDelimiter").Value
Dim intDelimCount As Integer = Dts.Variables("intDelimCount").Value
Dim strFileName As String = Dts.Variables("strFileName").Value
Dim strSourceFolder As String = Dts.Variables("strSourceFolder").Value
Dim strFileExt As String = Dts.Variables("strFileExt").Value
'combine the variables to get the file names
Dim inputFileName As String = strSourceFolder + strFileName + strFileExt
Dim outputFileName As String = strDestFolder + strFileName + strAppend + strFileExt
'intitilize the line count
Dim intLineCount As Integer = 0
' try to write the proper lines to a file
Dim inputFile As New StreamReader(inputFileName) ' create a streamreader to read the flatfile
FileOpen(1, outputFileName, OpenMode.Output, OpenAccess.Write) 'create and open the new file to write the data into
'create variables for the reading and writing loop
Dim strFullLine As String = Nothing
Dim strLine As String = Nothing
Dim intDelimPos As Integer = 0
Dim intCount As Integer = 0
While Not (inputFile.EndOfStream) ' read until we reach the end of the flat file
strFullLine = inputFile.ReadLine ' read one line and save so we can write it to the new file later
strLine = strFullLine ' save the line in another variable that will be broken up to count delimiters
intDelimPos = strFullLine.IndexOf(delim) ' get first delim position
intCount = 0 'reset the count of delimeters
While intDelimPos <> -1 'loop until there are no delimters found
strLine = strLine.Substring(intDelimPos + 1) 'drop everything before the current delimeter
intDelimPos = strLine.IndexOf(delim) 'get position of the next delimiter
intCount += 1 ' increment count of the number of delimiters found
If intCount = intDelimCount Then 'if the right amount of delimeters were counted then write the line to the fixed file
PrintLine(1, strFullLine) ' write the data line to the new file
intLineCount += 1 'increment the line count showing the number of lines written
If intLineCount > 0 Then ' if any rows were written fire information so we can see this in the progress tab
Dts.Events.FireInformation(0, "subComponent", CStr(intLineCount) + " Lines Written to:" + outputFileName, String.Empty, 0, False)
'close the two files
Catch exDTS As DtsException 'catch any errors and fire the error event on the package
Dts.Events.FireError(CInt(exDTS.ErrorCode), Dts.Variables("System::TaskName").Value.ToString, exDTS.Message.ToString, String.Empty, 0)
Catch ex As Exception 'catch any script errors and fire the error event on the package
Dts.Events.FireError(0, Dts.Variables("System::TaskName").Value.ToString, ex.Message.ToString, String.Empty, 0)
Dts.TaskResult = ScriptResults.Success
After running this script on the following file:
Here is the new file created by the script task:
The comments in the code should explain what is taking place line by line. This script task can be placed in a For Each Loop and run on a series of files. As long as they all have the same number and type of delimiters.
This is not a perfect system. If there is a header row that contains the right amount of delimiters then it will be written to the new file. This might cause an error during the data flow, or the header may get written to the table. I think a header having the exact same number of delimiters as the data would be rare. If the flat file has column headers, they will be copied to the new file. You will need to check the header rows option in the connection manager for the flat file. Also if the delimiter appears in the data it will not have the correct delimiter count.
You can dowload a copy of this package here.
If you have any questions about this article, or if you have another flat file situation that is strange, let me know.
I agree with the others, this is a great article.
My challange is that my files are EBCDIC (37) and the row delimiters varies a lot across the different files, however I have identified them. I have 10 years of data to parse... around 10TB...
thanks in advance
10 Tb of data is too much to parse through with the script task I wrote. It would cause memory issues. You would have to break the file up into manageable chunks.
If you are converting the data to ASCII check out my article on converting EBCDIC to ASCII.
Mike, any way to do this with a "space" as a delimiter? I've tried " " and ' ' and even left the variable blank. Non worked. My data is separated with a space. Too many records to insert a comma due to the headers.
For a space delimeter just try putting a space in the variable. Or just set the value inside the script task like so:
Dim delim As String = " "
Setting strDelim as variable to have a value of " " - creates BigHeader_fixed as an empty file.
Setting Dim delim As String = " " - has the same results - an empty BigHeader_fixed file.
Any other ideas?
I just tested it with a space in the package variable and it worked perfectly. I am not sure why yours is not working. Please post this question on the forum with more details and maybe even a screen shot so I can better answer the question.
Mike, I added my steps to the forum, but it looks messy. Not sure how to make the code to appear normal and not spread all over the page. I worked it out to this line: While intDelimPos <> -1 with a msgbox, intDelimPos is always -1. strFullLine pulls the 1st line in the txt file. intDelimPos = strFullLine.IndexOf(delim) = -1. delim is set as " ".
Fantastic Article! So many financial systems spit out these .LIS flat files and call them "reports". The look as though they are formatted for the old "Green Bar" printers of yester-years. The problem is there are multiple headers that group the data into pages with actual page numbers AND (in some cases) a break in the pattern to contain a "Summary page" of the information in the previous sections. This article is invaluable! Thank you.
I have this problem with a ragged right, crlf header row delimiter, no text qualifyer. No column headers.
I want to extract just the detail (more than one detail) from flat file to table using ssis.
Thanks Mike, great article!!
My challenge is that each inbound file has a File Header, Batch Header, Detail Record 1, Detail Record 2, Batch Footer, and File Footer.
How would I handle this situation as the two detail records should merge into a single row in the table?