Handling Headers in Flat Files With SSIS

Who is online?  0 guests and 0 members
Home  »  Articles  »  Handling Headers in Flat Files With SSIS

Handling Headers in Flat Files With SSIS

change text size: A A A
Published: 4/2/2010 by  MikeDavis  - Views:  [6916]  

Using SSIS to load a flat file into a database is a common use of the tool. This works great in SSIS and is very fast due to the dataflow doing batch updates. This is easy to set up when the flat file has no headers and footers. I am not referring to the column headers, those can be skipped. I am referring to the headers that are not a part of the data. Below is an example of data from a comma delimited flat file. There are several columns of data in the flat file separated by commas. The headers and footers that appear around the data make it difficult to get to the data.

Handling Headers in Flat Files With SSIS 

One of the options you have in this situation is to use the error output from the source in the data flow. The problem with this approach is data could still be written from the header if the header does not cause an error.

Here is a data flow example of the flat file loading into the table with the error rows being sent to another file that can be checked later manually for data.

Handling Headers in Flat Files With SSIS

After running the dataflow with the above file we can see that the headers were moved to the bad file, but the first data row was moved also. Here is the bad rows file:

Handling Headers in Flat Files With SSIS

The rest of the rows were written to the table.

Handling Headers in Flat Files With SSIS

If the header is very small then you may get some of the header in the database. Here is an example file with a small header.

Handling Headers in Flat Files With SSIS

Loading this using the above data flow causes no rows to be written to the bad rows file and places the header into the table as seen below. This occurred because the header and the first row in the data did not exceed the column width on the table and therefore was treated as the data.

Handling Headers in Flat Files With SSIS

You can see that this bad row option does not always work for loading data. You may end up with missing rows or junk data. We need away to remove the header rows. Fortunately this can be done with a script task in SSIS.

The first thing we need to do is create some package variables to be used by our script task. This way if the file format changes we can update the variables and the script task will still run successfully. This script is made to work with delimited files only. If the file is fixed width or ragged right, this approach will not work.

intDelimCount = number of delimiters that should be on each line that contains data
strAppend = Appended to the new file name created from the script task
strDelimiter = delimiter used in the flat file
strDestFolder = destination folder for the new files created by the scrip task
strFileExt = extension of the flat file
strFileName = name of the flat file
strSourceFolder = folder where the flat file exist

Handling Headers in Flat Files With SSIS

Now for the script task, this script task will open the file and parse through the rows looking for any rows that have the right amount of delimiters. When it finds a row that has the right amount of delimiters it writes this row to a new file in the destination folder. It adds the append variable to the file name also. If the row does not contain the correct number of variables it is skipped over. Once the script task is complete you will have the original flat file unchanged, and you will have a new file with only the data rows. Below is the code in VB, you can convert it to C# with websites like this one: http://www.developerfusion.com/tools/convert/csharp-to-vb/.

 

    Public Sub Main()

        'save the package variables as script variables
        Dim strDestFolder As String = Dts.Variables("strDestFolder").Value
        Dim strAppend As String = Dts.Variables("strAppend").Value
        Dim delim As String = Dts.Variables("strDelimiter").Value
        Dim intDelimCount As Integer = Dts.Variables("intDelimCount").Value
        Dim strFileName As String = Dts.Variables("strFileName").Value
        Dim strSourceFolder As String = Dts.Variables("strSourceFolder").Value
        Dim strFileExt As String = Dts.Variables("strFileExt").Value

        'combine the variables to get the file names
        Dim inputFileName As String = strSourceFolder + strFileName + strFileExt
        Dim outputFileName As String = strDestFolder + strFileName + strAppend + strFileExt

        'intitilize the line count
        Dim intLineCount As Integer = 0

        ' try to write the proper lines to a file
        Try

            Dim inputFile As New StreamReader(inputFileName) '  create a streamreader to read the flatfile
            FileOpen(1, outputFileName, OpenMode.Output, OpenAccess.Write) 'create and open the new file to write the data into

            'create variables for the reading and writing loop
            Dim strFullLine As String = Nothing
            Dim strLine As String = Nothing
            Dim intDelimPos As Integer = 0
            Dim intCount As Integer = 0


            While Not (inputFile.EndOfStream) ' read until we reach the end of the flat file

                strFullLine = inputFile.ReadLine ' read one line and save so we can write it to the new file later

                strLine = strFullLine ' save the line in another variable that will be broken up to count delimiters

                intDelimPos = strFullLine.IndexOf(delim) ' get first delim position

                intCount = 0 'reset the count of delimeters

                While intDelimPos <> -1 'loop until there are no delimters found

                    strLine = strLine.Substring(intDelimPos + 1) 'drop everything before the current delimeter

                    intDelimPos = strLine.IndexOf(delim) 'get position of the next delimiter

                    intCount += 1 ' increment count of the number of delimiters found

                End While

                If intCount = intDelimCount Then 'if the right amount of delimeters were counted then write the line to the fixed file

                    PrintLine(1, strFullLine) ' write the data line to the new file

                    intLineCount += 1 'increment the line count showing the number of lines written

                End If

            End While

            If intLineCount > 0 Then ' if any rows were written fire information so we can see this in the progress tab

                Dts.Events.FireInformation(0, "subComponent", CStr(intLineCount) + " Lines Written to:" + outputFileName, String.Empty, 0, False)

            End If

            'close the two files
            FileClose(1)
            inputFile.Close()


        Catch exDTS As DtsException 'catch any errors and fire the error event on the package

            Dts.Events.FireError(CInt(exDTS.ErrorCode), Dts.Variables("System::TaskName").Value.ToString, exDTS.Message.ToString, String.Empty, 0)

        Catch ex As Exception 'catch any script errors and fire the error event on the package

            Dts.Events.FireError(0, Dts.Variables("System::TaskName").Value.ToString, ex.Message.ToString, String.Empty, 0)

        End Try

        Dts.TaskResult = ScriptResults.Success

    End Sub




 

After running this script on the following file:

Handling Headers in Flat Files With SSIS

 

Here is the new file created by the script task:

Handling Headers in Flat Files With SSIS

 

The comments in the code should explain what is taking place line by line. This script task can be placed in a For Each Loop and run on a series of files. As long as they all have the same number and type of delimiters.

This is not a perfect system. If there is a header row that contains the right amount of delimiters then it will be written to the new file. This might cause an error during the data flow, or the header may get written to the table. I think a header having the exact same number of delimiters as the data would be rare. If the flat file has column headers, they will be copied to the new file. You will need to check the header rows option in the connection manager for the flat file. Also if the delimiter appears in the data it will not have the correct delimiter count.

You can dowload a copy of this package here.

If you have any questions about this article, or if you have another flat file situation that is strange, let me know.

 

 
0
/5
Avg: 0/5: (0 votes)

Comments (19)

satish84
satish84 said:
hi thanks for the great article... I have a scenario.. where I have to read data from mutiple flat files to single file... the problem is every time i read a new file the headers are getting added to the destination. The thing is every source file is having the header columns... help needed... regards satish
4/6/2010
 · 
 
by
mikedavis
mikedavis said:
Does the script I wrote above not work? If so what happens when you try. You may need to tweak it a little to get it to work on your files.
4/6/2010
 · 
 
by
Daniel
Daniel said:
Nice article for a common issue. Also, thanks for the extremely well commented code. I think if you know the data, as in your examples above, it would be easy to add a few more checks. You could test for the last character being numeric. You could add a few more variables recording the maximum length of some of the shorter columns, and then test for the length of the data bound for those columns. Commas in data, specifically numeric data (here in the U.S.) such as 1,234.00 are also a common problem. Hey, maybe there is another article subject for you.
4/27/2010
 · 
 
by
mostany
mostany said:

I agree with the others, this is a great article.

My challange is that my files are EBCDIC (37) and the row delimiters varies a lot across the different files, however I have identified them. I have 10 years of data to parse... around 10TB...

Any suggestion?

thanks in advance

7/14/2010
 · 
 
by
mikedavis
mikedavis said:

10 Tb of data is too much to parse through with the script task I wrote. It would cause memory issues. You would have to break the file up into manageable chunks.

If you are converting the data to ASCII check out my article on converting EBCDIC to ASCII.

http://bidn.com/articles/integration-services/118/using-ssis-to-convert-ebcdic-to-ascii

 

7/14/2010
 · 
 
by
JohnG
JohnG said:

Mike,  any way to do this with a "space" as a delimiter?  I've tried " " and ' ' and even left the variable blank.  Non worked.  My data is separated with a space.  Too many records to insert a comma due to the headers.

7/26/2010
 · 
 
by
mikedavis
mikedavis said:

For a space delimeter just try putting a space in the variable. Or just set the value inside the script task like so:

 Dim delim As String = " "

 

7/27/2010
 · 
 
by
JohnG
JohnG said:

Mike,

Setting strDelim as variable to have a value of " " - creates BigHeader_fixed as an empty file.

Setting Dim delim As String = " " - has the same results - an empty BigHeader_fixed file.

Any other ideas?

7/27/2010
 · 
 
by
mikedavis
mikedavis said:

I just tested it with a space in the package variable and it worked perfectly. I am not sure why yours is not working. Please post this question on the forum with more details and maybe even a screen shot so I can better answer the question. 

7/27/2010
 · 
 
by
JohnG
JohnG said:

Mike,  I added my steps to the forum, but it looks messy.  Not sure how to make the code to appear normal and not spread all over the page.  I worked it out to this line: While intDelimPos <> -1 with a msgbox, intDelimPos is always -1.   strFullLine pulls the 1st line in the txt file.  intDelimPos = strFullLine.IndexOf(delim) = -1.  delim is set as " ".

7/27/2010
 · 
 
by
Reichglass
Reichglass said:

Fantastic Article! So many financial systems spit out these .LIS flat files and call them "reports". The look as though they are formatted for the old "Green Bar" printers of yester-years. The problem is there are multiple headers that group the data into pages with actual page numbers AND (in some cases) a break in the pattern to contain a "Summary page" of the information in the previous sections. This article is invaluable! Thank you.

10/5/2010
 · 
 
by
Reichglass
Reichglass said:

Fantastic Article! So many financial systems spit out these .LIS flat files and call them "reports". The look as though they are formatted for the old "Green Bar" printers of yester-years. The problem is there are multiple headers that group the data into pages with actual page numbers AND (in some cases) a break in the pattern to contain a "Summary page" of the information in the previous sections. This article is invaluable! Thank you.

10/5/2010
 · 
 
by
delbar
delbar said:

I have this problem with a ragged right, crlf header row delimiter, no text qualifyer.   No column headers.

Header

Detail

footer

header

detail

footer

I want to extract just the detail (more than one detail) from flat file to table using ssis.

extremely urgent.

thanks,

delbar

2/15/2011
 · 
 
by
bc60
bc60 said:

Thanks Mike, great article!!

My challenge is that each inbound file has a File Header, Batch Header, Detail Record 1, Detail Record 2, Batch Footer, and File Footer.

How would I handle this situation as the two detail records should merge into a single row in the table?

4/26/2011
 · 
 
by
Karthick
Karthick said:

Excellent Article.

7/12/2011
 · 
 
by
Shark
Shark said:
Mike, I have a text file that has multiple sections, separated by a header and a footer. The header for each section contains an 8 in the 1st column. The footer starts with a 9. I want to be able to grab the contents between the header and the footer and put into a table or text file. Here is the kicker... the second column in the header/footer contains a different number for each "section". So i only want the contents where the header is 8, 15, and another section where the header is 8, 21. Does that make any sense? So there are two sections of code that i want from this huge text file that is broken out into many sections.
10/1/2012
 · 
 
by
MikeDavis
MikeDavis said:
Shark, You will need some sort of logical way to determine the numbers at the front of the line. If the numbers are not logical then you will need another method like, checking the line length. Post this questions to the forums with some example lines so you can get a better answer.
10/1/2012
 · 
 
by
Al_Gore_rythem
Thank you Mike. Here is an example of a header, which are all identical except for the 4th column (i.e. 01) 8 100008 08292012 01 4.0 0200 200 1 CHAVEZOIL Here is a footer record 9 100008 08292012 01 4.0 0200 200 1 CHAVEZOIL
10/8/2012
 · 
 
by
Al_Gore_rythem
im shark by the way.
10/8/2012
 · 
 
by
  • Name:*
  • Email:*
  • Website:
Type the characters you see in the image: *

Most Recent Articles