Business Intelligence Blogs

View blogs by industry experts on topics such as SSAS, SSIS, SSRS, Power BI, Performance Tuning, Azure, Big Data and much more! You can also sign up to post your own business intelligence blog.

Handling Complex Text Files in SSIS

  • 7 December 2011
  • Author: cprice1979
  • Number of views: 7709
  • 0 Comments

One of the challenges when working with flat files in SSIS comes when record types are mixed in a single file. This is a common scenario in health care where often times header and detail records and mixed and are only tied together by the fact that they occur sequentially in the flat file. Unfortunately SSIS has no built in process to help a developer overcome this hurdle. Luckily with a little bit of code and the script transformation we can parse and process these types of file with ease.

Lets look at a sample fixed width file. Our sample data file contains three distinct record types which are denoted by the first three characters of each line (001, 002 & 003). The first record type has the patient last name, first name and date of birth. The second type contains the patients billing address and the third record type has procedure codes which were performed on the patient. Notice that there is not a identifier that ties the records together beyond the sequence that they occur within the file. Also note that the frequency in which the records occur can vary. In our short example Thomas Tester had three procedures performed while Martha Jones had five. So let's jump straight into the solution....

Sample Data FileSample Data File

1. In your SSIS project, add a Data Flow Task.
2. In the data flow task, add a script transformation and select 'Source' when prompted.
3. For this solution we use three outputs, one for each record type. Notice that for each output the columns and data types have been configured.

Script Transformation Configuration
4. The CreateNewOutputRows is where the magic really happens. The code with detailed comments is below:

 

    public override void CreateNewOutputRows()
    {
        //Make sure the file exists
        if (File.Exists(Variables.FilePath))
        {
            //Open a stream reader to read the data file
            using (StreamReader fileStream = new StreamReader(Variables.FilePath))
            {
                Guid correlationID = Guid.Empty; //Used to correlate records in a sequential file

                //Read the data file
                while (!fileStream.EndOfStream)
                {
                    //Read a single record
                    string record = fileStream.ReadLine();

                    //Guard against blank lines
                    if (!string.IsNullOrEmpty(record))
                    {
                        //The first three characters denote a record type
                        string recordType = record.Substring(0, 3);
                        
                        switch (recordType)
                        {
                            case ("001"): //Patient Header record
                                //Get a new correlation id
                                correlationID = Guid.NewGuid();

                                //Add a row to the record type 1 buffer
                                Type1OutputBuffer.AddRow();
                                Type1OutputBuffer.CorrelationID = correlationID;

                                //Last Name - 10 characters
                                Type1OutputBuffer.LastName = record.Substring(3, 10).Trim();

                                //First Name - 10 characters
                                Type1OutputBuffer.FirstName = record.Substring(13, 10).Trim();

                                //DOB - 10 characters
                                Type1OutputBuffer.DOB = Convert.ToDateTime(record.Substring(23, 10));
                                break;
                            case ("002"): //Address record

                                Type2OutputBuffer.AddRow();
                                Type2OutputBuffer.CorrelationID = correlationID;

                                //Street - 30 characters
                                Type2OutputBuffer.Address = record.Substring(3, 30).Trim();

                                //City - 20 characters
                                Type2OutputBuffer.City = record.Substring(33, 20).Trim();

                                //State - 2 characters
                                Type2OutputBuffer.State = record.Substring(53, 2);

                                //Zip Code - 5 characters
                                Type2OutputBuffer.Zip = record.Substring(55, 5);
                                break;
                            case ("003"): //Patient Procedure record

                                Type3OutputBuffer.AddRow();
                                Type3OutputBuffer.CorrelationID = correlationID;
                                //Procedure Code 
                                Type3OutputBuffer.ProcedureCode = record.Substring(3, 6).Trim();
                                break;
                            default:
                                //Ignore record types that are not matched
                                break;
                        }
                    }
                }
            }

            Type1OutputBuffer.SetEndOfRowset();
            Type2OutputBuffer.SetEndOfRowset();
            Type3OutputBuffer.SetEndOfRowset();
        }
    }


5. Do something with the outputs. For this demo we are simplying writing the outputs to a csv text file.

 

6. The finished data flow looks like this:

Finished Data Flow

As you can see implementing a solution to handle these files is straight-forward. This solution is flexible and can be adapted to handle any type of flat which contains mixed records but altering the line parsing logic.

Best of luck!!

Chris

Download the Demo Package HERE

Demo Package Notes:
1. Place the data.txt file in the c:\temp\ directory or modify the 'FilePath' variable to point to the data.txt file on your machine.
2. If the c:\temp\ directory does not exist, update the file connection managers to use a directory that does exists.

Print
Categories: Miscellaneous
Tags:
Rate this article:
No rating

cprice1979cprice1979

Other posts by cprice1979

Please login or register to post comments.