One of the challenges when working with flat files in SSIS comes when record types are mixed in a single file. This is a common scenario in health care where often times header and detail records and mixed and are only tied together by the fact that they occur sequentially in the flat file. Unfortunately SSIS has no built in process to help a developer overcome this hurdle. Luckily with a little bit of code and the script transformation we can parse and process these types of file with ease.
Lets look at a sample fixed width file. Our sample data file contains three distinct record types which are denoted by the first three characters of each line (001, 002 & 003). The first record type has the patient last name, first name and date of birth. The second type contains the patients billing address and the third record type has procedure codes which were performed on the patient. Notice that there is not a identifier that ties the records together beyond the sequence that they occur within the file. Also note that the frequency in which the records occur can vary. In our short example Thomas Tester had three procedures performed while Martha Jones had five. So let's jump straight into the solution....
1. In your SSIS project, add a Data Flow Task.
2. In the data flow task, add a script transformation and select 'Source' when prompted.
3. For this solution we use three outputs, one for each record type. Notice that for each output the columns and data types have been configured.
4. The CreateNewOutputRows is where the magic really happens. The code with detailed comments is below:
public override void CreateNewOutputRows()
//Make sure the file exists
//Open a stream reader to read the data file
using (StreamReader fileStream = new StreamReader(Variables.FilePath))
Guid correlationID = Guid.Empty; //Used to correlate records in a sequential file
//Read the data file
//Read a single record
string record = fileStream.ReadLine();
//Guard against blank lines
//The first three characters denote a record type
string recordType = record.Substring(0, 3);
case ("001"): //Patient Header record
//Get a new correlation id
correlationID = Guid.NewGuid();
//Add a row to the record type 1 buffer
Type1OutputBuffer.CorrelationID = correlationID;
//Last Name - 10 characters
Type1OutputBuffer.LastName = record.Substring(3, 10).Trim();
//First Name - 10 characters
Type1OutputBuffer.FirstName = record.Substring(13, 10).Trim();
//DOB - 10 characters
Type1OutputBuffer.DOB = Convert.ToDateTime(record.Substring(23, 10));
case ("002"): //Address record
Type2OutputBuffer.CorrelationID = correlationID;
//Street - 30 characters
Type2OutputBuffer.Address = record.Substring(3, 30).Trim();
//City - 20 characters
Type2OutputBuffer.City = record.Substring(33, 20).Trim();
//State - 2 characters
Type2OutputBuffer.State = record.Substring(53, 2);
//Zip Code - 5 characters
Type2OutputBuffer.Zip = record.Substring(55, 5);
case ("003"): //Patient Procedure record
Type3OutputBuffer.CorrelationID = correlationID;
Type3OutputBuffer.ProcedureCode = record.Substring(3, 6).Trim();
//Ignore record types that are not matched
5. Do something with the outputs. For this demo we are simplying writing the outputs to a csv text file.
6. The finished data flow looks like this:
As you can see implementing a solution to handle these files is straight-forward. This solution is flexible and can be adapted to handle any type of flat which contains mixed records but altering the line parsing logic.
Best of luck!!
Download the Demo Package HERE
Demo Package Notes:
1. Place the data.txt file in the c:\temp\ directory or modify the 'FilePath' variable to point to the data.txt file on your machine.
2. If the c:\temp\ directory does not exist, update the file connection managers to use a directory that does exists.