One of the challenges when working with flat files in SSIS comes when record types are mixed in a single file. This is a common scenario in health care where often times header and detail records and mixed and are only tied together by the fact that they occur sequentially in the flat file. Unfortunately SSIS has no built in process to help a developer overcome this hurdle. Luckily with a little bit of code and the script transformation we can parse and process these types of file with ease.
Lets look at a sample fixed width file. Our sample data file contains three distinct record types which are denoted by the first three characters of each line (001, 002 & 003). The first record type has the patient last name, first name and date of birth. The second type contains the patients billing address and the third record type has procedure codes which were performed on the patient. Notice that there is not a identifier that ties the records together beyond the sequence that they occur within the file. Also note that the frequency in which the records occur can vary. In our short example Thomas Tester had three procedures performed while Martha Jones had five. So let's jump straight into the solution....
1. In your SSIS project, add a Data Flow Task.2. In the data flow task, add a script transformation and select 'Source' when prompted.3. For this solution we use three outputs, one for each record type. Notice that for each output the columns and data types have been configured.
4. The CreateNewOutputRows is where the magic really happens. The code with detailed comments is below:
public override void CreateNewOutputRows() { //Make sure the file exists if (File.Exists(Variables.FilePath)) { //Open a stream reader to read the data file using (StreamReader fileStream = new StreamReader(Variables.FilePath)) { Guid correlationID = Guid.Empty; //Used to correlate records in a sequential file //Read the data file while (!fileStream.EndOfStream) { //Read a single record string record = fileStream.ReadLine(); //Guard against blank lines if (!string.IsNullOrEmpty(record)) { //The first three characters denote a record type string recordType = record.Substring(0, 3); switch (recordType) { case ("001"): //Patient Header record //Get a new correlation id correlationID = Guid.NewGuid(); //Add a row to the record type 1 buffer Type1OutputBuffer.AddRow(); Type1OutputBuffer.CorrelationID = correlationID; //Last Name - 10 characters Type1OutputBuffer.LastName = record.Substring(3, 10).Trim(); //First Name - 10 characters Type1OutputBuffer.FirstName = record.Substring(13, 10).Trim(); //DOB - 10 characters Type1OutputBuffer.DOB = Convert.ToDateTime(record.Substring(23, 10)); break; case ("002"): //Address record Type2OutputBuffer.AddRow(); Type2OutputBuffer.CorrelationID = correlationID; //Street - 30 characters Type2OutputBuffer.Address = record.Substring(3, 30).Trim(); //City - 20 characters Type2OutputBuffer.City = record.Substring(33, 20).Trim(); //State - 2 characters Type2OutputBuffer.State = record.Substring(53, 2); //Zip Code - 5 characters Type2OutputBuffer.Zip = record.Substring(55, 5); break; case ("003"): //Patient Procedure record Type3OutputBuffer.AddRow(); Type3OutputBuffer.CorrelationID = correlationID; //Procedure Code Type3OutputBuffer.ProcedureCode = record.Substring(3, 6).Trim(); break; default: //Ignore record types that are not matched break; } } } } Type1OutputBuffer.SetEndOfRowset(); Type2OutputBuffer.SetEndOfRowset(); Type3OutputBuffer.SetEndOfRowset(); } }
6. The finished data flow looks like this:
As you can see implementing a solution to handle these files is straight-forward. This solution is flexible and can be adapted to handle any type of flat which contains mixed records but altering the line parsing logic.
Best of luck!!
Chris
Download the Demo Package HERE
Demo Package Notes:1. Place the data.txt file in the c:\temp\ directory or modify the 'FilePath' variable to point to the data.txt file on your machine.2. If the c:\temp\ directory does not exist, update the file connection managers to use a directory that does exists.
The application has experienced an error - we apologize for the inconvenience.
Our technical team was notified and is looking into the issue now.
Please try the following: