Business Intelligence Blogs

View blogs by industry experts on topics such as SSAS, SSIS, SSRS, Power BI, Performance Tuning, Azure, Big Data and much more! You can also sign up to post your own business intelligence blog.

Excel Data Load Truncation Error

  • 18 July 2010
  • Author: DanMatisis
  • Number of views: 7739
  • 0 Comments

Excel Data Load Truncation Error

I came across an interesting issue while a helping a client. He was trying to load an OLE DB source from a excel spreadsheet and there was a truncation error that was being received indicating that the data was too long for the column data type. That lead to the question ‘can I change the column length and type?’ and the simple answer is ‘No’. The second question is ‘Why is this Error Occurring?’ . The answer is because of the way the Engine samples excel data and implicitly create data types. This article will be discussing what avenues will and will not work and why and how to increase your chances of getting excel to upload your data without unnecessary truncation failure.

 Now a quick aside to explain the structure of this article. This first section is more academic in the sense that I am explaining the structure of the excel system and the possible solutions you can implement. In the second section I will be going through a demonstration illustrated by screenshots showing how to get and fix the error. So if you would like understand and/or fix your problem quickly then read the first section. If you would like to see a hands on explanation of the problem where you can really grasp what is happening in SSIS then the second section is for you.

Explanation of Excel Source Data Sampling: Cause, Effect, and Resolution 

First, let’s explore why an error like this occurs and what are specifics of the situation that can lead up to such an error.  The error occurs because of the way the data engine reads and determines column lengths of a specified excel sheet. To put it simply SSIS will go into the excel sheet you wish to load look at the first 8 rows of data for each of the columns in the sheet and determine what it deems the most appropriate data type and length.  The truncation error can occur in a circumstance where the first 8 rows of an attributes data are not representative of the entire set of data in that particular attribute. Say that there was a column named description with no populated description in the first 8 rows of data and then in the 9th row there was a 700 character description. This situation would create a truncation error because excel would sample the first eight rows and assign a data type of DT_WSTR with a length of 255 because this would efficiently encapsulate the incoming data based on the what it saw in the data. Now when the 9th row of data with 700 characters enters the pipeline it can’t fit into column.

Now when this error occurs the obvious instinct is to go your excel source task’s advanced editor and change the column length of the attribute throwing the error. Easy right? Well unfortunately the same error will occur, the reason being that the error output column’s length also has to be changed as well. By navigating back to the advanced editor and into the error output of the column(s) in question we can see that the length attribute cannot be altered. This is basically a dead end and also ends all hopes of fixing the situation within SSIS. There are however a few work-a-rounds that can help in the right situation.

The first is ugly but obvious in a way. Since we know the mechanism of action involves looking at the first 8 rows of data in the excel sheet, we can then insert dummy data which is meant to represent the extremes of the data found throughout the rest of the excel sheet.  If we know that our 9th row has a 700 character description then enter at least a 700 character description in to the dummy row. Another possibility that will avoid the input of dummy data is to manipulate your excel sheet by bringing those rows that are being truncated and putting them in the first rows of the data. Obviously this will only guarantee that 8 attributes throwing the error can be taken into account.  If there were 9 columns being misrepresented by the first 8 rows and you try to move a representative column into those first 8 rows and no overlap occurs you will be limited to fixing 8 truncation errors with this method. More rows could be fixed if overlap occurs.

As you can see both of these methods can be a bit messy the first because you are adding dummy data the second because you have to go searching for rows and deleting and inserting. But there is another way that can help users but it does require the proper permissions.

The best way to increase your chances before any spreadsheet manipulation occurs would be to increase the row sampling size used by SSIS to determine the column lengths. This can be done in the registry keys. Before going into the walkthrough of how to fix the registry key ‘TypeGuessRows’ know that the key can only be set in certain ways. The key setting has a number of possible value the highest value will allow for the sampling of 16,384 rows of data. This will add time that it takes to process excel files. Depending on the type of incoming Excel data you have on a regular basis you may or may not want to leave this setting after changing it.

To change the registry setting you will need administrative rights on the workstation.

1.       You can get to the registry by typing in Regedt32 in the start menu

2.       When the registry is up follow this path to the excel registry key which is going to be altered: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\12.0\Access_Connectivity_Engine\Engines\Excel

(Note if you cannot follow the file path hit CTRL+F and search for TypeGuessRows)

(*If this does not work try path HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel)

3.       Double click ‘TypeGuessRows’ this will allow you to change the number of rows sampled. You can specify a number between 0-16. Numbers 1-16 sample of the number of rows indicated by the digit. 0 however will sample the largest number of rows which is 16384.

 

 

Demonstration

First lets create an excel sheet called Row9.xslx to use as our source, this sheet will contain 9 rows of data. In the 9th row of the Layout you can see that the description column contain far more letters then the other rows (over255 characters).

The next step I will take is to open up BIDS and create an integration services project named ‘ExcelSource’. With -in the new solution I will create a package called ‘ExcelSource.dtsx’. (*note if you are on a 64 bit machine you have to change your setting to allow 32 bit runtime, scroll to the bottom for a step by step).Now that I have a package created and configured it’s time to bring the control and data flow tasks necessary to reproduce a truncation error.

In the control flow I will drag over a ‘Data Flow’ Task and then double click it to enter the Data Flow editor. Here I will simply drag over an Excel source and an OLEDB destination (The OLE DB destination contains a database called testing which I created for the purpose of loading the excel data).

 

Next we want to configure our source to pull in ‘Row9.xslx’ and then I will configure my OLE DB destination to point to the database ‘Testing’ and have the task create a new table in that database called ‘ExcelSource’.

Now its time to run the package and see if the errors occur because of the 9th row. Look below and you will see that when the package is executed is fails on the Excel Source Tab. You can see that the errors listed in the progress tab point out some key information.  It shows that problem is with the column ‘Description’ and it shows the first row that caused the problem ‘ExcelSourceOutput(9)’. Next it shows that error is occurring because of truncation error. The truncation error occurred because by default excel creates data types by looking at the contents of the first eight rows in the excel spreadsheet. Based on the first eight rows it assigned a Unicode data type with a length of 255. This attribute cannot be changed in SSIS. The 9th row is longer than255 characters so it results in a truncation error. There are two ways to fix this. Either make the first 8 rows representative or change the number of rows that the data samples.

                                                                         

 

Making the number of rows representative can be done by manipulating the excel spread sheet in two different ways. The first is to insert a row of dummy data in the first row and change whichever columns are being truncated to represent the associated data in the rows causing the truncation error. Lets try this out by inputting record number ‘0’ before record number  ‘1’ with a comparable amount of characters In the description.

 

 Here are some notes that could help if you are trying to follow along in your environment. I have gone into management studio changed the ‘description’ column data type to NTEXT in my destination table. That table was created based on the original sampling and created the wrong data type in the destination. The query below will set the attribute data type correctly. You will also have to double click the excel source to update it.

USE testing

GO

ALTER TABLE dbo.excelsource

ALTER COLUMN [description] NTEXT

With everything in place now, executing the package will result in success. The second way to make the first eight rows representative is to cut the row in question and paste it in the first eight rows. This package will also succeeds when executed, but only guarantees that eight truncation errors can be fixed.

 

Now lets try and go about loading our data without changing out excel sheet but instead changing the property which indicates that the first eight rows in an excel sheet should determine the data types. (*I will reset the package and excel sheet to its original settings in which the first eight rows of the excel sheet will cause a truncation error)

Before we map to the excel sheet we will change the registry key. By following steps we can change the registry value:

To change the registry setting you will need administrative rights on the workstation.

1.       You can get to the registry by typing in Regedt32 in the start menu

2.       When the registry is up follow this path to the excel registry key which is going to be altered: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Office\12.0\Access_Connectivity_Engine\Engines\Excel (Note if you cannot follow the file path hit CTRL+F and search for TypeGuessRows)

(*If this does not work try path HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Jet\4.0\Engines\Excel)

 

3.       Double click ‘TypeGuessRows’ this will allow you to change the number of rows sampled. You can specify a number between 0-16. Numbers 1-16 sample of the number of rows indicated by the digit. 0 however will sample the largest sample which is 16384.

Here I will change the registry value to ‘0’ and then point my excel source to the row9.xslx.  When the package is executed it will succeed now.

That is the end of the demonstration part of the article and the end of the article, if you have any questions contact by posting directly or emailing me directly at DMatisis@PragmaticWorks.com

Enabling 32-bit mode steps

1.       In the solution explorer right click the solution name, in this case ‘ExcelSource’

2.       Select properties

3.       In the left pane click on the ‘Debugging’ tab

4.       Change the ‘Run64BitRuntime’ to False

5.       Click apply

 

Print
Categories: Miscellaneous
Tags:
Rate this article:
No rating

DanMatisisDanMatisis

Other posts by DanMatisis

Please login or register to post comments.