Introduction to LOB data types in SSIS: Part 1
The point of this article is to give a relatively simple introduction to LOB data types in SSIS in conjunction with using the Windows Performance Monitor (abbreviated: Perfmon) to monitor statistics which may help in making subsequent development decisions. To make the consumption of this material easier, I will be breaking it up between two articles. The first article will cover setting up Windows 7 Performance Monitor and a tool called a Data Collector Set, additionally there will be coverage of the measurements being monitored/recorded. Also, there will be a brief intro to LOBs and LOBs in SSIS. The second article will cover LOBs in SSIS and a simulation will be run.
The situation is that you are using LOB data Types in a SSIS control flow. Because LOB data types are larger than 8KB and can hold up to 2GB. The Buffers used in the process should be monitored to see the cost of the LOB data types which enables resource management planning or possibly alteration of the package itself.
Background on LOB data types
There are four current large data types: Varchar(max), NVarchar(Max), XML, and VarBinary(Max). Each of these data types can holds up to 2GB. There are three functional, but deprecated LOB data types; TEXT, NTEXT, and IMAGE. SQLServer2008 has included a FILESTREAM which points to files stored on a NTFS while maintaining referential integrity. The rule of thumb that is used is to store LOBs smaller than 256K in the database and to store LOBs larger than 1MB on the file system.
How SSIS Handle LOBs
In addition to the source and destination, LOBs can be brought into or out of SSIS using the import column and export column tasks respectively. SSIS uses a buffer system to handle all data processing. Because of their potential size and overhead, LOBs are assigned a separate buffer so that this buffer can be spooled to a fast disk in case of memory issues. This will come into play when interpreting Perfmon and then setting the package properties appropriately to handle memory shortages.
Below is a set of instruction to set up Performance Monitor the data from which will play a determinstic role when deciding where to delegate system resources.
Setting up performance monitor
1. Start Menu
2. Search for ‘Perfmon’ and open the result ‘perfmon.exe’
To create a Data Collector Set
1. Open Hierarchy tree for ‘Data Collector Sets’ in the left pane
2. Right click ‘User Defined’
3. Select ‘New’ > ‘Data Collector Set’
4. Type in name
5. Check ‘Create Manually(Advanced)
6. Click ‘Next’
7. Check ‘Create Data Logs’
8. Check ‘ Log Performance Counter’
9. Click ‘Next’
10. Click ‘Add’
*Next I will be adding SSIS counters to monitor the buffers
11. Select ‘SQLServer: SSIS Pipeline 10.0 in the left pane
12. Transfer all or expand node and transfer each Item individually using the ‘add>>’ button
13. Click ok when finished
14. Set Sample interval for recording data
15. Click next
16. Specify where you wanted the logs stored
17. Click Finish
Right Click the new LOB and select properties then you will be able to specify logging options. I will not go into the details of configuring this, but it is pretty straight forward. You will have directory settings; a data collector scheduler, a stop condition specification tab, and even a task after completion tab. Valuable administration tools.
To view the same counters without creating the data collector set :
1. Start
2. Search for ‘perfmon’ and open the result ‘perfmon.exe’
3. Expand hierarchy of ‘Monitoring tools’ in the left pane
4. Double click ‘performance monitor’
5. Right click on the graph
6. Select ‘add counters’
7. Select ‘SQLServer: SSIS Pipeline 10.0 in the left pane
8. Transfer all or expand node and transfer each Item individually using the ‘add>>’ button
9. Click ok when finished
10. If you wish to save resulting data right click and select ‘save image’
Counter Descriptions
BLOB bytes read- Total of all the blob bytes read.
BLOB bytes written- Total of all the blob bytes written.
BLOB files in use- The number of blob files the data engine is spooling.
Buffer memory- The amount of memory the Buffers are using, including any buffers that are written to disk. This includes the use of any buffers that the SSIS data engine created to help with data flow; such buffers are not flat or private.
Buffers in use – Indicates the number of buffers used with-in package execution.
Buffers spooled – Shows the number of buffers that needed to be spooled. This statistic is defaulted to 0 but when a buffer runs out of memory and has to use the hard disk to complete a task it populates this counter.
Flat buffer memory – The amount of memory in bytes that all flat buffers use.
Flat buffers in use – Indicates the number of Flat buffers used by tasks. Flat Buffers do not transform data. They are used by Tasks to process data inputs and out puts where rows and columns are not needed in the buffer. Flat buffers are a type of private buffer.
Private buffers in use – Indicates the number of buffers used in data transformations.
Rows read- Total rows read from all sources.
Rows written- Total rows written to all data destination.
Great Article man