posted 4/14/2012 by RussLoski - Views: [2811]
Last week I blogged about the exciting changes to the XML Task in SSIS 2012 (http://www.bidn.com/blogs/RussLoski/ssas/2658/ssis-2012-xml-task-namespaces-in-xpath). Microsoft has added the ability to reference XML Namespaces when creating XPath. That was one annoying bug fixed, have they fixed other SSIS XML bugs? There are two "features" of the XML Source that have bothered me for a while: handling root node attributes and handling multiple XML Namespaces.
The problem with root nodes is illustrated by the following simple XML:
<?xml version="1.0"?> <Sales SubmissionDate="2011-10-28" NumberOfOrders="2"> <SalesOrder OrderDate="2005-07-01T00:00:00" SalesOrderNumber="SO43700" TaxAmt="55.9279" SubTotal="699.0982" Freight="17.4775"> <Detail OrderQty="1" UnitPrice="699.0982" LineTotal="699.098200"/> </SalesOrder> <SalesOrder OrderDate="2007-12-24T00:00:00" SalesOrderNumber="SO60686" TaxAmt="200.9856" SubTotal="2512.3200" Freight="62.8080"> <Detail OrderQty="1" UnitPrice="2443.3500" LineTotal="2443.350000"/> <Detail OrderQty="1" UnitPrice="4.9900" LineTotal="4.990000"/> <Detail OrderQty="1" UnitPrice="8.9900" LineTotal="8.990000"/> <Detail OrderQty="1" UnitPrice="54.9900" LineTotal="54.990000" /> </SalesOrder> </Sales>
If you use the XML Source, you can find outputs related to the SalesOrder elements and the Detail elements. However, there is no output for the Sales element. I cannot get the SubmissionDate or NumberOfOrders from the Sales element. In order to get those values you must use the XML Task with XPath or XSLT. Have they fixed this problem in SSIS 2012? Unfortunately they have not. So you are going to have to continue with the XML Task hacks to get the attributes of the root node. Check out Matt Masson's blog (http://blogs.msdn.com/b/mattm/archive/2007/12/15/xml-source-making-things-easier-with-xslt.aspx) about using XSLT to expose the root node.
The second problem is in handling multiple namespaces, illustrated with the following XML.
<?xml version="1.0"?> <Sales SubmissionDate="2011-10-28" NumberOfOrders="2" xmlns="sales.uri" xmlns:dt="details.uri"> <SalesOrder OrderDate="2005-07-01T00:00:00" SalesOrderNumber="SO43700" TaxAmt="55.9279" SubTotal="699.0982" Freight="17.4775"> <dt:Detail OrderQty="1" UnitPrice="699.0982" LineTotal="699.098200"/> </SalesOrder> <SalesOrder OrderDate="2007-12-24T00:00:00" SalesOrderNumber="SO60686" TaxAmt="200.9856" SubTotal="2512.3200" Freight="62.8080"> <dt:Detail OrderQty="1" UnitPrice="2443.3500" LineTotal="2443.350000"/> <dt:Detail OrderQty="1" UnitPrice="4.9900" LineTotal="4.990000"/> <dt:Detail OrderQty="1" UnitPrice="8.9900" LineTotal="8.990000"/> <dt:Detail OrderQty="1" UnitPrice="54.9900" LineTotal="54.990000" /> </SalesOrder> </Sales>
The "dt" prefix indicates that the "dt:Detail" element is defined within an XML Namespace that is defined by the xmlns:dt attribute of the root node. The rest of the elements in this XML document are defined within the default Namespace of the document which is defined by the xmlns tag. Here is the problem. When you create an XML Source, you first select the file. Then you can select the XML Schema. This schema is used if If you don't have a schema, you can click on the Generate XSD button.
Normally this works and it will work with the first xml document, the one with no XML Namespace declaration or one with a single XML Namespace. However, if there are more than one XML Namespaces in the document you get the following error in all versions of SSIS (including SSIS 2012):
The problem is that Visual Studio is unable to create the kind of XSD Schema that will allow for multiple Namespaces. You will need to use XSLT to remove the Namespaces. Or you may need to create the XSD Schema files (it takes more than one) to work with multiple Namespaces.
In summary, two of the annoyances on the XML Source remain in SSIS 2012: handling root attributes and handling multiple Namespaces. I don't expect that there will be any fixes to this soon, so you will need to continue to use XML hacks to work with complex XML.