posted 12/10/2010 by MarkGStacey - Views: [4903]
With Sharepoint rapidly becoming the data store of choice for user driven and user built data capture systems, getting this data into your warehouse is becoming more and more important.
There are three primary methods of doing this from SSIS.
Of the three, I prefer them in the order listed above, but sometimes installing assemblies on your production SSIS server is simply not possible, so we are going to build this into code. The code is downloadable here :
http://www.bidn.com/Assets/Uploaded-CMS-Files/bf8b3824-2d18-481d-b10f-43852471f54fSharepointList.zip
Sharepoint provides access to most of its’ functionality via web services, and lists are no exception. If you browse to http://{site-name}/_vti_bin/Lists.asmx you will see a list of the operations supported by this web service, as in Figure 1 below
We are interested in GetList and GetListItems - GetList gets information about the list, such as the list of fields, and GetListItems retrieves the data stored inside the list. By clicking on the operation name, you can see the SOAP call that needs to be made.
I’m copying the request below
POST /_vti_bin/Lists.asmx HTTP/1.1
Host: mgs-2010bi
Content-Type: text/xml; charset=utf-8
Content-Length: length
SOAPAction: "http://schemas.microsoft.com/sharepoint/soap/GetListItems"
<?xml version="1.0" encoding="utf-8"?>
<soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
<soap:Body>
<GetListItems xmlns="http://schemas.microsoft.com/sharepoint/soap/">
<listName>string</listName>
<viewName>string</viewName>
<query>string</query>
<viewFields>string</viewFields>
<rowLimit>string</rowLimit>
<queryOptions>string</queryOptions>
<webID>string</webID>
</GetListItems>
</soap:Body>
</soap:Envelope>
HTTP/1.1 200 OK
<GetListItemsResponse xmlns="http://schemas.microsoft.com/sharepoint/soap/">
<GetListItemsResult>string</GetListItemsResult>
</GetListItemsResponse>
Not familiar with SOAP? Not to worry, it’s not too complex. Firstly, .NET will sort out the headers for you in the request, and then we will construct the text below it in the body, only changing the pieces that say string to be our parameters.
And the response? Everything we care about is contained in the response string. So don’t worry at all about what the response looks like.Now that we know where our data is coming from, let’s start building up a project to get it.Note that I am only going to show accessing data from Sharepoint, so things like using SSIS variables for the list name etc. are not done, for brevity’s sake, but you should always do this
Start by creating a list in Sharepoint, add a couple columns and add a couple entries. My list is called TestList, and my columns are DatePlan (an INT for simplicity in my data warehouse, you could use a date field and do a lookup downstream), MarkdownAmount and SalesAmount - both doubles.
Now create a new SSIS project in BIDS, and add a DataFlowComponent. Add a script component to the project. I normally use the Multicast as a null terminator for testing, so add one of those, link them up and add a dataviewer by right clicking on the path.Right click on the script component, choose Advanced Editor, and go to the Input and Output Properties tab, as seen below
Normally, you would rename Output 0 to something a bit more meaningful. Add the output columns - these match to the columns in your list.
A little trick for you: I normally do all of my script component coding in Visual Studio so I can step through and debug. Remember to set the target framework to .NET 2.0!
Close the advanced editor, and open the conventional editor, and edit the script.
You are going to edit the code within the CreateNewOutPutRows, as below:
public override void CreateNewOutputRows()
{
/*
Add rows by calling the AddRow method on the member variable named "<Output Name>Buffer".
For example, call MyOutputBuffer.AddRow() if your output was named "MyOutput".
*/
//Retrieve the list of available fields from the GetList call to Sharepoint.
//The list name should be stored in a variable
string strFields = GetList("http://mgs-2010bi:40000/_vti_bin/Lists.asmx", "TestList", null);
XmlDocument doc = new XmlDocument();
//Parse the list of fields to remove hidden fields
XmlNodeList xld = RunXPathQuery(strFields, "//sp:Field[not(@Hidden) and (@DisplayName)]");
//Get data from Sharepoint per the fields above
string strData = GetListItems(xld, "http://mgs-2010bi:40000/_vti_bin/Lists.asmx", "TestList", null);
XmlDocument doc2 = new XmlDocument();
//Parse the list of rows
XmlNodeList xldData = RunXPathQuery(strData, "//z:row");
//Iterate through every row and add it to the output
//System.Windows.Forms.MessageBox.Show("Nodes: " + xldData.Count.ToString ());
foreach (XmlNode xn in xldData)
//Owing to the Output buffer being protected, the fields need to be hardcoded
Output0Buffer.AddRow();
if ((xn.Attributes.GetNamedItem("ows_DatePlan") != null))
Output0Buffer.DatePlan = System.Convert.ToInt32(System.Convert.ToDouble (xn.Attributes.GetNamedItem("ows_DatePlan").Value));
}
if ((xn.Attributes.GetNamedItem("ows_MarkdownAmount") != null))
Output0Buffer.MarkdownAmount = System.Convert.ToDouble (xn.Attributes.GetNamedItem("ows_MarkdownAmount").Value);
if ((xn.Attributes.GetNamedItem("ows_SalesAmount") != null))
Output0Buffer.SalesAmount = System.Convert.ToDouble (xn.Attributes.GetNamedItem("ows_SalesAmount").Value);
Below this code is a section called HelperFunctions, which you will be able to copy verbatim, and reuse. We start by calling a function, GetList, within these HelperFunctions which calls the Sharepoint list Operation GetList.string strFields = GetList("http://mgs-2010bi:40000/_vti_bin/Lists.asmx", "TestList", null);
The list of fields is returned as an XML string, so we parse this to remove unnecessary fields - Sharepoint has a lot of system fields that we really don’t need, so we only show non-hidden fields that have display set, using an XPath query.
This returns us an XML node list of fields, and we pass this, the location of the Lists web service and the name of the list to GetListItems, which will also return an XML string of the data in the list.
We parse this to only include the actual row data
Finally, we iterate through every row, and add the values to the output colums
Note that the GetNamedItem parameter has an “ows_” prefixed to the name of the Sharepoint column - Sharepoint does this internally, so you need to add it whenever accessing Sharepoint info. Also note that I am checking for the existence of the column before adding it - this means that your ETL remains robust, and if a user deletes a column, it won’t break the entire ETL.And finally, an explicit conversion is required, as xn.Attributes.GetNamedItem("ows_XXXX").Value will always return a string.
These are the basic mechanics of using the HelperFunctions (included below) In my next blog post, I will work through what exactly these helper functions are doing.
#region Helper Functions
private XmlNodeList RunXPathQuery(XmlNode xNode, string XPathQuery)
XmlDocument Document = new XmlDocument();
Document.LoadXml(xNode.OuterXml);
return RunXPathQuery(Document, XPathQuery);
private XmlNodeList RunXPathQuery(string strXML, string XPathQuery)
// load the complete XML node and all its child nodes into a XML document
//Document.LoadXml(XmlNodeToQuery.OuterXml);
Document.LoadXml(strXML);
private XmlNodeList RunXPathQuery(XmlDocument Document, string XPathQuery)
// all the possible namespaces used by SharePoint and a randomly choosen prefix
const string SharePointNamespacePrefix = "sp";
const string SharePointNamespaceURI = "http://schemas.microsoft.com/sharepoint/soap/";
const string ListItemsNamespacePrefix = "z";
const string ListItemsNamespaceURI = "#RowsetSchema";
const string PictureLibrariesNamespacePrefix = "y";
const string PictureLibrariesNamespaceURI = "http://schemas.microsoft.com/sharepoint/soap/ois/";
const string WebPartsNamespacePrefix = "w";
const string WebPartsNamespaceURI = "http://schemas.microsoft.com/WebPart/v2";
const string DirectoryNamespacePrefix = "d";
const string DirectoryNamespaceURI = "http://schemas.microsoft.com/sharepoint/soap/directory/";
// now associate with the xmlns namespaces (part of all XML nodes returned
// from SharePoint) a namespace prefix which we can then use in the queries
XmlNamespaceManager NamespaceMngr = new
XmlNamespaceManager(Document.NameTable);
NamespaceMngr.AddNamespace(SharePointNamespacePrefix, SharePointNamespaceURI);
NamespaceMngr.AddNamespace(ListItemsNamespacePrefix, ListItemsNamespaceURI);
NamespaceMngr.AddNamespace(PictureLibrariesNamespacePrefix, PictureLibrariesNamespaceURI);
NamespaceMngr.AddNamespace(WebPartsNamespacePrefix, WebPartsNamespaceURI);
NamespaceMngr.AddNamespace(DirectoryNamespacePrefix, DirectoryNamespaceURI);
// run the XPath query and return the result nodes
return Document.SelectNodes(XPathQuery, NamespaceMngr);
string GetAllLists(string strURI, string proxy)
string strXML = @"<?xml version=""1.0"" encoding=""utf-8""?>
<soap:Envelope xmlns:xsi=""http://www.w3.org/2001/XMLSchema-instance"" xmlns:xsd=""http://www.w3.org/2001/XMLSchema"" xmlns:soap=""http://schemas.xmlsoap.org/soap/envelope/"">
<GetListCollection xmlns=""http://schemas.microsoft.com/sharepoint/soap/"" />
</soap:Envelope>";
return HttpSOAPRequest(strXML, strURI, proxy);
string GetListItems(XmlNodeList xld , string strURI, string strListName, string proxy)
<GetListItems xmlns=""http://schemas.microsoft.com/sharepoint/soap/"">
<listName>" + strListName + @"</listName>
<viewFields>
<ViewFields>
";
foreach (XmlNode xn in xld)
strXML += @"<FieldRef Name='" + xn.Attributes.GetNamedItem("Name").Value + @"' /> ";
strXML += @"</ViewFields>
</viewFields>
string GetList(string strURI, string strListName, string proxy)
<soap12:Envelope xmlns:xsi=""http://www.w3.org/2001/XMLSchema-instance"" xmlns:xsd=""http://www.w3.org/2001/XMLSchema"" xmlns:soap12=""http://www.w3.org/2003/05/soap-envelope"">
<soap12:Body>
<GetList xmlns=""http://schemas.microsoft.com/sharepoint/soap/"">
</GetList>
</soap12:Body>
</soap12:Envelope>";
string HttpSOAPRequest(string strXML ,string strURI, string proxy)
doc.LoadXml(strXML);
HttpWebRequest req = (HttpWebRequest)WebRequest.Create(strURI);
//req.Proxy = null;
if (proxy != null) req.Proxy = new WebProxy(proxy, 8888);
req.ContentType = "text/xml;charset=\"utf-8\"";
//req.Accept = "text/xml";
req.Method = "POST";
//Use the current authentication. This will be the SQL Agent account when in production, so make sure that account has access to the list
req.UseDefaultCredentials = true;
Stream stm = req.GetRequestStream();
doc.Save(stm);
stm.Close();
try
//WebResponse resp = req.GetResponse();
//It has been alleged to me that the section below is more correct
HttpWebResponse resp = req.GetResponse() as HttpWebResponse;
stm = resp.GetResponseStream();
StreamReader strmReader = new StreamReader(stm);
string sRet = strmReader.ReadToEnd();
//This line is important - you will get a hang if you don't close the stream in SSIS. It'll work in .NET 4 though,
//Good reason to remember to target the correct framework
//Thanks to Robert MacClean for his assistance. http://twitter.com/RMacClean
strmReader.Close();
return sRet;
catch (System.Exception ex)
string str = ex.Message;
return "";
#endregion
Is this article functioning incorrectly (no text) or is this a place holder for an article that will be released in the future?