Accessing Sharepoint List Data From SSIS

Who is online?  0 guests and 0 members
Home  »  Articles  »  Accessing Sharepoint List Data From SSIS

Accessing Sharepoint List Data From SSIS

change text size: A A A
Published: 12/10/2010 by  MarkGStacey  - Views:  [3117]  

With Sharepoint rapidly becoming the data store of choice for user driven and user built data capture systems, getting this data into your warehouse is becoming more and more important.

There are three primary methods of doing this from SSIS.

1.       Install 3rd party Sharepoint List Adapters. CodePlex even has a free version at http://sqlsrvintegrationsrv.codeplex.com/releases/view/17652

2.       Write your own adapter. Again, the CodePlex project is a great start

3.       Create a script component to query Sharepoint

Of the three, I prefer them in the order listed above, but sometimes installing assemblies on your production SSIS server is simply not possible, so we are going to build this into code. The code is downloadable here :

http://www.bidn.com/Assets/Uploaded-CMS-Files/bf8b3824-2d18-481d-b10f-43852471f54fSharepointList.zip

Introduction

Sharepoint provides access to most of its’ functionality via web services, and lists are no exception. If you browse to http://{site-name}/_vti_bin/Lists.asmx you will see a list of the operations supported by this web service, as in Figure 1 below
Lists

We are interested in GetList and GetListItems - GetList gets information about the list, such as the list of fields, and GetListItems retrieves the data stored inside the list. By clicking on the operation name, you can see the SOAP call that needs to be made.

 

GetListItems

I’m copying the request below

POST /_vti_bin/Lists.asmx HTTP/1.1

Host: mgs-2010bi

Content-Type: text/xml; charset=utf-8

Content-Length: length

SOAPAction: "http://schemas.microsoft.com/sharepoint/soap/GetListItems"

 

<?xml version="1.0" encoding="utf-8"?>

<soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">

  <soap:Body>

    <GetListItems xmlns="http://schemas.microsoft.com/sharepoint/soap/">

      <listName>string</listName>

      <viewName>string</viewName>

      <query>string</query>

      <viewFields>string</viewFields>

      <rowLimit>string</rowLimit>

      <queryOptions>string</queryOptions>

      <webID>string</webID>

    </GetListItems>

  </soap:Body>

</soap:Envelope>

HTTP/1.1 200 OK

Content-Type: text/xml; charset=utf-8

Content-Length: length

 

<?xml version="1.0" encoding="utf-8"?>

<soap:Envelope xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema" xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">

  <soap:Body>

    <GetListItemsResponse xmlns="http://schemas.microsoft.com/sharepoint/soap/">

      <GetListItemsResult>string</GetListItemsResult>

    </GetListItemsResponse>

  </soap:Body>

</soap:Envelope>

 

Not familiar with SOAP? Not to worry, it’s not too complex. Firstly, .NET will sort out the headers for you in the request, and then we will construct the text below it in the body, only changing the pieces that say string  to be our parameters.

And the response? Everything we care about is contained in the response string. So don’t worry at all about what the response looks like.
Now that we know where our data is coming from, let’s start building up a project to get it.
Note that I am only going to show accessing data from Sharepoint, so things like using SSIS variables for the list name etc. are not done, for brevity’s sake, but you should always do this

Practical

Create Project

Start by creating a list in Sharepoint, add a couple columns and add a couple entries. My list is called TestList, and my columns are DatePlan (an INT for simplicity in my data warehouse, you could use a date field and do a lookup downstream), MarkdownAmount and SalesAmount - both doubles.

Now create a new SSIS project in BIDS, and add a DataFlowComponent. Add a script component to the project. I normally use the Multicast as a null terminator for testing, so add one of those, link them up and add a dataviewer by right clicking on the path.
Right click on the script component, choose Advanced Editor, and go to the Input and Output Properties tab, as seen below

SCR Advanced Editor

Normally, you would rename Output 0 to something a bit more meaningful. Add the output columns - these match to the columns in your list.

A little trick for you: I normally do all of my script component coding in Visual Studio so I can step through and debug. Remember to set the target framework to .NET 2.0!

Close the advanced editor, and open the conventional editor, and edit the script.

Edit script

You are going to edit the code within the CreateNewOutPutRows, as below:

    public override void CreateNewOutputRows()

    {

        /*

          Add rows by calling the AddRow method on the member variable named "<Output Name>Buffer".

          For example, call MyOutputBuffer.AddRow() if your output was named "MyOutput".

        */

 

        //Retrieve the list of available fields from the GetList call to Sharepoint.

        //The list name should be stored in a variable

        string strFields = GetList("http://mgs-2010bi:40000/_vti_bin/Lists.asmx", "TestList", null);

        XmlDocument doc = new XmlDocument();

 

       //Parse the list of fields to remove hidden fields

        XmlNodeList xld = RunXPathQuery(strFields, "//sp:Field[not(@Hidden) and (@DisplayName)]");

     

        //Get data from Sharepoint per the fields above

        string strData = GetListItems(xld, "http://mgs-2010bi:40000/_vti_bin/Lists.asmx", "TestList", null);

        XmlDocument doc2 = new XmlDocument();

        //Parse the list of rows

        XmlNodeList xldData = RunXPathQuery(strData, "//z:row");

        //Iterate through every row and add it to the output

        //System.Windows.Forms.MessageBox.Show("Nodes: " + xldData.Count.ToString ());

        foreach (XmlNode xn in xldData)

        {

            //Owing to the Output buffer being protected, the fields need to be hardcoded

            Output0Buffer.AddRow();

            if ((xn.Attributes.GetNamedItem("ows_DatePlan") != null))

            {                   

                Output0Buffer.DatePlan = System.Convert.ToInt32(System.Convert.ToDouble (xn.Attributes.GetNamedItem("ows_DatePlan").Value));

            }

            if ((xn.Attributes.GetNamedItem("ows_MarkdownAmount") != null))

            {

                Output0Buffer.MarkdownAmount = System.Convert.ToDouble (xn.Attributes.GetNamedItem("ows_MarkdownAmount").Value);

            }

            if ((xn.Attributes.GetNamedItem("ows_SalesAmount") != null))

            {

                Output0Buffer.SalesAmount = System.Convert.ToDouble (xn.Attributes.GetNamedItem("ows_SalesAmount").Value);

            }

        }

 

    }

 

Below this code is a section called HelperFunctions, which you will be able to copy verbatim, and reuse. We start by calling a function, GetList, within these HelperFunctions which calls the Sharepoint list Operation GetList.
string strFields = GetList("http://mgs-2010bi:40000/_vti_bin/Lists.asmx", "TestList", null);

The list of fields is returned as an XML string, so we parse this to remove unnecessary fields - Sharepoint has a lot of system fields that we really don’t need, so we only show non-hidden fields that have display set, using an XPath query.

        XmlDocument doc = new XmlDocument();

 

       //Parse the list of fields to remove hidden fields

        XmlNodeList xld = RunXPathQuery(strFields, "//sp:Field[not(@Hidden) and (@DisplayName)]");

This returns us an XML node list of fields, and we pass this, the location of the Lists web service and the name of the list to GetListItems, which will also return an XML string of the data in the list.

We parse this to only include the actual row data

//Get data from Sharepoint per the fields above

        string strData = GetListItems(xld, "http://mgs-2010bi:40000/_vti_bin/Lists.asmx", "TestList", null);

        XmlDocument doc2 = new XmlDocument();

        //Parse the list of rows

        XmlNodeList xldData = RunXPathQuery(strData, "//z:row");

Finally, we iterate through every row, and add the values to the output colums

·         Foreach loop to iterate

 

        //System.Windows.Forms.MessageBox.Show("Nodes: " + xldData.Count.ToString ());

        foreach (XmlNode xn in xldData)

·          Add a new row
            Output0Buffer.AddRow();

·         Add value from a column to the output buffer

if ((xn.Attributes.GetNamedItem("ows_DatePlan") != null))

            {                   

                Output0Buffer.DatePlan = System.Convert.ToInt32(System.Convert.ToDouble (xn.Attributes.GetNamedItem("ows_DatePlan").Value));

            }

Note that the GetNamedItem parameter has an “ows_” prefixed to the name of the Sharepoint column - Sharepoint does this internally, so you need to add it whenever accessing Sharepoint info. Also note that I am checking for the existence of the column before adding it - this means that your ETL remains robust, and if a user deletes a column, it won’t break the entire ETL.
And finally, an explicit conversion is required, as
xn.Attributes.GetNamedItem("ows_XXXX").Value will always return a string.

These are the basic mechanics of using the HelperFunctions (included below)
In my next blog post, I will work through what exactly these helper functions are doing.

#region Helper Functions

 

        private XmlNodeList RunXPathQuery(XmlNode xNode, string XPathQuery)

        {

 

            XmlDocument Document = new XmlDocument();

            Document.LoadXml(xNode.OuterXml);

 

            return RunXPathQuery(Document, XPathQuery);

        }

 

        private XmlNodeList RunXPathQuery(string strXML, string XPathQuery)

        {

            // load the complete XML node and all its child nodes into a XML document

            XmlDocument Document = new XmlDocument();

            //Document.LoadXml(XmlNodeToQuery.OuterXml);

            Document.LoadXml(strXML);

            return RunXPathQuery(Document, XPathQuery);

        }

 

 

        private XmlNodeList RunXPathQuery(XmlDocument Document, string XPathQuery)

        {

 

 

            // all the possible namespaces used by SharePoint and a randomly choosen prefix

            const string SharePointNamespacePrefix = "sp";

            const string SharePointNamespaceURI = "http://schemas.microsoft.com/sharepoint/soap/";

            const string ListItemsNamespacePrefix = "z";

            const string ListItemsNamespaceURI = "#RowsetSchema";

            const string PictureLibrariesNamespacePrefix = "y";

            const string PictureLibrariesNamespaceURI = "http://schemas.microsoft.com/sharepoint/soap/ois/";

            const string WebPartsNamespacePrefix = "w";

            const string WebPartsNamespaceURI = "http://schemas.microsoft.com/WebPart/v2";

            const string DirectoryNamespacePrefix = "d";

            const string DirectoryNamespaceURI = "http://schemas.microsoft.com/sharepoint/soap/directory/";

 

            // now associate with the xmlns namespaces (part of all XML nodes returned

            // from SharePoint) a namespace prefix which we can then use in the queries

            XmlNamespaceManager NamespaceMngr = new

            XmlNamespaceManager(Document.NameTable);

            NamespaceMngr.AddNamespace(SharePointNamespacePrefix, SharePointNamespaceURI);

            NamespaceMngr.AddNamespace(ListItemsNamespacePrefix, ListItemsNamespaceURI);

            NamespaceMngr.AddNamespace(PictureLibrariesNamespacePrefix, PictureLibrariesNamespaceURI);

            NamespaceMngr.AddNamespace(WebPartsNamespacePrefix, WebPartsNamespaceURI);

            NamespaceMngr.AddNamespace(DirectoryNamespacePrefix, DirectoryNamespaceURI);

 

            // run the XPath query and return the result nodes

            return Document.SelectNodes(XPathQuery, NamespaceMngr);

        }

 

 

 

        string GetAllLists(string strURI, string proxy)

        {

            string strXML = @"<?xml version=""1.0"" encoding=""utf-8""?>

<soap:Envelope xmlns:xsi=""http://www.w3.org/2001/XMLSchema-instance"" xmlns:xsd=""http://www.w3.org/2001/XMLSchema"" xmlns:soap=""http://schemas.xmlsoap.org/soap/envelope/"">

  <soap:Body>

    <GetListCollection xmlns=""http://schemas.microsoft.com/sharepoint/soap/"" />

  </soap:Body>

</soap:Envelope>";

            return HttpSOAPRequest(strXML, strURI, proxy);

        }

 

 

        string GetListItems(XmlNodeList xld , string strURI, string strListName, string proxy)

        {

            string strXML = @"<?xml version=""1.0"" encoding=""utf-8""?>

<soap:Envelope xmlns:xsi=""http://www.w3.org/2001/XMLSchema-instance"" xmlns:xsd=""http://www.w3.org/2001/XMLSchema"" xmlns:soap=""http://schemas.xmlsoap.org/soap/envelope/"">

  <soap:Body>

    <GetListItems xmlns=""http://schemas.microsoft.com/sharepoint/soap/"">

      <listName>" + strListName + @"</listName>

       <viewFields>

       <ViewFields>

         ";

            foreach (XmlNode xn in xld)

            {

                strXML += @"<FieldRef Name='" + xn.Attributes.GetNamedItem("Name").Value   + @"' /> ";

            }

       strXML += @"</ViewFields>

     </viewFields>

    </GetListItems>

  </soap:Body>

</soap:Envelope>";

 

            return HttpSOAPRequest(strXML, strURI, proxy);

        }

 

        string GetList(string strURI, string strListName, string proxy)

        {

 

            string strXML = @"<?xml version=""1.0"" encoding=""utf-8""?>

<soap12:Envelope xmlns:xsi=""http://www.w3.org/2001/XMLSchema-instance"" xmlns:xsd=""http://www.w3.org/2001/XMLSchema"" xmlns:soap12=""http://www.w3.org/2003/05/soap-envelope"">

  <soap12:Body>

    <GetList xmlns=""http://schemas.microsoft.com/sharepoint/soap/"">

      <listName>" + strListName + @"</listName>

    </GetList>

  </soap12:Body>

</soap12:Envelope>";

            return HttpSOAPRequest(strXML, strURI,  proxy);

        }

 

 

        string HttpSOAPRequest(string strXML ,string strURI,  string proxy)

        {

 

            XmlDocument doc = new XmlDocument();

            doc.LoadXml(strXML);

            HttpWebRequest req = (HttpWebRequest)WebRequest.Create(strURI);

            //req.Proxy = null;

            if (proxy != null) req.Proxy = new WebProxy(proxy, 8888);

            req.ContentType = "text/xml;charset=\"utf-8\"";

            //req.Accept = "text/xml";

            req.Method = "POST";

            //Use the current authentication. This will be the SQL Agent account when in production, so make sure that account has access to the list

            req.UseDefaultCredentials = true;

            Stream stm = req.GetRequestStream();

            doc.Save(stm);

            stm.Close();

           

            try

            {

                //WebResponse resp = req.GetResponse();

                //It has been alleged to me that the section below is more correct

                HttpWebResponse resp = req.GetResponse() as HttpWebResponse;

               

                stm = resp.GetResponseStream();

                StreamReader strmReader = new StreamReader(stm);

                string sRet = strmReader.ReadToEnd();

                //This line is important - you will get a hang if you don't close the stream in SSIS. It'll work in .NET 4 though,

                //Good reason to remember to target the correct framework

                //Thanks to Robert MacClean for his assistance. http://twitter.com/RMacClean

                strmReader.Close();

                return sRet;

            }

            catch (System.Exception ex)

            {

                string str = ex.Message;

                return "";

            }

 

        }

 

   

#endregion

 

 

 

 

 

 

 

djha and daniskate like this.
 
0
/5
Avg: 0/5: (0 votes)

Comments (4)

user543324
user543324 said:
Interesting stuff. Filing this away for future use. Pulling from sharepoint has been coming up more and more of late.
5/25/2012
 · 
 
by
Toufeeq
Toufeeq said:
Just what I need thanks Mark
6/20/2012
 · 
 
by
rajamallela
rajamallela said:
I am using this code,but this code is fetching only fist 100 rows, whereas my list has around 200 rows...what could be the reason...? Thanks in advance
8/17/2012
 · 
 
by
Toufeeq
Toufeeq said:
How do you update a list programmatically? also how do you retrieve the list of USers if your field is a AD group field because I am only getting back the first user in the list
11/28/2012
 · 
 
by
  • Name:*
  • Email:*
  • Website:
Type the characters you see in the image: *

Most Recent Articles