I am often asked the question about WHICH Microsoft technology to use when integrating systems, and it’s a very relevant question, as there are so many different technologies. The good news is that each does have its’ place – there is some overlap, and most of these can be shoehorned to do the other roles, but for the most part, it’s fairly clearcut.
First, a list of the technologies
Custom SOA Integration. Not an integration technology, you’re writing code….
“Online” integration, especially mobile
Message Queuing – high performance integration
Complex Event Processing
Of these, I have implemented a production version of all SSIS, SyncFramework, Service Broker, and dabbled in WCF during my dev. I’m currently prototyping a StreamInsight application, and I have never seen BizTalk.
So let’s start with what I’m not really familiar with, not co-incidentally the technologies not provided as part of the SQL stack.
BizTalk is a real-time, service integration. Do your apps talk to each other in a realtime manner? Are the data loads small? Do they need to go via HTTP? You may need BizTalk, but I have no idea how it helps you over coding a web service. But I’m sure it’s great. WCF of course will also give you options to use MSMQ as your comms channel, and the WCF over MSMQ vs Service Broker is one of the discussions that has real merit.
The choice, as I see is it, needs to be based on whether the channels will be consumed at a data layer or an application layer. If processing and actions are going to occur from the messages directly, then WCF/MSMQ is the correct channel. However, if the messages are going to be used in conjunction with other data, then Service Broker may be a better choice. But more on service broker soon.
SQL Server Integration Services, the successor to DTS, is the most used and most widely known of the Microsoft integration stack.
I have most commonly encountered it in loading data warehouses, in an ETL (Extract/Transform/Load) capacity, and this is its’ ideal scenario. Large data loads, relatively homogenous data (i.e. the structure does not change frequently), typically loading from flat files or databases, and definitively loading INTO databases, and more rarely, flat files.
While SSIS does support web services to some extent, it is not designed for this. It does include a “Script task” that lets you do anything you can do in C# or VB.NET (including loading custom assemblies), and this is typically how I query web services from SSIS.
SSIS provides a large array of data sources, and even more transforms, including various lookups, fuzzy matching, deduplication, and other tools, and is even more extended by third party tools. Notification of load status using email, SMS, web service call and other methods are all supported.
What it is NOT suited for is any type of real time integration, or integration with data sources whose structure is dynamic. Also, in most use cases, it is a one-way integration, although of course, one could implement both directions.
Skill Pre-Requisites: T-SQL, SSIS
Overlaps: Not really
Example Application: Data Warehouse, Scheduled Integration between OLTP systems
Microsoft Sync Framework is designed for distributed and occasionally connected application architecture.
A great example is a field engineer with a mobile device, using an app which will make updates on the local DB, and have them sync’d automatically to the main DB when connectivity is available.
The Sync Framework allows for 2 way integration of a database, by default SQL Server and SQL Server Compact (if you are using an Express or higher edition on the mobile device, a little more work is required), and handles resolution conflicts when updates are made on both the mobile as well as the central DB. And of course, multiple mobile devices are totally supported, and the update code itself can run on either the mobile or the central server (there are performance trade-offs concerning what goes over the wire)
Advantages: If you have a distributed architecture, Sync Framework will definitely save you development time. In addition, it does not require a SQL licence, it can Sync between Express and CE or Express and Express.
Disadvantages: Sync Framework is suited for neither high volume nor low latency applications.
Skill Pre-Requisites: T-SQL, .NET 2.0
Overlaps: Service Broker or WCF for high performance or Low Latency. Easier to implement though
Example App: Occasionally Connected Mobile Application
Service Broker is a high performance message queuing technology built into the SQL stack. The essence of a message queue is that a message is pushed onto a queue (on a server) and popped off the queue (possibly on a different server, possibly different messages by different servers)
As it is optimised for this uni-directional data flow, it is *MUCH* faster than writing to a table, even a heap. In addition, you retain the transactional consistency of SQL Server, so once data has been written to a queue, it is guaranteed to be in the queue, and will only be deleted from the source server has it has been received.
As an in-depth example of how this can be used to enable high-performance applications, one of the systems we developed had a requirement that every single change to the DB be audited in a manner that allowed for rollback, and also traces of the user, the machine used and some other information. This in the end turned out to mean that the audit DB was growing at about 4 times the rate of the operational database.
So the solution was to push the audit messages onto the Service Broker queue, and have them written by separate servers (setting up a receiver on each)
In addition, as the receivers for one queue can sit on multiple machines, another solution is to only run the receivers on the machine writing to the queue when that machine is not under load. The receivers can be throttled.
Another scenario is real-time loading of information to a data warehouse or OLTP system.
Caveat: Data sent in a service broker message is NOT compressed. It is a good idea to compress your messages if you are working on a geographically distributed solution, and are not CPU bound.
Another usage of Service broker is to perform some calcs/aggregations on the data as it comes off the queue, and prior to writing to DB. Has benefits, but loses out to StreamInsight on processing speed. However, if you already are using ServiceBroker, simplicity of only using one technology would win out. And Stream Insight does NOT provide a transport layer.
Advantages: Fast communication. Scalability (scale out)
Disadvantages : No native compression, only WCF provides that out of the other technologies though.
Skill Pre-Requisites: T-SQL, Service Broker.
Overlaps: MSMQ. Stream Insight, but only a little
Example App: High Performance Distributed Auditing. Real Time Data Loads. Scale Out Architecture
Firstly, let me make this very clear: Stream Insight does not provide a transport layer. Stream Insight is for Complex Event Processing, but the events themselves must be provided to the Stream Insight process.
Having said that, hooking it up to transport layers is easy enough, as owing to its’ 3 piece architecture (Input Adapter, Query Layer, and Output Adapter), wiring the transport into the Input Adapter will be as easy as building the Input Adapter.
And on to the next point…. Stream Insight is not an application. It is a technology. You will be writing C# code (and no other .NET language mind you, ONLY C#) for your Input Adapters, Output Adapters, and binding LINQ Queries to both.
Stream Insight provides a mechanism to do calculations and aggregations on events as they come in, and all the data is stored in memory. An output adapter can provide the results of these calculations/aggregations to a UI, a web service, or persist it to a database – actually anything you want it to do really, since you have to write the code.
A good example of the usage of this is monitoring a rev counter on an engine. For every revolution, a single event is sent (a “Point” event in CEP terms), and storing each of these in a database will quickly lead to huge volumes. A more efficient approach may be to aggregate the revolution events, and at a set interval (a minute, or 10 seconds, or every second even), write the Maximum, Minimum, Count, Average and Mean values to a table. A vastly smaller footprint, with not much information value lost. Of course, choosing the aggregations, intervals and calculations correctly is essential – information such as the rate of acceleration is important when monitoring driving style and would be lost with the above set of calculations.
My next pass with this is comparing the values above against a predictive model, and sending alerts for outliers. For instance, if the engine was on a car, a vastly different driving style on this particular vehicle may indicate that it has been stolen.
Combining separate streams is the next area : so multiple sensors can be combined in one of these queries.
Advantages: The performance benefit of using the CEP streams is immense over trying to use a DBMS. In addition, the separation of Input, Output and Query allows for the creation of multiple applications where User defined queries based on the needs of the moment (for example dashboards, alerts etc) can be issued without changing the underlying code. This creates the space for an ecosystem of partners to develop based on StreamInsight, as the DBMS of CEP, vs other CEP providers who supply a complete solution.
Also, a built in persistence model to a CE database is great.
Dis-advantages: It’s like installing SQL. You do not have an application; you have a technology to create an application. Also, advanced .NET coding skills are required, .NET 2.0 knowledge will not suffice
Web and Standard Edition only support up to 5000 events per second. Data Center edition is expensive
Skill Pre-Requisites: T-SQL, .NET 3.5, Stream Insight
Overlaps: Service Broker, but only a little. Definite use case for both in one application
Example App: Realtime Sensor monitoring, High Speed Investment Trading, Monitoring of Web Service Feeds
High volume, batch loading
2 way synchronisation, low volume,
Low latency, high or low volume
Calculations – used in conjunction with a transport layer
That's a fantastic overview. I've also seen a really good presentation from MS TechEd on that from the SSIS Program/Product/Whatever :) Manager, Jeff Bernhardt: http://www.msteched.com/2010/NorthAmerica/BIE202
Thanks Todd, that is an awesome video. He's taken the time to dive into two areas I considered out of scope (Replication, and MDM - Master Data Services in MS speak), and also a deep dive into WHY you think about the different areas