In my last post, I introduced, at a high level, the concepts important to both master data and master data management. We also discussed some of the driving factors behind the renewed interest in this arena and how people, process and technology are involved. In this post we are going to leave all that behind and focus solely on the technology aspect of implementing a Master Data Management (MDM) solution.
Part 1 - Understanding Master Data
It's worth spending some time in this post reviewing the three architectures that are common within MDM so that, once we dive into MDS, we have a better understanding of how it works. The architectures we will look at are the Transaction Hub, the Registry and the Hybrid. To illustrate each architecture we will use a scenario in which we are managing customers enterprise-wide.
Conceptually the most straightforward of the three architectures, the Transaction Hub is also the most difficult to implement. In this architecture, a single repository is used and all systems or applications that require access to master data connect to it. The applications no longer maintain their own stores of the master data. If you are familiar with service-oriented architecture (SOA), this approach should seem familiar.
While the advantage of having a single source of master data should be obvious, the price paid means that this architecture is more idealistic than practical. To implement it, each system's data tier or data access layer must be modified to use the repository. Therein lies both the challenge (which in some shops can be insurmountable) and the greatest disadvantage of the Transaction Hub.
Scenario: A customer service is developed and implemented as a web service. The customer service is responsible for all typical CRUD activities and serves as the interface for any application requiring access to the customer entity. After the service is available, the CRM, ERP and E-Commerce systems are all modified so that they no longer store customers in their own databases. Instead, the applications interact directly with the new customer service to create, search for or maintain customers.
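To make the scenario concrete, here is a minimal sketch of what such a customer service might look like. All names here (`CustomerService`, the attribute names) are hypothetical, and an in-memory dictionary stands in for the single master repository; a real implementation would expose these operations over a web service.

```python
import uuid


class CustomerService:
    """Hypothetical Transaction Hub customer service: one repository,
    all CRUD goes through this interface."""

    def __init__(self):
        self._repository = {}  # the single, shared master data store

    def create(self, name, email):
        customer_id = str(uuid.uuid4())
        self._repository[customer_id] = {"name": name, "email": email}
        return customer_id

    def read(self, customer_id):
        return self._repository.get(customer_id)

    def update(self, customer_id, **attributes):
        self._repository[customer_id].update(attributes)

    def delete(self, customer_id):
        del self._repository[customer_id]

    def search(self, **criteria):
        # return the ids of all customers matching every criterion
        return [cid for cid, rec in self._repository.items()
                if all(rec.get(k) == v for k, v in criteria.items())]
```

Because the CRM, ERP and E-Commerce systems all call this one interface, there is exactly one copy of each customer; the cost, as noted above, is that every consuming application must be modified to call it.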
At the opposite end of the spectrum, architecturally speaking, is the Registry. In this architecture, each system maintains its own store of data. A database and management application are then created to map each system's entities to one another. Because there is no single set of master data, a plan for handling duplicates must be implemented, which means defining survivorship rules.
An important aspect, and the greatest advantage, of this architecture is that no entity data other than a key is stored by the Registry. This makes implementation straightforward, since at most only minor system changes are required to extract data if those capabilities don't already exist. That is the upside to this architecture; the downside, in my view, is significant.
First, and a deal-breaker for most, is that this architecture is considered read-only. There is no function that pushes updates back down to the source systems, since the Registry merely maps data. The second downside is a result of no entity data being stored. In this architecture, if you need a complete set of master data, distributed queries are your answer. Aside from the performance impact and the utilization of each system's resources, maintaining these queries can be a nightmare as each source system evolves.
Scenario: Either real-time or daily batch customer extracts are taken from each source system. The customer records are matched using automated routines and pre-defined business rules. If duplicates are identified, survivorship rules are used to identify the primary record. A data steward may use a custom application to manage the customer mapping contained within the registry.
When it is time to extract data for a marketing mail campaign, distributed queries are issued to each source system to retrieve the complete set of customers based on the mappings contained within the registry.
As the name implies, the Hybrid approach borrows ideas from both the Transaction Hub and Registry architectures. The Hybrid architecture consists of a central database that stores both a cross-system entity mapping and the entity attributes that are shared amongst the disparate applications. Each application retains its own database containing its copy of the master data. The Hybrid solution is responsible for detecting changes, storing them in its own database and then propagating or synchronizing them back out to subscribing downstream systems. This synchronization can occur via either a push or a pull methodology. In this way the Hybrid approach is able to capitalize on each architecture's strong points while negating the disadvantages.
This architecture has several notable benefits. The first is that it is possible to apply a data quality policy and business rule checks when source data changes are detected. Circling back to my last post, this is one way in which we push and ensure data quality at the edges of the IT ecosystem, prior to propagating potentially bad data throughout the enterprise.
Secondly, and similar to the Transaction Hub, the Hybrid architecture provides a single, clean and consistent copy of master data that is available to all applications. Again, going back to my first post, this is one of the keys to an effective master data management solution.
The final benefit is in the realm of implementation. Since each source system maintains its own database, you can pick and choose how each system interacts with the MDM solution. Some systems may take the role of publishers, publishing either in real time or in batch as the situation requires. Others can be subscribers, reading new records and updates into their own data stores. Frequently, systems will be both publishers and subscribers, again with the flexibility for these interactions to occur as often as the business requires.
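The pieces described above — the central store of shared attributes, the cross-system key map, change detection and propagation to subscribers — can be sketched together in one small class. This is a simplified, hypothetical model (names and the push-style callback mechanism are my own), not how any particular MDM product is implemented.

```python
class HybridHub:
    """Toy Hybrid MDM hub: shared attributes plus a source-system key
    mapping, with push-style propagation of detected changes."""

    def __init__(self):
        self.master = {}       # master_key -> shared attributes
        self.key_map = {}      # (system, local_key) -> master_key
        self.subscribers = []  # callables notified of each change

    def subscribe(self, callback):
        """Register a downstream system; called as callback(key, changes)."""
        self.subscribers.append(callback)

    def publish(self, system, local_key, attributes):
        """Accept a record from a source system, in real time or batch."""
        # Cross-system mapping: reuse the master key if we've seen this
        # source record before, otherwise mint a new one.
        master_key = self.key_map.setdefault(
            (system, local_key), f"M-{len(self.master) + 1:03d}")
        current = self.master.setdefault(master_key, {})
        # Change detection: keep only attributes that actually differ.
        changes = {k: v for k, v in attributes.items()
                   if current.get(k) != v}
        if changes:
            current.update(changes)
            for notify in self.subscribers:  # push to subscribers
                notify(master_key, changes)
        return master_key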
If I haven't sold you on the Hybrid architecture by now, this last point should do it. While Master Data Services is capable of supporting any of the three architectures I've described today, the Hybrid architecture is the most commonly implemented when Master Data Services is used. That being the case, and this being a blog about Master Data Services, the Hybrid approach is one that we will reference frequently throughout this series.
Over the course of this post, we've built on our understanding of master data and master data management to see how organizations implement MDM solutions. We focused on the three common architectures: the Transaction Hub, the Registry and the Hybrid approach. As I pointed out, MDS can support all three, but where it really shines is with the Hybrid architecture. In the next entry we will stick with the architecture theme, only we will be focusing on the architecture of Master Data Services 2012. We will walk through the installation and setup process in detail, exploring the important aspects of the MDS architecture as we go.
Till next time!