Developing a MDDE has been a very interesting challenge
It basically started when I had to redevelop a system that has taken years to develop with the Cognos data management tools needed to be replaced by a new MS SSIS system. The initial estimate by the people that were part of the initial system was 6 months - we had a month!!
This was not going to work if we were to do everything manually using SSIS, although sometimes I thought it would have been a lot easier!
My initial investigations led to a codeplex project that started this concept in my mind of MDDE. Basically this system loaded SSIS packages and allowed you to generate packages from the loaded templates. After overcoming a few .Net code bugs to cater for SQL 2008 SSIS packages, I realised that this was not enough.
We were able to break up the development into 2 systems - internal and external ETL. The internal ETL system handled the movement of FACT (measure group) data from the base MasterData (MD) system to the different internal systems - Operational (O), Tactical (T), Tactical MonthEnd (TME) and Strategic (S). In a previous phase of this system we had consolidated all our dimensional data in a MasterData system, so no dimensional data needed to be copied to the different internal systems.
The major different between the different internal systems was the datetime periods of the business facts. Tactical contains Month-To-Date information, Tactical MonthEnd contains last month's information - this contains sufficient information to generate the monthly report packs. Strategic contains all the information ever added to the BI platform.
SQL 2008 introduced the new MERGE SQL statement which has been used extensively in this system. I started developing a system to manage the SQL tables that contained the measure group information. These tables were further categorised by AggregationType (Sales, Cash, Debtors, Track and Aggregates). Using the SQL Table design in MasterData I could write SQL stored procedures to generate SQL MERGE statements that would handle the ETL process between MD and the other internal systems. The tables in the different systems were exactly the same.
The basic MERGE template went as follows:
MERGE INTO [Tgt Srv].[Tgt_DB].dbo.[Tgt_Table] AS [Target]
USING [Src Srv].[Src_DB].dbo.[Src_Table] AS [Source]
ON [Source].PK = [Target].PK
WHEN Matched
THEN UPDATE
SET TableData = [Source].TableData
WHEN Not Matched
THEN INSERT
(
TableData
}
SELECT [Source].TableData
OUTPUT $action, inserted.*, deleted.*
Using the above template I could automate the generation of SQL statements for every target system table combination. The resultant SSIS packages turned out to be really simple:
Set the Aggregation Type that we are going to perform the ETL for
Start the ETL for a system and aggregation type defined in step 1
SELECT from the MDDE database the list of all tables for the AggregationType that contained "generated" SQL
For each table returned from step a:
Execute SQL
Send Email if Execute failed
If execute was successful – audit the MERGE (i.e. check that all the information was copied to target system)
After all tables have been ETL'd we can now do the Aggregation for the current Aggregation Type
Go the next AggregationType
Whilst this development internal system took some effort and testing – it was relatively simple compared to the task of automating the MDDE for external systems!
As one can appreciate the External systems do not represent the information sitting in MasterData and they are not all MS SQL database systems. In this case we are dealing with Excel files and multiple progress databases.
So now the question was: how do I start collecting MetaData for the external systems? I can't just do a select out of "sys.columns"!!! In the past few years I have on several occasions used SQL linked servers and to be honest have never had much joy – except if the linked server was pointing to another SQL server. I still find it amazing that MSSQL really battles to define these links and Excel and SSIS have no problems. So after spending many hours trying an alternative route: OPENROWSET I returned to the "Linked Server" approach but not entirely, I made use of the OPENQUERY (LinkedServer, 'SQL SELECT') statement.
Manage to get reliable results from this approach but still had a few challenges – mainly how do I get the metadata from the external systems. This problem becomes very challenging when the normal SQL trick "SELECT A.* INTO TgtTable FROM SrcTable A WHERE 1 = 0" does not work. Fortunately I could achieve the same by doing a "SELECT TOP 1 A.* INTO TgtTable FROM SrcTable".
Now it was getting really exciting – I now have the external systems tables sitting in SQL. This means that I can automate the building of the ETL SQL statements that pull the data from the Source System using the following template:
DECLARE @ETLProcedure varchar(0124) = ''
, @ETLRows int
, @Err int
, @ErrState int
, @ErrSeverity int
, @ErrLine int
, @ErrMesg varchar(max)
, @ETLTableID int
, @ETLBatchID int
TRUNCATE TABLE PUB.pdmAccRef
BEGIN TRY
INSERT INTO PUB.pdmAccRef
ColA
, ColB
)
SELECT DISTINCT
A.ColA
, A.ColB
FROM OPENQUERY (
Linked_Server
, 'SELECT * FROM PUB.pdmAccRef'
) AS A
SELECT @ETLRows = @@ROWCOUNT
END TRY
BEGIN CATCH
SELECT
@Err = ERROR_NUMBER()
,@ErrSeverity = ERROR_SEVERITY()
,@ErrState = ERROR_STATE()
,@ETLProcedure = OBJECT_NAME(@@PROCID)
,@ErrLine = ERROR_LINE()
,@ErrMesg = ERROR_MESSAGE()
END CATCH
So now I can use another simple ETL SSIS package that does the following:
Get the list of external systems i.e. the list of defined LinkedServer
For Each LinkedServer
Get a list of external tables
For each external table
Execute the SQL defined above
Send Email if there was an error
Audit the results
After all this I am still left with one more hoop – how do I get it from this staging area into MasterData, where it can be checked and analysed.
After spending some time pondering this I decided on the following approach:
For each MasterData table – build a view in the Staging area that looks exactly the same as the MD definition
Once the view has been defined we can now treat the movement of data from Staging to MD in the same fashion as the movement of data from MD to any other internal system.
The lovely part about the External ETL system is – all I need to do is define the external linked server and the external table names in the MDDE. Once this has been done the stored procedures will take care of the getting the SQL table that contains the column information. If I need to handle the external source system SELECT using the appropriate Indexes – I can define this index on the mirrored SQL table in the staging area.