Using Transactional Replication as an ETL process

Who is online?  0 guests and 0 members
Home  »  Forums   »  microsoft business intelligence   »  data warehouse design   » Using Transactional Replication as an ETL process

Using Transactional Replication as an ETL process

Topic RSS Feed

Posts under the topic: Using Transactional Replication as an ETL process

Posted: 12/19/2011

Jedi Youngling 13  points  Jedi Youngling
  • Joined on: 12/1/2009
  • Posts: 4

I've just started a new job where Transactional Replication is used as an ETL method for loading data from source databases in the Data Warehouse. The source data is prevented from being loaded into the Data Warehouse immediately by changing the schedule of the Distribution Agent so that it only runs at a certain times.

I would like to replace this process with either SSIS packages or Change Data Capture (CDC), but I am facing resistance from various quarters. I think it's a case of "we've always done it this way" and I'm struggling to convince people to change the ETL process. What are folks' thoughts on the best way to approach this?

Thanks

Lempster


Posted: 12/19/2011

Padawan 1437  points  Padawan
  • Joined on: 3/24/2010
  • Posts: 196

I'm going to play a bit of "Devil's advocate" here and ask, "Why would you want to change it if it IS working?"

Don't get me wrong there are some good reasons to change it, but from the employer's point of view, they are probably going, "Why go through work, testing, verification, etc - when it works the way it is?" - or something along that line.

A few questions about the system:

  • What does the employer consider the biggest limitation of the current system?
  • Does the structure change frequently in the OLTP system ( which I assume is your publisher )?
  • Are the articles pushing every column of every table?  In other words have they vertically partitioned any of the articles?
  • How many ( if any ) triggers are involved in / affect the replication process?
  • Do you find any identity columns in the published tables?

 

Looking forward to your answers,

Keith Hyer

 


Posted: 12/20/2011

Jedi Youngling 13  points  Jedi Youngling
  • Joined on: 12/1/2009
  • Posts: 4

Hi Keith,

Thanks for your response. You are right in that it is working, but I believe it could work better - faster, less resource intensive, more flexible - by using one of the technologies I mentioned. To answer your questions:

  • The biggest limitations of the current system are (1) the increasing time that it takes for the delayed transactionally replicated articles to be written to the data warehouse, so much so that it is starting to spill over into daytime processing and (2) the fact that with every new release of the (multiple) source OLTP systems, a new snapshot has to be created which impinges hugely on disk space.
  • It depends on what you mean by frequently; the main OLTP source system structure changes quarterly.
  • Not all tables are published, but those that are are published in their entirety.
  • Triggers: none that I have found yet.
  • Yes, there are identity columns and a lot of them are GUIDS (I know, I know, the OLTP database is 3rd party supplied)

A lot of transformations take place on the data that is loaded in to the data warehouse via transactional replication; in my opinion it would be better to do that as close to the source as possible, i.e. within one or more OLEDB Source components of an SSIS package.

CDC is not an option currently as the shop is SQL Server 2005, but an upgrade to 2008 R2 is planned for the first quarter of 2012.

Thanks and regards

Lempster


Posted: 1/11/2012

Jedi Youngling 13  points  Jedi Youngling
  • Joined on: 12/1/2009
  • Posts: 4

@Keith,

Any further thoughts?

Regards

Lempster


Page 1 of 1 (4 items)