Incremental Loads with Change Data Capture – Part 1: Enabling Change Data Capture

Who is online?  0 guests and 0 members
Home  »  Articles  »  Incremental Loads with Change Data Capture – Part 1: Enabling Change Data Capture

Incremental Loads with Change Data Capture – Part 1: Enabling Change Data Capture

change text size: A A A
Published: 12/31/2009 by  PatrickLeBlanc  - Views:  [3314]  

In this series of articles I will show you how to incrementally load a Slowly Changing Dimension with the new SQL Server 2008 feature, Change Data Capture (CDC).  Over the next few articles I will cover the following topics: 

1.       Enabling Change Data Capture

2.        Detecting Changes using Change Data Capture Tables

3.        Loading the Incremental Data into the Slowly Changing Dimension.

Before we begin with the technical details of this article, I would like to provide you will some basic terms that will help you understand the concepts and techniques that will be outlined in these articles.

Terms

Data Warehouse – Is a repository of data that is organized in a manner that addresses the analytics and reporting needs of a business.

Dimension – A dimension, as it pertains to a data warehouse, provides a mechanism for “slicing and dicing” the data in the data warehouse.  Even further, a dimension may also be considered as a means for categorizing the same data.  A dimension typically performs functions such as filtering and grouping data warehouse data.

Slowly Changing Dimension (SCD) – Is a dimension, whose data slowly changes over time.  There are several types of SCDs.  However, in this series of articles I will only focus on Type I and Type II.

Type I SCD – This type of SCD does not keep track of historical data, instead the data is overwritten. 

Type II SCD – This type of SCD does keep track of historical data.  When data in the row is changed, instead of overwriting the data  it is preserved.  A new row with the new values is inserted.  Typically, there is some type of date combination or bit column that will help the user to identify the effective row.  For example, assume the following Customers Table:

CustomerSK

CustomerAK

FirstName

LastName

Company

EffectiveRow

1

150

Patrick

LeBlanc

Lamar

1

 

If the person changed his Company Name from Lamar to Pragmatics a new row would be inserted.  Thus, preserving the historical data and the bit flag on the EffectiveRow for the old row would be set to 0 while the new row bit value would be set to 1.  The resulting data would be as follows:

 

CustomerSK

CustomerAK

FirstName

LastName

Company

EffectiveRow

1

150

Patrick

LeBlanc

Lamar

0

2

150

Patrick

LeBlanc

Pragmatics

1

 

 

Now that you are familiar with some basic warehousing terms, I will delve into the technical aspects of incrementally loading data with Change Data Capture.

Enabling and Disabling Change Data Capture

The first step in this process is to enable CDC.  This is a two step process.  The first step is to enable CDC at the database level.  This can be accomplished two ways with SQL Server 2008, Graphically and Syntactically.   Since the Graphical User Interface is not always available I will defer to enabling CDC using T-SQL.  Run the following script, replacing DatabaseName with your database, in the context of the database that needs CDC enabled:

 

USE DatabaseName

GO

--Enable CDC

EXEC sys.sp_cdc_enable_db

GO

 

--Disable CDC

EXEC sys.sp_cdc_disable_db

 

Please see SQL Server Books Online (BOL) for additional information regarding enabling CDC at the database level.

 

The second step is to enable CDC on the table that you want to detect changes on.  Run the following script:

 

 

USE DatabaseName

GO

 

EXEC sys.sp_cdc_enable_table

    @source_schema = ’dbo’, -- Schema of Source Table

    @source_name = ’Customer’, -- Schema of Source Table

    @role_name = ’cdc_Admin’, -- Controls Access to Change Data

    @supports_net_changes = 1, -- Supports Querying for net changes

    @index_name = ’PK_Customer’, -- Unique Index or Primary Key of Source Table

 

GO

 

 

For the sake of brevity, the above example enables CDC on the Customer table using only the required parameters and two additional parameters.  The support_net_changes enables the support for querying net changes.  If support_net_changes is set to 1, the index_name parameter must be set to the Primary Key or a Unique Index that exists on the source table. Review BOL for additional parameters that are available, which extends the flexibility and functionality of CDC. 

 

In addition to these stored procedure, SQL Server provides a list of functions, system views and additional stored procedures that will assist the DBA in securing, managing and maintain CDC.  See BOL for additional information.  This concludes the first article in this series. In the next article I will discuss how to use the aforementioned database objects and newly created Changed Tables, which will be discussed in detail in the next article, to detect changes on tables that CDC has been enabled.  If you have any questions please feel free to email me at pleblanc@pragmaticworks.com.

 
4.68
/5
Avg: 4.68/5: (1 votes)

Comments (3)

Rob_Randall
Rob_Randall said:
Good idea for an article...looking forward to the next parts. Any ETA?
2/25/2010
 · 
 
by
PatrickLeBlanc
Part 2 should be avaiable in the next month! Thanks.
2/25/2010
 · 
 
by
rchafei
rchafei said:
Excellent post Patrick ---thanks
2/16/2012
 · 
 
by

Most Recent Articles