posted 7/17/2012 by BradSchacht - Views: [818]
If you are using Change Data Capture (CDC) it can be very useful to look at the CDC State table to understand what is happening. I am writing this in the context of using SSIS 2012 CDC components; in this case the CDC Control Task. The CD State can be very helpful to determine why data isn’t getting loaded right after an initial load for instance, which is when I decided to dig into the state shown in the state table.
The CDC state is used to maintain the state of the loading process. It is used to determine and maintain the processing range when using the CDC components in SSIS 2012. The storage location of the state is indicated as an option in the CDC Control Task and looks similar to this:
The part of the super simple to decipher string you are wanting to look at is going to be the first part of the state before the “/CS”. There are 7 distinct values that may be shown at the beginning of the state field.
Here is what the problem was for me. I set up an initial load package:
Then I set up a second package to load updates, and it pulled in NO CHANGES!!!! What the HECK!!!!
The issues was if you run an initial load end due to the loading methods of the initial data set CDC thinks that the source database is currently in use (which in my case it was not, it was a static database). So when I ran the update package it went into ILUPDATE mode to pull in whatever updates occurred during the initial load. It went something like this (obviously this is just to recreate what happened, it was two separate packages with data flows between the first and second tasks as well as between the third and fourth tasks):
After running the packages marking the processed range put the status back to ILEND. This again caused the get processing to issue a status change to ILUPDATE on the next run at which point the mark processed finally went to TFEND like I expected. In hind sight this makes sense as it is the way that CDC ensures it doesn’t miss any records that were being changed during the initial load, but it was not what I was expecting to see. All subsequent runs result in a TFSTART and TFEND.
Since my database source was static I could have just issued a Mark CDC start command and been on my way as shown here:
To sum up the difference in how those two situations acted here is what when on.
Scenario 1 Setup:
Initial StartInitial EndGet ProcessingMark Processed
Scenario 1 Outcome:
ILSTARTILENDILUPDATEILENDILUPDATETFENDTFSTARTTFENDRepeat last two outcomes
Scenario 2 Setup:
CDC StartGet ProcessingMark Processed
Scenario 2 Outcome:
TFENDTFSTARTTFENDTFSTARTTFENDRepeat outcome
Hopefully this explains in somewhat useful terms what happens in each of those two situations. The best way to understand this is to set up both those demos on your own and run through them. You don’t even need to create a data flow in there, you will get great benefit out of just running the different options in the control task by itself. Just be sure to only run the initial start and initial end once to start the process. Running those every time is going to be useless and simulate nothing.