SQLUNINTERRUPTED

I am just a medium, SQL Server the Goal

Change Data Capture – We don’t see, what we don’t know

Recently a customer asked me what really happens in the background when we enable Change Data Capture on a Database/Table and if there is a way to monitor all the changes which happens on the DB after enabling CDC. Honestly I did not know for sure what happens in the background. So I did what I do best. I created a small repro for a small database and a table  and used profiler to see what really happens in the background.

Before we get into the specifics of the background operations, lets spend a minute on CDC architecture.

For CDC, the source for tracking the changes is the SQL Server Transaction Logs. As DML operations happen on the Database, they are recorded in the SQL Server transaction, which serves as the input to the change capture process. The change process used Log Reader (same as what Transactional Replication uses) to read the entries from the transaction log file. The capture process than segregates this information based on the tracked tables and put the corresponding entries in the change table associated with the tracked tables. These changes can then be enumerated using the functions provided by CDC.

The first step in enabling CDC is to enable it on the Database. This is done using the system stored procedure “sys.sp_cdc_enable_db”.

This stored procedure needs to be executed in the context of the database on which CDC is to be enabled. The stored procedure does not take any arguments. When you executed the stored procedure the following happen.

  1. is_cdc_enabled bit in the sys.database catalog view is enabled.
  2. A new database schema (named CDC) and a new DB user (named CDC) is created in the DB.
  3. A DDL trigger tr_MScdc_ddl_event is created in the database.
  4. Creates the following tables in the database
    • cdc.captured_columns — Contains the details of all the columns being tracked by CDC at a database level.
    • cdc.change_tables — Stores information about the cdc enabled tables and their capture instance. One row for each capture instance.
    • cdc.ddl_history — All DDL operations happening on the DB.
    • cdc.index_columns — Tracks the columns on which the indexes have been defined. This is done for the indexes defines in the @index_name parameter for the sys.sp_cdc_enable_table. If the index name is null, then the primary key index of the table is used.
    • cdc.LSN_timeMapping — Maps the LSN in the cdc.<capture_instance>_CT table to the time the LSN’s were generated.
  5. The Following CDC related sp’s are created in the database.
    • cdc.sp_batchinsert_lsn_time_mapping
    • cdc.sp_ins_dummy_lsn_time_mapping
    • cdc.sp_ins_instance_enabling_lsn_time_mapping
    • cdc.sp_ins_lsn_time_mapping

Once the database has been enabled for CDC, a table can be configured for change capture. To enable a Table for tracking, the following Stored procedure needs to be used “sys.sp_cdc_enable_table”.

This stored procedure takes about 9 parameters (some are optional) as described below.

EXEC sys.sp_cdc_enable_table
    @source_schema = N’dbo’
  , @source_name = N’test1′
  , @role_name = N’CDC_Admin’
  , @capture_instance = N’CaptureInst1′
  , @supports_net_changes = 1
  , @index_name = null
  , @captured_column_list = N’ id, name, email ‘
  , @filegroup_name = null
  , @allow_partition_switch = 1
GO

@source_schema and @source_name are the system names of the schema and the table which would be tracked by CDC.

@role_name is the database role which is used to allow access to the change data. If the role does not exists in the database, the stored procedure would try to create the role. If the user running the SP does not have privileges to create a database role, the sp execution would fail.

@capture_instance is the instance used to track changes for a particular table. This capture instance would be used to create a cdc table in the database to track changes specific to the source table. For example, in the above case, a cdc table called cdc.CaptureInst1_CT is created to track changes specific to the TEST1 table.

@supports_net_changes parameter controls the creation of a CDC function, which can be used to provide net/overall changes to a table. For example, lets assume test1 table has 1 record in it. In a Transaction there are 10 DML operations on the record. The net changes function would report the last changed value for the record.

@index_name: This parameter is used to provide the name of the unique index on the table, in case if there is no primary key or unique index already present in the table. CDC needs to identify each record in the table. This can be used using any unique constraint/primary key constraint on the table. If the index is specified and there is a primary key in the table, then the index columns are given precedence in determine row uniqueness.

@captured_column_list: this is a comma separated list of all the columns which needs to be tracked by CDC.

@filegroup_name indicates the filegroup which is to be used for creating the Change Table associated with the tracked table.

@allow_partition_switch indicated whether Alter table.. Switch partition can be executed on the table or not. The default value is 1.

When this stored procedure is executed, the following operations happen in the back ground.

  1. A change table associated with the tracked table (in example above test1) is created in the DB. This change table has the name as cdc.<capture_instance>_CT.
  2. Two functions (depending on whether @supports_net_changes was set to 0 or 1) are created in the database. These functions have the following names
    1. cdc.fn_cdc_get_all_changes_<capture_instance_name>
    2. cdc.fn_cdc_get_net_changes_<capture_instance_name>
  3. If this was the first execution of the sp in the DB, it also creates two SQL Agent Jobs, along with the necessary entries in the MSDB database.
    • cdc.ChangeDataCapture_capture
    • cdc.ChangeDataCapture_cleanup
  4. Set the is_tracked_by_cdc bit in the sys.tables catalog view.

At this point, database/table has been setup for change capture using CDC. In the various tables mentioned above, we would have certain metadata information created. For example

Select * from cdc.captured_columns

image

select
OBJECT_NAME(object_id) as ChangeTrackingTable,
OBJECT_NAME (source_object_id) as TrackedTable,
capture_instance,
supports_net_changes,
role_name,
index_name,
filegroup_name,
partition_switch
from cdc.change_tables

image

Next, I inserted a single record in the Test1 table, and then updated the same record. Then I checked for the entries in my Change Table.

select __$start_lsn, __$seqval, __$operation, __$update_mask, ID, name, email

from cdc.CaptureInst1_CT

image

The first column in the output indicate the LSN (log sequence number) of the commit associated with the transaction. CDC would populate the changes in the Change Table only when the DML transaction commits. If the transaction is open, changes made by the transaction does not reflect in the Change Table.

If in the transaction there are multiple DML statements, then they are tracked using the __$seqval, which provides the sequential id for the operations in the transaction.

The next column __$operation provides the type of the operation which was recorded. For Insert and delete operations the operation is straight forward (values of 1 for delete and 2 for insert) and show up accordingly in the change table. But in case of Update, the operation is recorded as an before and after change values. The __$operation value of 3 indicates an update operation with the data before changes, while a value of 4 indicates the update operation with the column values after the change.

The __$update_mast is a bit mask for the tracked columns in the source table. For example if 3 columns are being tracked by CDC, then the bit mask has 3 bits indicating the 3 columns. The bits are arranged in the order of the column ordinals in the table.

For example, assume we have a table test1 with 5 columns (a, b, c, d, e). We have enabled CDC for this table on the following columns (a,b,e).

Then the bit mask is created with 3 bits. The right most bit indicates first column (i.e. column a in our example), while the left most bit indicates the last column (i.e. column e).

Whenever a perticular column has been modified the corresponding bit for that column is set to 1. In the output above, the value of 0x4 (100) indicates that the third column (email) was modified.

Advertisements

10 responses to “Change Data Capture – We don’t see, what we don’t know

  1. bhargavi July 26, 2012 at 12:25

    nice article sir but internally cdc calling so many system stored procedures i am not getting exact flow of that procedures .while enabling
    cdc its calling sys.sp_cdc_enable_table procedure inside this its calling another sys.sp_cdc_enable_table_internal procedure i could not find this procedure in any database .how its calling cdc.sp_batchinsert_lsn_time_mapping
    cdc.sp_ins_dummy_lsn_time_mapping
    cdc.sp_ins_instance_enabling_lsn_time_mapping
    cdc.sp_ins_lsn_time_mapping these procedures sir .i want how cdc tracking changes using system procedures .iam not getting when changes occured how its calling cdc.sp_batchinsert_lsn_time_mapping
    cdc.sp_ins_dummy_lsn_time_mapping
    cdc.sp_ins_instance_enabling_lsn_time_mapping procedures ? please give exact flow of system stored procedures how its tracking and how it is calling sir ?

  2. bhargavi July 26, 2012 at 12:26

    please give reply soon

    • Sourabh Agarwal July 26, 2012 at 12:32

      Bhargavi,
      I will write a follow up post in some time. But mainly you dont need to worry about the internal stored procedurees.. they follow a very complex nesting chain.
      In any case, I will try to explain further in my new post.

  3. PT940 July 26, 2012 at 18:34

    Very nice post!
    I have a problem that is bugging me: I’ve enabled CDC on a table that has a VARBINARY(MAX) column.
    When I execute am INSERT or UPDATE on the table with more than 65KB of data for this column, it gives me the error: Length of LOB data (xxxx) to be replicated exceeds configured maximum 65536.
    I have tried enable CDC without the VARBINARY(MAX) column using the parameter @captured_column_list, but it’s still not working.
    Without CDC, I can insert more than 65KB in the column, but with CDC enabled on that table, even if I do not include the VARBINARY(MAX) column on CDC, it gives me the error.
    How can I avoid the problem, keeping CDC enabled for that table without having to change the ‘max text repl size’ server option?
    Best Regards,
    PT940

    • PT940 July 26, 2012 at 20:06

      Solution: create the table without image o varbinary fields, enable CDC, alter the table including those fields.
      At least worked for me…
      Regards

    • Sourabh Agarwal September 10, 2012 at 17:21

      Did you try this option…

      If a published table includes any LOBs, we recommend that you specify a value of TRUE for the @stream_blob_columns parameter of sp_addmergearticle (Transact-SQL). If TRUE is not specified, the entire LOB must be built in memory at the Publisher, which can cause the Publisher to run out of RAM if the LOB is very large.

      http://technet.microsoft.com/en-us/library/ms151206(SQL.105).aspx

  4. bhargavi July 27, 2012 at 13:05

    hi sir ,how to send notifications to C# application when any updates occur in SQL Server 2008
    i need methods other than using triggers

    • Sourabh Agarwal September 10, 2012 at 16:49

      Bhargavi,
      You can use SQL Service broker to send out notifications to your application.

      • Bhargavi November 14, 2012 at 10:43

        i want to send notifications to C# application when DDL and DML changes occured in SQL server database but using ServiceBroker
        we can send notifications to C# Application only for DDL Changes.Is there any method other than using triggers and SQl dependency

        • Sourabh Agarwal November 15, 2012 at 16:12

          You can send notifications for both DDL and DML triggers. My example in the blog talks about DML triggers only. In my example I am using Service Broker and SQL Noticiations.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s

%d bloggers like this: