Recently a customer asked me what really happens in the background when we enable Change Data Capture on a Database/Table and if there is a way to monitor all the changes which happens on the DB after enabling CDC. Honestly I did not know for sure what happens in the background. So I did what I do best. I created a small repro for a small database and a table and used profiler to see what really happens in the background.
Before we get into the specifics of the background operations, lets spend a minute on CDC architecture.
For CDC, the source for tracking the changes is the SQL Server Transaction Logs. As DML operations happen on the Database, they are recorded in the SQL Server transaction, which serves as the input to the change capture process. The change process used Log Reader (same as what Transactional Replication uses) to read the entries from the transaction log file. The capture process than segregates this information based on the tracked tables and put the corresponding entries in the change table associated with the tracked tables. These changes can then be enumerated using the functions provided by CDC.
The first step in enabling CDC is to enable it on the Database. This is done using the system stored procedure “sys.sp_cdc_enable_db”.
This stored procedure needs to be executed in the context of the database on which CDC is to be enabled. The stored procedure does not take any arguments. When you executed the stored procedure the following happen.
- is_cdc_enabled bit in the sys.database catalog view is enabled.
- A new database schema (named CDC) and a new DB user (named CDC) is created in the DB.
- A DDL trigger tr_MScdc_ddl_event is created in the database.
- Creates the following tables in the database
- cdc.captured_columns — Contains the details of all the columns being tracked by CDC at a database level.
- cdc.change_tables — Stores information about the cdc enabled tables and their capture instance. One row for each capture instance.
- cdc.ddl_history — All DDL operations happening on the DB.
- cdc.index_columns — Tracks the columns on which the indexes have been defined. This is done for the indexes defines in the @index_name parameter for the sys.sp_cdc_enable_table. If the index name is null, then the primary key index of the table is used.
- cdc.LSN_timeMapping — Maps the LSN in the cdc.<capture_instance>_CT table to the time the LSN’s were generated.
- The Following CDC related sp’s are created in the database.
Once the database has been enabled for CDC, a table can be configured for change capture. To enable a Table for tracking, the following Stored procedure needs to be used “sys.sp_cdc_enable_table”.
This stored procedure takes about 9 parameters (some are optional) as described below.
@source_schema = N’dbo’
, @source_name = N’test1′
, @role_name = N’CDC_Admin’
, @capture_instance = N’CaptureInst1′
, @supports_net_changes = 1
, @index_name = null
, @captured_column_list = N’ id, name, email ‘
, @filegroup_name = null
, @allow_partition_switch = 1
@source_schema and @source_name are the system names of the schema and the table which would be tracked by CDC.
@role_name is the database role which is used to allow access to the change data. If the role does not exists in the database, the stored procedure would try to create the role. If the user running the SP does not have privileges to create a database role, the sp execution would fail.
@capture_instance is the instance used to track changes for a particular table. This capture instance would be used to create a cdc table in the database to track changes specific to the source table. For example, in the above case, a cdc table called cdc.CaptureInst1_CT is created to track changes specific to the TEST1 table.
@supports_net_changes parameter controls the creation of a CDC function, which can be used to provide net/overall changes to a table. For example, lets assume test1 table has 1 record in it. In a Transaction there are 10 DML operations on the record. The net changes function would report the last changed value for the record.
@index_name: This parameter is used to provide the name of the unique index on the table, in case if there is no primary key or unique index already present in the table. CDC needs to identify each record in the table. This can be used using any unique constraint/primary key constraint on the table. If the index is specified and there is a primary key in the table, then the index columns are given precedence in determine row uniqueness.
@captured_column_list: this is a comma separated list of all the columns which needs to be tracked by CDC.
@filegroup_name indicates the filegroup which is to be used for creating the Change Table associated with the tracked table.
@allow_partition_switch indicated whether Alter table.. Switch partition can be executed on the table or not. The default value is 1.
When this stored procedure is executed, the following operations happen in the back ground.
- A change table associated with the tracked table (in example above test1) is created in the DB. This change table has the name as cdc.<capture_instance>_CT.
- Two functions (depending on whether @supports_net_changes was set to 0 or 1) are created in the database. These functions have the following names
- If this was the first execution of the sp in the DB, it also creates two SQL Agent Jobs, along with the necessary entries in the MSDB database.
- Set the is_tracked_by_cdc bit in the sys.tables catalog view.
At this point, database/table has been setup for change capture using CDC. In the various tables mentioned above, we would have certain metadata information created. For example
Select * from cdc.captured_columns
OBJECT_NAME(object_id) as ChangeTrackingTable,
OBJECT_NAME (source_object_id) as TrackedTable,
Next, I inserted a single record in the Test1 table, and then updated the same record. Then I checked for the entries in my Change Table.
select __$start_lsn, __$seqval, __$operation, __$update_mask, ID, name, email
The first column in the output indicate the LSN (log sequence number) of the commit associated with the transaction. CDC would populate the changes in the Change Table only when the DML transaction commits. If the transaction is open, changes made by the transaction does not reflect in the Change Table.
If in the transaction there are multiple DML statements, then they are tracked using the __$seqval, which provides the sequential id for the operations in the transaction.
The next column __$operation provides the type of the operation which was recorded. For Insert and delete operations the operation is straight forward (values of 1 for delete and 2 for insert) and show up accordingly in the change table. But in case of Update, the operation is recorded as an before and after change values. The __$operation value of 3 indicates an update operation with the data before changes, while a value of 4 indicates the update operation with the column values after the change.
The __$update_mast is a bit mask for the tracked columns in the source table. For example if 3 columns are being tracked by CDC, then the bit mask has 3 bits indicating the 3 columns. The bits are arranged in the order of the column ordinals in the table.
For example, assume we have a table test1 with 5 columns (a, b, c, d, e). We have enabled CDC for this table on the following columns (a,b,e).
Then the bit mask is created with 3 bits. The right most bit indicates first column (i.e. column a in our example), while the left most bit indicates the last column (i.e. column e).
Whenever a perticular column has been modified the corresponding bit for that column is set to 1. In the output above, the value of 0x4 (100) indicates that the third column (email) was modified.