SQL Server Change Data Capture (CDC) Methods: How Striim Captures Change Data More Than 7X Faster

Table of Contents

Striim SQL Server CDC

In this post we cover the built-in SQL Server CDC functionality, how to set it up, and some pros and cons of this feature. We also introduce Striim’s high-speed, reliable SQL Server CDC solution.

Introduction

SQL Server CDC Methods

Introduction

What is Change Data Capture?

Change Data Capture (CDC) is a software process that identifies and tracks changes in a database. There are various CDC methods, each with their pros and cons, but the main benefit of CDC is that it enables near-real-time data movement from databases. Continuous data capture is important for both analytics use cases (where timely data is used to make decisions) and zero-downtime cloud migrations (where operational databases can’t be taken offline for extended periods).

What is SQL Server? 

Microsoft SQL Server is a relational database management system (RDMS) developed by Microsoft. SQL Server is mainly used to store and retrieve data. For the purposes of analysis and business intelligence, data is typically replicated to a data warehouse optimized for large analytics queries.

Change Data Capture is an ideal solution for replication of SQL Server data in near real-time. In this post, we’ll share an overview of both SQL Server’s built-in CDC functionality and Striim’s high-speed SQL Server CDC reader.

SQL Server CDC Methods

Built-in SQL Server CDC

SQL Server offers a built-in CDC functionality, as depicted in the diagram below.

SQL Server CDC feature
Image source: Microsoft documentation

The source table contains the data that’s being inserted, updated, and deleted by applications. The SQL Server transaction log keeps a record of the changes. If CDC is enabled, a SQL Server agent reads inserts, updates, and deletes from the transaction log. These changes are added to a separate change table, where query functions are used to capture the changes so they can be delivered to a target data warehouse or other destination via a data integration process like ETL.

What are the benefits and drawbacks of the built-in SQL Server Change Data Capture feature?

Pros:

  • Built-in feature, no external tools needed
  • Uses familiar T-SQL language

Cons:

  • Adding a change table adds overhead to the source database
  • Querying change tables adds latency
  • Change tables need to be cleaned up on a regular basis
  • Since a change table is created for each CDC-enabled table, it can be challenging to consolidate changes from multiple tables (and databases)

Tutorial: How to set up the built-in SQL Server CDC functionality

This tutorial describes how to set up change data capture (CDC) in SQL Server using Microsoft’s AdventureWorks sample database to illustrate the steps.

For additional information see SQL Server setup in the documentation.

Prerequisites

The following are required to complete this tutorial: 

  • Existing SQL Server running instance
  • SQL Server management tools
  • (Optional) AdventureWorks sample database

Set up Database-Level CDC

CDC must be enabled both at the database and table level in SQL Server. Database-level CDC functionality is enabled by setting is_cdc_enabled to 1 in the sys.databases table.

Invoke the following commands to check the current state of database-level CDC functionality:

USE AdventureWorksDW2019
GO
SELECT
database_id,
name,
is_cdc_enabled
FROM sys.databases
GO

Locate your database (e.g., AdventureWorks2019) in the resultset. If is_cdc_enabled is 0 as shown below, then CDC is not currently enabled at the database level:

 

Invoke the following commands to enable database-level CDC. These commands use SQL Server’s built-in sys.sp_cdc_enabled_dbd stored procedure to perform the necessary updates:

USE AdventureWorksDW2019
GO
EXEC sys.sp_cdc_enable_db
GO

Re-invoke the verification commands from the start of this section to verify that is_cdc_enabled has is now set to 1:

 

Set up Table-Level CDC

In addition to setting up database-level CDC, you must also enable CDC on a per-table basis.

This section describes how to set up and verify table-level CDC on the AdventureWorks database’s DimAccount table.

Invoke the following commands to set up CDC on the DimAccount table. Note the source_schema parameter must be assigned the name of the schema to which the table (DimAccount) belongs, in this case dbo:

USE AdventureWorksDW2019
GO
EXEC sys.sp_cdc_enable_table
@source_schema = N'dbo',
@source_name = N'DimAccount',
@role_name = NULL
GO

Verify that CDC is enabled for the DimAccount table by checking that is_tracked_by_cdc is set to 1 for the database in sys.tables.

Invoke the following command to return all tables listed in sys.tables:

USE AdventureWorksDW2019
GO
SELECT
object_id,
SCHEMA_NAME(Schema_id) As [Schema Name],
name As [Table Name],
is_tracked_by_cdc
FROM sys.tables
GO

Locate the row for the DimAccount table and verify that is_tracked_by_cdc is set to 1:

This process results in the creation of a system table prefixed with cdc. called cdc.dbo_DimAccount_CT that can be seen in Object Explorer and further verified by querying the table.

Invoke the following command to verify that cdc.Dbo_DimAccount_CT can be queried and that a row is returned:

USE AdventureWorksDW2019
GO
SELECT * from cdc.Dbo_DimAccount_CT
GO

You have now successfully configured CDC for a database and table. 

Striim for High Speed, Reliable SQL Server CDC

Striim is a real-time data integration platform that natively performs change data capture by mining database change logs while managing additional metadata. Striim supports enterprise workloads with:

  • high-speed capture (more than 7x faster than SQL Server CDC using change tables — see data below)
  • exactly-once processing (E1P) to prevent data loss or duplication
  • support for long-running transactions
  • migration and replication of data from SQL Server databases to cloud and on-premise data warehouses, data lakes, messaging systems, files, and big data targets
Striim SQL Server CDC
Striim provides reliable, high-speed SQL Server CDC by reading from SQL Server change logs directly

As shown in the diagram above, Striim’s SQL Server CDC Reader doesn’t require the use of a change table. Striim reads the change data directly from the SQL Server transaction logs; performs in-memory transformations, enrichment and analysis; and delivers data to target systems…with millisecond latency.

SQL server cdc is 7x faster with Striim
SQL Server CDC with Striim’s native log reader (“native”) vs Striim’s “classic” reader that reads from SQL Server change tables.

Interested in a personalized walkthrough of Striim’s SQL Server CDC functionality? Please schedule a demo with one of our CDC experts! Alternatively you can  try Striim for free.