Change Data Capture (CDC) using Debezium

Emmanuel
Litmus7 Systems Consulting
4 min read · Feb 27, 2023

What is CDC?

  • Change Data Capture (CDC) is a software process that identifies and tracks changes made to a database and extracts them so that they can be replicated to downstream systems.
  • Use cases for CDC include, but are not limited to:
      • Invalidating a cache: event-driven cache invalidation by capturing change events from the logs and processing them.
      • Real-time data loading into a data warehouse, from OLTP systems to OLAP systems, to run analytical queries.
      • Maintaining an audit log.
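
The cache invalidation use case can be sketched roughly as follows. This is a minimal illustration rather than a production design: the in-memory cache dict, the id key field, and the invalidate_from_event helper are all assumptions made for the example. The event shape (op, before, after) follows the change event envelope that Debezium emits.

```python
import json


def invalidate_from_event(raw_event: str, cache: dict, key_field: str = "id") -> bool:
    """Evict the cache entry for the row touched by one change event.

    Returns True if an entry was actually evicted.
    """
    event = json.loads(raw_event)
    # Deletes carry the row image in "before"; inserts and updates in "after".
    row = event.get("after") or event.get("before")
    if row is None:
        return False
    return cache.pop(row[key_field], None) is not None


# Hypothetical cache keyed by primary key; a real system might use Redis instead.
cache = {1001: {"id": 1001, "email": "old@example.com"}}
update_event = json.dumps({
    "op": "u",
    "before": {"id": 1001, "email": "old@example.com"},
    "after": {"id": 1001, "email": "new@example.com"},
})
invalidate_from_event(update_event, cache)  # the stale entry for id 1001 is evicted
```

Evicting instead of updating the entry keeps the handler simple: the next read repopulates the cache from the database.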

Different Approaches for CDC

Log Based CDC

Log-based CDC relies on the transaction log of the source database (the binary log in MySQL), so the impact on the source system is low.

Many tools are available in the market, such as Keboola, Oracle GoldenGate, Qlik Replicate, and Fivetran.

Debezium is an open source CDC tool that runs on Kafka Connect and streams change events into Kafka topics.

Environment Setup

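  • Download and unzip Kafka.
  • Download and unzip the Debezium MySQL connector plugin.
  • Enable the Zookeeper and Kafka services using the commands below. Note that systemctl enable only registers a service to start at boot; run sudo systemctl start for each service to start it immediately.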

sudo systemctl enable zookeeper

sudo systemctl enable kafka

Debezium-Mysql Plugin Setup

  • Install MySQL
  • MySQL has a binary log (binlog) that records all operations in the order in which they are committed to the database. This includes changes to table schemas as well as changes to the data in tables. MySQL uses the binlog for replication and recovery.
  • The Debezium MySQL connector reads the binlog and produces change events for row-level INSERT, UPDATE, and DELETE operations as well as for schema changes. It then emits the change events to Kafka topics, and client applications read those topics.
  • Download and unzip the debezium-mysql plugin into the Kafka plugins directory (the commands below use /home/kafka/kafka/plugins/debezium-connector-mysql).
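
To make the shape of those change events concrete, here is a minimal sketch of how a client application might interpret one message value read from a topic. It assumes the JSON converter is in use; the describe_change helper and the sample database and table names are hypothetical. The op codes (c, u, d, r) and the before/after/source fields are part of the Debezium event envelope.

```python
import json
from typing import Optional

# Debezium operation codes and the row-level operation each one stands for.
OP_NAMES = {"c": "INSERT", "u": "UPDATE", "d": "DELETE", "r": "SNAPSHOT READ"}


def describe_change(raw_value: Optional[str]) -> str:
    """Summarise one change event value as a human-readable line."""
    if raw_value is None:
        # By default a delete is followed by a tombstone record with a null value.
        return "tombstone"
    message = json.loads(raw_value)
    # With schemas enabled the event sits under "payload"; otherwise it is the top level.
    payload = message.get("payload", message)
    op = OP_NAMES.get(payload["op"], payload["op"])
    source = payload.get("source", {})
    table = f"{source.get('db')}.{source.get('table')}"
    return f"{op} on {table}: before={payload.get('before')} after={payload.get('after')}"


# Hypothetical delete event for a row in inventory.customers.
sample = json.dumps({
    "payload": {
        "op": "d",
        "before": {"id": 7, "email": "sally@example.com"},
        "after": None,
        "source": {"db": "inventory", "table": "customers"},
    }
})
print(describe_change(sample))
```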

Setting up the connector in Debezium-Mysql Plugin

connector.txt
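
As a rough guide to what a connector file for the MySQL connector contains, here is a minimal sketch that renders one from Python. Every value (host, credentials, server id, prefix, database and topic names) is a placeholder assumption, and the render_properties helper is hypothetical. The property names are those of Debezium 2.x; in Debezium 1.x, topic.prefix was database.server.name and the schema.history.internal.* properties were named database.history.*.

```python
# Placeholder values for illustration only; adjust them to your environment.
mysql_connector = {
    "name": "mysql-connector",
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "localhost",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "change-me",
    # Must be unique among all clients replicating from this MySQL server.
    "database.server.id": "184054",
    # Prefix of every topic the connector writes to.
    "topic.prefix": "mysqlserver",
    # Databases whose changes are captured.
    "database.include.list": "inventory",
    # Where the connector stores the history of schema changes.
    "schema.history.internal.kafka.bootstrap.servers": "localhost:9092",
    "schema.history.internal.kafka.topic": "schema-changes.inventory",
}


def render_properties(settings: dict) -> str:
    """Render a dict as Java-style key=value properties text."""
    return "".join(f"{key}={value}\n" for key, value in settings.items())


connector_text = render_properties(mysql_connector)
print(connector_text)
```

Saving the printed text as connector.txt gives a file in the format that connect-standalone.sh expects as its second argument.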

Setting up the worker in Debezium-Mysql Plugin

worker.txt
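
A standalone Kafka Connect worker file is a small key=value properties file. The sketch below checks that such a file contains the settings a Debezium setup typically relies on. The sample values are placeholders, the helper names are hypothetical, and the list of expected keys is an assumption about a typical setup rather than an official requirement list (plugin.path points at the directory that holds the unzipped plugin).

```python
# Settings a standalone worker file for Debezium would usually define (assumed list).
EXPECTED_WORKER_KEYS = {
    "bootstrap.servers",
    "key.converter",
    "value.converter",
    "offset.storage.file.filename",
    "plugin.path",
}


def parse_properties(text: str) -> dict:
    """Parse simple key=value properties text, skipping blanks and # comments."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip()
    return settings


def missing_worker_keys(text: str) -> set:
    """Return the expected keys that the properties text does not define."""
    return EXPECTED_WORKER_KEYS - set(parse_properties(text))


# Placeholder worker file content for illustration.
sample_worker = """
# Kafka broker(s) the worker connects to
bootstrap.servers=localhost:9092
key.converter=org.apache.kafka.connect.json.JsonConverter
value.converter=org.apache.kafka.connect.json.JsonConverter
# Standalone mode keeps connector offsets in a local file
offset.storage.file.filename=/tmp/connect.offsets
plugin.path=/home/kafka/kafka/plugins
"""
print(missing_worker_keys(sample_worker))
```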

Starting the Connection

sh connect-standalone.sh /home/kafka/kafka/plugins/debezium-connector-mysql/worker.txt /home/kafka/kafka/plugins/debezium-connector-mysql/connector.txt

Debezium-Postgres Plugin Setup

  • Install PostgreSQL
  • Debezium uses the logical decoding feature available in PostgreSQL to extract all persistent changes to the database in an easy-to-understand format that can be interpreted without detailed knowledge of the database’s internal state.
  • The connector produces a change event for every row-level insert, update, and delete operation that it captures and sends the change event records for each table to a separate Kafka topic. Client applications read the Kafka topics.
  • Download and unzip the debezium-postgres plugin.
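
Because each table gets its own topic, a consumer can tell which table an event belongs to from the topic name alone. The sketch below assumes the connector's default naming of prefix.schema.table; the pgserver prefix, the table names, and both helper functions are hypothetical.

```python
from collections import defaultdict


def table_topic(topic_prefix: str, schema: str, table: str) -> str:
    """Build the per-table topic name: <topic.prefix>.<schema>.<table>."""
    return f"{topic_prefix}.{schema}.{table}"


def group_by_table(records):
    """Group (topic, value) pairs by the (schema, table) encoded in the topic name."""
    grouped = defaultdict(list)
    for topic, value in records:
        # Split from the right so that a prefix containing dots stays intact.
        _prefix, schema, table = topic.rsplit(".", 2)
        grouped[(schema, table)].append(value)
    return dict(grouped)


# Hypothetical records as a consumer might receive them from two table topics.
records = [
    (table_topic("pgserver", "public", "customers"), {"op": "c"}),
    (table_topic("pgserver", "public", "orders"), {"op": "u"}),
    (table_topic("pgserver", "public", "customers"), {"op": "d"}),
]
grouped = group_by_table(records)
```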

Setting up the connector in Debezium-Postgres Plugin

connector.txt
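
For orientation, a connector file for the PostgreSQL connector could look roughly like the output of the sketch below. All values are placeholder assumptions, and the property names are those of Debezium 2.x. Logical decoding also requires wal_level=logical on the PostgreSQL server.

```python
# Placeholder values for illustration only; adjust them to your environment.
postgres_connector = {
    "name": "postgres-connector",
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "localhost",
    "database.port": "5432",
    "database.user": "debezium",
    "database.password": "change-me",
    "database.dbname": "inventory",
    # Prefix of every topic the connector writes to.
    "topic.prefix": "pgserver",
    # Logical decoding output plugin that ships with PostgreSQL 10 and later.
    "plugin.name": "pgoutput",
}

# Render the settings as key=value lines, one per property.
connector_text = "".join(f"{key}={value}\n" for key, value in postgres_connector.items())
print(connector_text)
```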

Setting up the worker in Debezium-Postgres Plugin

worker.txt

Starting the Connection

sh connect-standalone.sh /home/kafka/kafka/plugins/debezium-connector-postgres/worker.txt /home/kafka/kafka/plugins/debezium-connector-postgres/connector.txt

Note:

With MySQL, both DDL and DML operations are captured, and they are written to different Kafka topics: row-level changes go to per-table topics, while schema changes go to a separate schema change topic.

With PostgreSQL, only DML operations are captured.

In MySQL, a DDL statement does not produce an event for each pre-existing row that the change might alter; instead, the database simply records that a DDL change occurred and what that change consisted of.
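
A consumer that reads both kinds of MySQL topics can tell the two event types apart by their content. The sketch below is a simplified illustration: the classify_message helper and the sample events are hypothetical, and it relies on schema change events carrying a ddl field while row-level events carry an op field.

```python
import json


def classify_message(raw_value: str) -> str:
    """Label a message value as a schema change (DDL) or a row-level change (DML)."""
    message = json.loads(raw_value)
    # With schemas enabled the event sits under "payload"; otherwise it is the top level.
    payload = message.get("payload", message)
    if "ddl" in payload:
        return "DDL"
    if "op" in payload:
        return "DML"
    return "UNKNOWN"


# Hypothetical events: one ALTER TABLE and one row update.
ddl_event = json.dumps({
    "databaseName": "inventory",
    "ddl": "ALTER TABLE customers ADD COLUMN phone VARCHAR(20)",
    "tableChanges": [],
})
dml_event = json.dumps({"op": "u", "before": {"id": 1}, "after": {"id": 1, "phone": "555"}})

print(classify_message(ddl_event), classify_message(dml_event))
```

Note that the ALTER TABLE produces exactly one DDL event, no matter how many existing rows the new column applies to.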
