Change Data Capture (CDC) using Debezium
What is CDC?
- Change Data Capture (CDC) is a software process that identifies and tracks changes made to a database and extracts them so that they can be replicated to downstream systems.
- Use cases for CDC include, but are not limited to:
Invalidating a cache (event-driven cache invalidation by capturing change events from logs and processing them)
Loading data in real time into a data warehouse (from OLTP systems to OLAP systems to run analytical queries)
Maintaining an audit log
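As a rough illustration of the cache invalidation use case, the sketch below evicts a cached row whenever an update or delete event arrives. The event shape (`op`, `before`, `after`) follows the field names Debezium uses in its change-event payload; the in-memory `cache` dictionary, the `invalidate_on_change` helper, and the `id` key field are hypothetical stand-ins for a real cache and schema.

```python
# Hypothetical in-memory cache keyed by primary key; a real system
# might use Redis or Memcached instead.
cache = {
    1: {"id": 1, "name": "Alice"},
    2: {"id": 2, "name": "Bob"},
}


def invalidate_on_change(event, cache, key_field="id"):
    """Evict the cached row touched by an update ("u") or delete ("d") event.

    Returns the evicted key, or None when the event needs no eviction.
    """
    if event["op"] not in ("u", "d"):
        return None  # inserts and snapshot reads leave existing entries valid
    # Deletes carry no "after" image, so fall back from "before" to "after".
    row = event["before"] or event["after"]
    key = row[key_field]
    cache.pop(key, None)
    return key


update_event = {
    "op": "u",
    "before": {"id": 1, "name": "Alice"},
    "after": {"id": 1, "name": "Alicia"},
}
evicted = invalidate_on_change(update_event, cache)
print(evicted, sorted(cache))
```

The next read for the evicted key would miss the cache and reload the fresh row from the database.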
Different Approaches for CDC
Log Based CDC
Relies on the transaction log of the source database (for example, the MySQL binary log), so the impact on the source system is low
Many tools are available in the market: Keboola, Oracle GoldenGate, Qlik Replicate, Fivetran, etc.
Debezium is an open-source CDC tool built on top of Apache Kafka; its connectors run inside Kafka Connect
Environment Setup
- Download and unzip Kafka
- Download and unzip the Debezium MySQL plugin
- Enable the ZooKeeper and Kafka services using the commands below (systemctl enable registers them to start at boot; use systemctl start to start them right away)
sudo systemctl enable zookeeper
sudo systemctl enable kafka
Debezium-MySQL Plugin Setup
- Install MySQL
- MySQL has a binary log (binlog) that records all operations in the order in which they are committed to the database. This includes changes to table schemas as well as changes to the data in tables. MySQL uses the binlog for replication and recovery.
- The Debezium MySQL connector reads the binlog and produces change events for row-level INSERT, UPDATE, and DELETE operations, as well as for schema changes. It then emits the change events to Kafka topics, and client applications read those topics.
- Download and unzip the Debezium MySQL plugin.
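To make the shape of a row-level change event concrete, here is a minimal sketch that parses one event value and summarizes it. The sample is hand-written, not captured output: the field names (`before`, `after`, `source`, `op`, `ts_ms`) follow Debezium's event envelope, while the database, table, and row values are invented. It also assumes the JSON converter has schemas enabled, so the event body sits under a `payload` key. Reading from Kafka is left out; `raw_value` stands in for a consumed message value.

```python
import json

# Hand-written sample shaped like a Debezium MySQL change-event value.
raw_value = json.dumps({
    "payload": {
        "before": None,  # no previous row image for an insert
        "after": {"id": 1001, "email": "sally@example.com"},
        "source": {"connector": "mysql", "db": "inventory", "table": "customers"},
        "op": "c",
        "ts_ms": 1700000000000,
    }
})

# Operation codes used in the "op" field of the envelope.
OP_NAMES = {"c": "INSERT", "u": "UPDATE", "d": "DELETE", "r": "SNAPSHOT READ"}


def summarize(value):
    """Return the operation, qualified table name, and relevant row image."""
    payload = json.loads(value)["payload"]
    source = payload["source"]
    # Deletes have no "after" image, so fall back to "before".
    row = payload["after"] if payload["after"] is not None else payload["before"]
    return {
        "operation": OP_NAMES.get(payload["op"], payload["op"]),
        "table": f'{source["db"]}.{source["table"]}',
        "row": row,
    }


summary = summarize(raw_value)
print(summary)
```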
Setting up the connector in Debezium-MySQL Plugin
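A minimal sketch of what connector.txt might contain, assuming Debezium 2.x property names (1.x releases use `database.server.name` and `database.history.kafka.*` in place of `topic.prefix` and `schema.history.internal.kafka.*`). Every value here (connector name, host, credentials, server id, prefix, database, history topic) is a placeholder, not the configuration used in this setup. The snippet builds the file contents from a dictionary and prints them as `key=value` lines.

```python
# Placeholder values throughout; replace them with your own environment's settings.
mysql_connector = {
    "name": "mysql-connector",
    "connector.class": "io.debezium.connector.mysql.MySqlConnector",
    "database.hostname": "localhost",
    "database.port": "3306",
    "database.user": "debezium",
    "database.password": "change-me",
    # Must be unique among all clients reading this server's binlog.
    "database.server.id": "184054",
    # Prefix for every topic the connector writes to.
    "topic.prefix": "dbserver1",
    "database.include.list": "inventory",
    # Where the connector stores the history of schema (DDL) changes.
    "schema.history.internal.kafka.bootstrap.servers": "localhost:9092",
    "schema.history.internal.kafka.topic": "schema-changes.inventory",
}

# Render the dictionary as Java-properties style lines.
connector_text = "\n".join(f"{key}={value}" for key, value in mysql_connector.items()) + "\n"
print(connector_text)
```

The printed lines can be saved as connector.txt in the plugin directory.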
Setting up the worker in Debezium-MySQL Plugin
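A minimal sketch of a standalone worker file (worker.txt), using the standard Kafka Connect worker properties. The broker address, offsets file location, and flush interval are placeholder assumptions, and the `plugin.path` value is inferred from the plugin directory used in the start command rather than taken from a known configuration.

```python
# Placeholder values; adjust the broker address and paths to your installation.
worker = {
    "bootstrap.servers": "localhost:9092",
    # Serialize keys and values as JSON, with the schema embedded in each message.
    "key.converter": "org.apache.kafka.connect.json.JsonConverter",
    "value.converter": "org.apache.kafka.connect.json.JsonConverter",
    "key.converter.schemas.enable": "true",
    "value.converter.schemas.enable": "true",
    # Standalone mode keeps source offsets in a local file.
    "offset.storage.file.filename": "/tmp/connect.offsets",
    "offset.flush.interval.ms": "10000",
    # Directory that contains the unzipped connector plugin folders (assumed).
    "plugin.path": "/home/kafka/kafka/plugins",
}

# Render the dictionary as Java-properties style lines.
worker_text = "\n".join(f"{key}={value}" for key, value in worker.items()) + "\n"
print(worker_text)
```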
Starting the Connection
sh connect-standalone.sh /home/kafka/kafka/plugins/debezium-connector-mysql/worker.txt /home/kafka/kafka/plugins/debezium-connector-mysql/connector.txt
Debezium-Postgres Plugin Setup
- Install PostgreSQL
- Debezium uses the logical decoding feature available in PostgreSQL to extract all persistent changes to the database in an easy-to-understand format that can be interpreted without detailed knowledge of the database’s internal state.
- The connector produces a change event for every row-level insert, update, and delete operation that it captures, and sends the change event records for each table to a separate Kafka topic. Client applications read those Kafka topics.
- Download and unzip the Debezium PostgreSQL plugin.
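The one-topic-per-table behaviour can be sketched as follows. By default the PostgreSQL connector names each topic `<topic prefix>.<schema>.<table>`; the prefix `pgserver1`, the table names, and the trimmed-down events below are invented for illustration, and `topic_for` is a hypothetical helper rather than part of Debezium.

```python
def topic_for(prefix, source):
    """Build the per-table topic name from a change event's `source` block."""
    return f'{prefix}.{source["schema"]}.{source["table"]}'


# Trimmed-down change events: only the fields needed for routing.
events = [
    {"op": "c", "source": {"schema": "public", "table": "orders"}},
    {"op": "u", "source": {"schema": "public", "table": "customers"}},
    {"op": "d", "source": {"schema": "public", "table": "orders"}},
]

# Group operation codes by the topic each event would land on.
by_topic = {}
for event in events:
    by_topic.setdefault(topic_for("pgserver1", event["source"]), []).append(event["op"])

print(by_topic)
```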
Setting up the connector in Debezium-Postgres Plugin
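A minimal sketch of what the PostgreSQL connector.txt might contain, assuming Debezium 2.x property names and the `pgoutput` logical decoding plugin that ships with PostgreSQL 10 and later. All values (connector name, host, credentials, database, prefix, slot name) are placeholders, not the configuration used in this setup. Logical decoding also requires `wal_level = logical` on the PostgreSQL server.

```python
# Placeholder values throughout; replace them with your own environment's settings.
postgres_connector = {
    "name": "postgres-connector",
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "localhost",
    "database.port": "5432",
    "database.user": "debezium",
    "database.password": "change-me",
    "database.dbname": "inventory",
    # Prefix for every topic the connector writes to.
    "topic.prefix": "pgserver1",
    # Logical decoding output plugin and the replication slot to stream from.
    "plugin.name": "pgoutput",
    "slot.name": "debezium",
}

# Render the dictionary as Java-properties style lines.
connector_text = "\n".join(f"{key}={value}" for key, value in postgres_connector.items()) + "\n"
print(connector_text)
```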
Setting up the worker in Debezium-Postgres Plugin
Starting the Connection
sh connect-standalone.sh /home/kafka/kafka/plugins/debezium-connector-postgres/worker.txt /home/kafka/kafka/plugins/debezium-connector-postgres/connector.txt
Note:
With MySQL, both DDL and DML operations are captured, and they are published to separate Kafka topics.
With PostgreSQL, only DML operations are captured.
In MySQL, a DDL statement does not produce an event for each pre-existing record that the change might alter; instead, the database simply records that a DDL change occurred and what that change consisted of.
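A consumer that reads both kinds of MySQL topics can tell the two event types apart by their fields. The sketch below assumes the payload layouts Debezium documents for MySQL: schema change events carry a `ddl` string (with `databaseName` and `tableChanges`), while data change events carry an `op` code. The `classify` helper and both sample payloads are hypothetical.

```python
def classify(payload):
    """Label a MySQL connector event payload as a schema change or a data change."""
    if "ddl" in payload:
        return "schema change"
    if "op" in payload:
        return "data change"
    return "unknown"


# One ALTER TABLE yields a single schema change event, however many rows exist.
ddl_event = {
    "databaseName": "inventory",
    "ddl": "ALTER TABLE customers ADD COLUMN phone VARCHAR(32)",
    "tableChanges": [],
}
dml_event = {
    "op": "u",
    "before": {"id": 1001, "phone": None},
    "after": {"id": 1001, "phone": "555-0100"},
}

labels = [classify(ddl_event), classify(dml_event)]
print(labels)
```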