PostgreSQL Database Change Data Capture (CDC) Using Debezium

Caner Tosuner
Published in hepsiburadatech
5 min read · Mar 27, 2020

It’s clear that all kinds of data are very important in today’s software environment. End-user-oriented data in particular helps us develop new business models, and data transfer has become a continuous process.

For example, when a payment transaction is processed on an e-commerce website, you may need to transfer this data to 3 or 4 different sources at the same time (different teams: fraud, ERP, data warehouse, customer, etc.), and the transfer has to happen almost instantly, within 1–2 seconds at most. Let’s put it more technically: you are responsible for the whole payment process, and you have to notify different sources about every CRUD operation on the transaction table stored in a PostgreSQL database. If you have an event-driven architecture, you can publish events by setting up a kind of federation with the sources, but that is a code-based solution, and it is not the one we want to pursue here.

If we look at the methods suggested for this kind of problem, the Change Data Capture (CDC) approach is one of them.

In databases, change data capture (CDC) is a set of software design patterns used to determine (and track) the data that has changed so that action can be taken using the changed data.

In this article, we will design an architecture that tracks all CRUD operations on a table stored in a PostgreSQL database using Debezium, publishes each of them as a message to a Kafka topic, and finally consumes these messages with an application.

PostgreSQL — Kafka Connect provides many kinds of connectors for different data sources, and the PostgreSQL connector is one of them. There will be a database and a table whose changes we stream using the connector.

Apache Kafka — We will create a topic and use it to store the events captured from the database.

Kafka Connect — A tool for scalable and reliable data transfer between Apache Kafka and other systems. It is used to define connectors that move records from databases into Kafka, and vice versa.

Debezium Connector — Debezium is an open source distributed platform for change data capture (CDC). It streams change events from the database to a topic defined in Kafka using the Kafka Connect API.

Let’s start with the docker-compose file. The docker-compose.yml file will pull all the necessary images and create instances of kafka, zookeeper, postgres and debezium.
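The original compose file is not preserved in this export; a minimal sketch using the Debezium example images might look like the following (image tags, ports and credentials are assumptions):

version: "2"
services:
  zookeeper:
    image: debezium/zookeeper:1.0
    ports:
      - "2181:2181"
  kafka:
    image: debezium/kafka:1.0
    ports:
      - "9092:9092"
    links:
      - zookeeper
    environment:
      - ZOOKEEPER_CONNECT=zookeeper:2181
  postgres:
    image: debezium/postgres:12
    ports:
      - "5432:5432"
    environment:
      # appuser is assumed here so the later psql steps line up
      - POSTGRES_USER=appuser
      - POSTGRES_PASSWORD=appuser
  connect:
    image: debezium/connect:1.0
    ports:
      - "8083:8083"
    links:
      - kafka
      - postgres
    environment:
      - BOOTSTRAP_SERVERS=kafka:9092
      - GROUP_ID=1
      - CONFIG_STORAGE_TOPIC=connect_configs
      - OFFSET_STORAGE_TOPIC=connect_offsets
      - STATUS_STORAGE_TOPIC=connect_statuses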

After creating the file, run this magic command to bring everything up.

docker-compose up

You can easily see that all the images are being downloaded.

When all the installations have finished, we can see the containers up and running using the docker ps command.

PostgreSQL

First we get into the postgres container using the docker exec command and connect as appuser to create the payment database and a table named transaction.
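The exact commands are not reproduced here; a sketch of this step, with a placeholder container id and column types guessed from the insert statement used later, might be:

docker exec -it <postgres-container-id> psql -U appuser

CREATE DATABASE payment;
\c payment
-- column types are assumptions inferred from the sample insert below
CREATE TABLE transaction (
    id INT PRIMARY KEY,
    amount INT,
    customerId VARCHAR(100)
);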

If all the commands run successfully, we have a database called payment and a table called transaction, and we can insert a record as an example.
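The sample record itself was in a screenshot; a hypothetical insert would be:

insert into transaction(id, amount, customerId) values(1, 45, 'e1c86a40-6ba4-4a55-a116-a602b78b4dcd'); -- values are made up for illustration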

Debezium

Now we want to track every change in the transaction table and transfer it to a Kafka topic, and we need to create a connector to do this. A connector is an application responsible for moving changed data from a database (or another storage system) to a Kafka cluster (or vice versa). If you are not familiar with Kafka connectors, you can find more information here. In our case, we want to move transaction table CRUD operations to the Apache Kafka cluster.

Debezium is a Kafka connector that can read all change events from PostgreSQL and publish them to a topic in Kafka.

Debezium provides a REST API for configuration, and we can easily configure it with a curl command like the one below.
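The request body from the article is not preserved; a sketch consistent with the connector name and topic used below might be (hostnames and credentials match the assumed compose file above; property names follow Debezium 1.x):

curl -X POST -H "Content-Type: application/json" localhost:8083/connectors -d '{
  "name": "payment-connector",
  "config": {
    "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
    "database.hostname": "postgres",
    "database.port": "5432",
    "database.user": "appuser",
    "database.password": "appuser",
    "database.dbname": "payment",
    "database.server.name": "dbserver1",
    "table.whitelist": "public.transaction"
  }
}'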

We have defined a connector named payment-connector and specified which schema and database to track. If we run the curl localhost:8083/connectors/payment-connector/status command, we can see that the connector is in the RUNNING state.
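The status endpoint is part of the standard Kafka Connect REST API; its response should look roughly like this (worker_id values are illustrative):

{
  "name": "payment-connector",
  "connector": { "state": "RUNNING", "worker_id": "172.18.0.5:8083" },
  "tasks": [ { "id": 0, "state": "RUNNING", "worker_id": "172.18.0.5:8083" } ],
  "type": "source"
}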

Now let’s get into the kafka container and list all the topics.

docker exec -it b06d7ea41b03 bash
kafka-topics --zookeeper zookeeper:2181 --list

When you run that command, you will see a topic named dbserver1.public.transaction, which was created by the connector; all change messages from the transaction table will be published to this topic by the connector.

It’s time to consume the messages. To do that, we create a console consumer using this command:

kafka-console-consumer --bootstrap-server kafka:9092 --from-beginning --topic dbserver1.public.transaction --property print.key=true --property key.separator="-"

Then, in a different terminal, connect to the payment database, insert a record into the transaction table, then update and delete it. All the messages will be displayed on the consumer’s console.

insert into transaction(id, amount, customerId) values(85, 87, '37b920fd-ecdd-7172-693a-d7be6db9792c');
update transaction set amount=77 where id=85;
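The delete step mentioned above is not shown in this export; the matching statement would be:

delete from transaction where id=85;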

If we look at the consumed messages, the output is in JSON format and looks a little complicated at first. Run through a JSON beautifier, the payloads look like the following.

Insert payload
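The original screenshot is not reproduced here; a trimmed sketch of a typical Debezium insert event (op "c") for the row inserted above, with the schema block and most source fields omitted, looks roughly like this:

{
  "payload": {
    "before": null,
    "after": { "id": 85, "amount": 87, "customerid": "37b920fd-ecdd-7172-693a-d7be6db9792c" },
    "source": { "db": "payment", "schema": "public", "table": "transaction" },
    "op": "c"
  }
}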

Update payload
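For the update, op is "u" and after holds the new values; with PostgreSQL’s default REPLICA IDENTITY, before is typically null (or carries only the key) unless REPLICA IDENTITY FULL is set on the table:

{
  "payload": {
    "before": null,
    "after": { "id": 85, "amount": 77, "customerid": "37b920fd-ecdd-7172-693a-d7be6db9792c" },
    "source": { "db": "payment", "schema": "public", "table": "transaction" },
    "op": "u"
  }
}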

Delete payload
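For the delete, op is "d", after is null, and before carries the key; Debezium also emits a tombstone message (a null value for the same key) right afterwards so that Kafka log compaction can clean up the row’s messages:

{
  "payload": {
    "before": { "id": 85, "amount": null, "customerid": null },
    "after": null,
    "source": { "db": "payment", "schema": "public", "table": "transaction" },
    "op": "d"
  }
}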

As I mentioned at the beginning of the article, there are different techniques for this, and Debezium is just one of them. Using the CDC technique, we can stream changes from the database to a Kafka topic almost instantly, consume these messages with a consumer application, and then run our own business rules on top using different frameworks.
