Transaction Log Tailing With Debezium — Part 2

Abdullah YILDIRIM
Published in Trendyol Tech
6 min read, Mar 9, 2020

In the previous post of this series, I explained what Transaction Log Tailing and Debezium are and why we need them. In this post, I will explain how to install Debezium and which configuration settings are required. I will divide the post into two sections. The first covers the steps to install Debezium in a local environment. The second explains how we, the Seller Core team at Trendyol, deploy Debezium to Kubernetes clusters for use in production.

Installing Debezium on Local Docker

Before starting the installation, make sure that Docker is installed and running on your operating system.

We need three services to run Debezium: ZooKeeper, Apache Kafka, and Debezium Connect. ZooKeeper and Kafka store their data inside the container, so in a real setup their data directories should be mounted to volumes on the host machine. We will not do that in this post, so all data will be lost when the containers are removed.

Getting started with ZooKeeper

Let’s open a new terminal and run the following command to start a new container with ZooKeeper inside.
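A minimal sketch of such a command, modeled on the Debezium tutorial. The Python snippet assembles the `docker run` arguments and prints them so the command can be pasted into a terminal. The image tag `1.0` and the container name `zookeeper` are assumptions, not values taken from this post.

```python
import shlex


def zookeeper_command(tag="1.0"):
    """Build the docker run arguments for a ZooKeeper container.

    The tag and the container name are assumptions; adjust them to your setup.
    """
    return [
        "docker", "run", "-it", "--rm",
        "--name", "zookeeper",
        # 2181 is the client port; 2888 and 3888 are used between ZooKeeper peers.
        "-p", "2181:2181", "-p", "2888:2888", "-p", "3888:3888",
        f"debezium/zookeeper:{tag}",
    ]


# Print the shell command; pass the list to subprocess.run() to execute it instead.
print(shlex.join(zookeeper_command()))
```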

After running this command, you should see an output of ZooKeeper similar to the one below.

Then open a new terminal and run the following command to start a new container with Kafka inside.
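A possible form of this command, again following the Debezium tutorial. The image tag `1.0` and the container names are assumptions. The `--link` option is what lets the Kafka container reach ZooKeeper by host name.

```python
import shlex


def kafka_command(tag="1.0"):
    """Build the docker run arguments for a single Kafka broker (assumed tag and names)."""
    return [
        "docker", "run", "-it", "--rm",
        "--name", "kafka",
        "-p", "9092:9092",
        # Makes the ZooKeeper container reachable under the host name "zookeeper".
        "--link", "zookeeper:zookeeper",
        f"debezium/kafka:{tag}",
    ]


print(shlex.join(kafka_command()))
```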

Again, you should see Kafka’s output similar to the one below. The last line shows that Kafka started successfully.

Next comes the PostgreSQL database. To create it, we will use the ‘debezium/postgres:11’ image rather than the standard ‘postgres’ image on Docker Hub. I will explain the reason later in the post.
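A minimal sketch of how the container could be started. The image name and the database name ‘debezium’ come from this post; the user name, the password, and the container name are placeholder assumptions. `POSTGRES_USER`, `POSTGRES_PASSWORD`, and `POSTGRES_DB` are the standard environment variables of the PostgreSQL Docker image.

```python
import shlex


def postgres_command(user="postgres", password="postgres", database="debezium"):
    """Build the docker run arguments for PostgreSQL (credentials are placeholders)."""
    return [
        "docker", "run", "-it", "--rm",
        "--name", "postgres",
        "-p", "5432:5432",
        "-e", f"POSTGRES_USER={user}",
        "-e", f"POSTGRES_PASSWORD={password}",
        # Creates the database on first start.
        "-e", f"POSTGRES_DB={database}",
        "debezium/postgres:11",
    ]


print(shlex.join(postgres_command()))
```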

Again, you should see PostgreSQL output similar to the one below. The last line shows that PostgreSQL started successfully and is ready to accept connections.

Finally, launch the Debezium Connect application, which will establish the connection between PostgreSQL and Kafka.
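A sketch of the launch command under a few assumptions: the image tag `1.0`, the container names, the `GROUP_ID` value, and the three topic names are taken from the Debezium tutorial rather than from this post. The storage topics are passed as environment variables.

```python
import shlex

# Assumed values, modeled on the Debezium tutorial.
CONNECT_ENV = {
    "GROUP_ID": "1",
    "CONFIG_STORAGE_TOPIC": "my_connect_configs",
    "OFFSET_STORAGE_TOPIC": "my_connect_offsets",
    "STATUS_STORAGE_TOPIC": "my_connect_statuses",
}


def connect_command(env=CONNECT_ENV, tag="1.0"):
    """Build the docker run arguments for the Debezium Connect container."""
    args = ["docker", "run", "-it", "--rm", "--name", "connect", "-p", "8083:8083"]
    for key, value in env.items():
        args += ["-e", f"{key}={value}"]
    # Link the three containers Connect has to talk to.
    for service in ("zookeeper", "kafka", "postgres"):
        args += ["--link", f"{service}:{service}"]
    args.append(f"debezium/connect:{tag}")
    return args


print(shlex.join(connect_command()))
```

Port 8083 is where Kafka Connect serves its REST API, which is used to register connectors.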

If you see Debezium Connect output similar to the one below, your setup is complete. If you would rather perform the steps so far with a docker-compose file, you can download it via my GitHub link.

Let me briefly explain the parameters I passed to this command as environment variables.

CONFIG_STORAGE_TOPIC: The topic to store connector and task configuration state in.
OFFSET_STORAGE_TOPIC: The topic to store connector offset state in.
STATUS_STORAGE_TOPIC: The topic to use for storing statuses.

After running the command, I check if there are new topics in Kafka running on Docker.
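One way to perform this check is sketched below. Everything specific in it is an assumption: the container name `kafka`, the `/kafka` install path inside the debezium/kafka image, the `--zookeeper` flag (accepted by the Kafka versions of that period, removed in Kafka 3.0), and the three topic names. The helper compares the listing against the topics Connect is expected to create.

```python
import shlex

# Assumed topic names; use the values you passed to the Connect container.
EXPECTED_TOPICS = {"my_connect_configs", "my_connect_offsets", "my_connect_statuses"}


def list_topics_command(container="kafka"):
    """Build a docker exec call that lists topics (path and flag are assumptions)."""
    return [
        "docker", "exec", container,
        "/kafka/bin/kafka-topics.sh", "--list", "--zookeeper", "zookeeper:2181",
    ]


def missing_topics(cli_output, expected=EXPECTED_TOPICS):
    """Return the expected topics that do not appear in the listing, sorted by name."""
    present = {line.strip() for line in cli_output.splitlines() if line.strip()}
    return sorted(set(expected) - present)


print(shlex.join(list_topics_command()))
print(missing_topics("my_connect_configs\nmy_connect_offsets\n"))
```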

There are a few PostgreSQL details you need to know before creating a new connector. First, let’s return to why I used the ‘debezium/postgres:11’ image mentioned above. Debezium can only connect to PostgreSQL when wal_level is set to ‘logical’, and PostgreSQL in this image is already configured that way.

So what are WAL and WAL_LEVEL?

Write-Ahead Logging (WAL) is a mechanism that guarantees a data page is not written to disk until the corresponding transaction log records have been written to disk. The RDBMS reports success to the application that initiated and committed the transaction only once the transaction log has been written to disk. Thanks to this, WAL provides atomicity and durability, two of the ACID properties.

wal_level determines how much information is written to the WAL. ‘minimal’ writes only the information needed to recover from a crash or immediate shutdown. ‘replica’, the default since PostgreSQL 10, adds the logging required for WAL archiving as well as the information required to run read-only queries on a standby server. Finally, ‘logical’ adds the information necessary to support logical decoding. Each level includes the information logged at all lower levels. This parameter can only be set at server start.
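The current setting can be read with the SQL statement `SHOW wal_level;`. The sketch below builds a `docker exec` call that runs it through psql and interprets the result. The container name and the user name are assumptions.

```python
import shlex


def show_wal_level_command(container="postgres", user="postgres", database="debezium"):
    """Build a docker exec call that prints only the wal_level value."""
    return [
        "docker", "exec", container,
        "psql", "-U", user, "-d", database,
        # -t prints tuples only and -A disables alignment, so the output is just the value.
        "-t", "-A", "-c", "SHOW wal_level;",
    ]


def supports_logical_decoding(psql_output):
    """Return True when the reported wal_level is 'logical'."""
    return psql_output.strip() == "logical"


print(shlex.join(show_wal_level_command()))
print(supports_logical_decoding("logical\n"))
```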

PostgreSQL’s logical decoding feature was introduced in version 9.4. It is a mechanism that extracts the changes committed to the transaction log and processes them with the help of an output plugin. I will use pgoutput as the output plugin. It is maintained by the PostgreSQL community and can be used with PostgreSQL 10 and above.

Now I will create a table for the demo in the PostgreSQL database. Then I will create a Debezium connector to capture this table’s changes from the transaction log.

First, I open a shell in the PostgreSQL container running on Docker. Then I connect to the database named ‘debezium’, which I created earlier, and create a table named ‘persons’ with the following commands.
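A sketch of this step as a single `docker exec` call. The column list is hypothetical, since the schema of ‘persons’ is not given in this post, and the container name and the user name are assumptions as well.

```python
import shlex

# Hypothetical schema: the columns of "persons" are an assumption.
CREATE_PERSONS_SQL = (
    "CREATE TABLE persons ("
    "id SERIAL PRIMARY KEY, "
    "name VARCHAR(100) NOT NULL"
    ");"
)


def psql_command(sql, container="postgres", user="postgres", database="debezium"):
    """Build a docker exec call that runs one SQL statement through psql."""
    return ["docker", "exec", container, "psql", "-U", user, "-d", database, "-c", sql]


print(shlex.join(psql_command(CREATE_PERSONS_SQL)))
```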

After successfully creating the table, it’s time to create a new Debezium Connector. In order to create Debezium Connector, I send the following curl request to the Debezium application running on Docker.
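The same request can be sketched in Python with the standard library. The connector name, host names, and credentials are assumptions. The property names are those of the Debezium 1.x PostgreSQL connector; later releases renamed some of them, for example `table.whitelist`. The request is only built and printed here; sending it is left as a commented line.

```python
import json
import urllib.request

# Assumed: the Connect REST port 8083 is published on localhost.
CONNECT_URL = "http://localhost:8083/connectors"


def connector_payload(name="persons-connector"):
    """Return a connector definition; name, hosts, and credentials are placeholders."""
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.postgresql.PostgresConnector",
            "plugin.name": "pgoutput",
            "database.hostname": "postgres",
            "database.port": "5432",
            "database.user": "postgres",
            "database.password": "postgres",
            "database.dbname": "debezium",
            # Logical server name, used as the prefix of the Kafka topic names.
            "database.server.name": "postgres",
            "table.whitelist": "public.persons",
        },
    }


def build_request(payload, url=CONNECT_URL):
    """Wrap the payload in a JSON POST request for the Kafka Connect REST API."""
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "Accept": "application/json"},
        method="POST",
    )


request = build_request(connector_payload())
print(request.get_method(), request.full_url)
# To register the connector: urllib.request.urlopen(request)
```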

Note that I have defined plugin.name as ‘pgoutput’ among the parameters. Now I check the status of the Debezium Connector that has been created.
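Kafka Connect exposes the status under `/connectors/<name>/status`. The sketch below assumes a hypothetical connector name `persons-connector` and a REST endpoint on localhost. It treats the connector as healthy only when the connector and at least one task report `RUNNING` and no task reports anything else.

```python
import json
import urllib.request


def status_url(name, base="http://localhost:8083"):
    """Return the status endpoint for a connector (base URL is an assumption)."""
    return f"{base}/connectors/{name}/status"


def fetch_status(name):
    """Fetch and decode the status document; requires a running Connect instance."""
    with urllib.request.urlopen(status_url(name)) as response:
        return json.load(response)


def is_running(status):
    """True when the connector is RUNNING and it has tasks that are all RUNNING."""
    tasks = status.get("tasks", [])
    return (
        status.get("connector", {}).get("state") == "RUNNING"
        and bool(tasks)
        and all(task.get("state") == "RUNNING" for task in tasks)
    )


# Illustrative response shape; the worker addresses are made up.
sample = {
    "name": "persons-connector",
    "connector": {"state": "RUNNING", "worker_id": "172.17.0.5:8083"},
    "tasks": [{"id": 0, "state": "RUNNING", "worker_id": "172.17.0.5:8083"}],
}
print(is_running(sample))
```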

Finally, I add a record to the ‘persons’ table to verify that CDC works.
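A sketch of the insert, assuming a hypothetical `name` column on ‘persons’ and the same assumed container and user names as elsewhere. The helper doubles single quotes so that the value stays a valid SQL string literal.

```python
import shlex


def insert_person_sql(name):
    """Build an INSERT statement for the hypothetical 'name' column."""
    # Double single quotes so the value is a valid SQL string literal.
    escaped = name.replace("'", "''")
    return f"INSERT INTO persons (name) VALUES ('{escaped}');"


def psql_command(sql, container="postgres", user="postgres", database="debezium"):
    """Build a docker exec call that runs one SQL statement through psql."""
    return ["docker", "exec", container, "psql", "-U", user, "-d", database, "-c", sql]


print(shlex.join(psql_command(insert_person_sql("Abdullah"))))
```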

After successfully adding the record, I check whether a topic has been created in Kafka and whether the CDC event has been sent.
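Debezium names the topic `<server name>.<schema>.<table>`, so with an assumed server name of `postgres` the events for this table would land in `postgres.public.persons`. The sketch below builds a `watch-topic` call modeled on the Debezium tutorial (image tag and container names are assumptions) and decodes a change event. The sample event is illustrative, with hypothetical column values.

```python
import json
import shlex

OPERATIONS = {"c": "insert", "u": "update", "d": "delete", "r": "snapshot read"}


def watch_topic_command(topic, tag="1.0"):
    """Build a docker run call that prints all messages of a topic with their keys."""
    return [
        "docker", "run", "-it", "--rm", "--name", "watcher",
        "--link", "zookeeper:zookeeper", "--link", "kafka:kafka",
        f"debezium/kafka:{tag}", "watch-topic", "-a", "-k", topic,
    ]


def describe_change(message_value):
    """Return (operation, row state after the change) for one event value."""
    event = json.loads(message_value)
    # When the JSON converter embeds schemas, the change data sits under "payload".
    payload = event.get("payload", event)
    return OPERATIONS.get(payload["op"], payload["op"]), payload.get("after")


# Illustrative event value; the column values are made up.
sample = json.dumps({
    "payload": {
        "before": None,
        "after": {"id": 1, "name": "Abdullah"},
        "source": {"schema": "public", "table": "persons"},
        "op": "c",
    }
})
print(shlex.join(watch_topic_command("postgres.public.persons")))
print(describe_change(sample))
```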

After consuming the topic created in Kafka, I saw that CDC was working as expected. That is all it takes to install Debezium on Docker.

Installing Debezium on Kubernetes Cluster

In the Trendyol Seller Core team, we deploy our applications to Kubernetes and use GitLab CI/CD for the deployment process. That’s why we created a sample deployment pipeline for Debezium. You can access this pipeline via my GitHub link. In short, all of the configuration above also applies to Kubernetes environments. The only difference is that we already have Kafka and PostgreSQL servers in place.

We have come to the end of this post. In my next post, I will write about the problems we encountered while implementing Debezium and how we solved them.

If you are interested in this kind of technology, reach out to us.
