Stream Data From Yugabyte CDC to AWS MSK using Debezium

Change Data Capture (CDC) is a mechanism for tracking changes made to the data in a database. Yugabyte Database recently added a CDC feature in its latest release, 2.13. In this article, we’ll learn how to configure Yugabyte CDC and stream data into AWS MSK (Amazon Managed Streaming for Apache Kafka) using the Debezium connector.

It’s assumed that readers have a basic knowledge of AWS, Apache Kafka, and CDC.

Let’s start with the setup.

1. Configuration of IAM Roles and Policies

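Create a new IAM role with the required access to AWS services. For this demo, we’ll name it “yb_cdc_kafka_role”. Because MSK Connect runs the connector under this role, configure its trusted entities so that the MSK Connect service (kafkaconnect.amazonaws.com) can assume it.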

The IAM roles and policies defined here are generic and can be fine-tuned to your organization’s IT policies.
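As a minimal sketch, assuming the role serves only as the MSK Connect service execution role, its trust policy allows the kafkaconnect.amazonaws.com service principal to call sts:AssumeRole. The snippet just builds that JSON document; the function name is hypothetical.

```python
import json


def build_trust_policy() -> dict:
    """Trust policy that lets the MSK Connect service assume the role.

    Assumption: the role is used only as the MSK Connect service execution role.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "kafkaconnect.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


if __name__ == "__main__":
    print(json.dumps(build_trust_policy(), indent=2))
```

The printed JSON can be pasted into the IAM console’s custom trust policy editor, or passed with boto3 as iam.create_role(RoleName="yb_cdc_kafka_role", AssumeRolePolicyDocument=json.dumps(build_trust_policy())).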

Create a policy that grants access to the following AWS services, and attach it to the role:

  1. Apache Kafka APIs for MSK
  2. EC2
  3. MSK Connect
  4. S3
  5. CloudWatch
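For a demo, those five services can be granted in one broad policy. The sketch below uses a hypothetical mapping from each listed service to its IAM action prefix (adding logs:* on the assumption that CloudWatch Logs delivery is wanted) and emits a single Allow statement for all of them. Narrow the actions and resources for anything beyond a demo.

```python
import json

# Hypothetical mapping from the services listed above to IAM action prefixes.
SERVICE_ACTION_PREFIXES = {
    "Apache Kafka APIs for MSK": ["kafka-cluster"],
    "EC2": ["ec2"],
    "MSK Connect": ["kafkaconnect"],
    "S3": ["s3"],
    "CloudWatch": ["cloudwatch", "logs"],  # metrics plus CloudWatch Logs (assumed)
}


def build_demo_policy(resource: str = "*") -> dict:
    """Broad demo policy: every action of each listed service on `resource`."""
    actions = sorted(
        f"{prefix}:*"
        for prefixes in SERVICE_ACTION_PREFIXES.values()
        for prefix in prefixes
    )
    return {
        "Version": "2012-10-17",
        "Statement": [{"Effect": "Allow", "Action": actions, "Resource": resource}],
    }


if __name__ == "__main__":
    print(json.dumps(build_demo_policy(), indent=2))
```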

2. Enable CDC on Yugabyte Database

Ensure that your Yugabyte database is up and running. To install YugabyteDB on your cloud virtual machine, please refer to

Create a test table in the Yugabyte database within the public schema.
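A minimal sketch of such a table, assuming a hypothetical name public.test_cdc with two columns. It has a primary key because, as noted under Features and Limitations, CDC requires one. The helper works with any DB-API 2.0 connection.

```python
# DDL for a hypothetical CDC test table; the primary key is required for CDC.
TEST_TABLE_DDL = """
CREATE TABLE IF NOT EXISTS {table} (
    id   INT PRIMARY KEY,
    name TEXT
)
"""


def create_test_table(conn, table: str = "public.test_cdc") -> None:
    """Create the test table through a DB-API 2.0 connection (e.g. psycopg2).

    The table name is interpolated into the SQL, so pass trusted identifiers only.
    """
    cur = conn.cursor()
    try:
        cur.execute(TEST_TABLE_DDL.format(table=table))
        conn.commit()
    finally:
        cur.close()
```

Against YugabyteDB, conn could come from psycopg2.connect(host="<node-ip>", port=5433, dbname="yugabyte", user="yugabyte"), since YSQL listens on port 5433 by default.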

Enable CDC through yb-admin. Its create_change_data_stream command enables CDC on all the schemas and tables in the yugabyte database.

If you have a multi-node Yugabyte setup, provide a comma-separated list of host:port values for all the master nodes (the leader and the followers) as the --master_addresses argument.
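The command can be sketched in Python as follows, assuming yb-admin is on the PATH, the masters listen on the default RPC port 7100, and the target YSQL database is named yugabyte. The ysql.<database> argument tells yb-admin which YSQL database to stream; the helper names are hypothetical.

```python
import subprocess
from typing import Sequence


def cdc_stream_command(master_addresses: Sequence[str], database: str = "yugabyte",
                       yb_admin: str = "yb-admin") -> list:
    """Build the yb-admin argv that creates a CDC stream for one YSQL database."""
    if not master_addresses:
        raise ValueError("at least one master host:port is required")
    return [
        yb_admin,
        "--master_addresses", ",".join(master_addresses),  # every master, comma-separated
        "create_change_data_stream", f"ysql.{database}",
    ]


def create_cdc_stream(master_addresses: Sequence[str], database: str = "yugabyte") -> str:
    """Run yb-admin and return its stdout; raises CalledProcessError on failure."""
    result = subprocess.run(
        cdc_stream_command(master_addresses, database),
        capture_output=True, text=True, check=True,
    )
    return result.stdout
```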

A successful run of the command returns a message with the DB stream ID:

CDC Stream ID: 90fe97d59a504bb6acbfd6a940
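Since the connector configuration later needs this ID, a script might pull it out of the command output. A small sketch, assuming the output format shown above stays stable:

```python
import re

# Matches the "CDC Stream ID: <hex>" line that yb-admin prints on success.
_STREAM_ID_RE = re.compile(r"CDC Stream ID:\s*([0-9a-fA-F]+)")


def parse_stream_id(output: str) -> str:
    """Extract the stream ID from yb-admin output, or raise ValueError."""
    match = _STREAM_ID_RE.search(output)
    if match is None:
        raise ValueError("no CDC stream ID found in yb-admin output")
    return match.group(1)


stream_id = parse_stream_id("CDC Stream ID: 90fe97d59a504bb6acbfd6a940\n")
print(stream_id)  # 90fe97d59a504bb6acbfd6a940
```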

For more details on CDC commands, please refer to

3. Configuration of AWS Security Group

Create a security group with inbound and outbound rules that allow access to the MSK cluster and Yugabyte DB. For this demo, we’ll allow incoming traffic on all ports.
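The "all ports" inbound rules can be sketched as IpPermissions entries in the shape EC2’s AuthorizeSecurityGroupIngress API expects. This assumes the MSK brokers, MSK Connect workers, and Yugabyte nodes all share the group, plus a hypothetical CIDR range such as the VPC’s; the function name is hypothetical.

```python
def demo_ingress_rules(security_group_id: str, cidr: str) -> list:
    """Demo-only inbound rules. IpProtocol "-1" means all protocols and all
    ports, so no FromPort/ToPort is given."""
    return [
        # Everything from resources that share this security group.
        {"IpProtocol": "-1", "UserIdGroupPairs": [{"GroupId": security_group_id}]},
        # Everything from an assumed CIDR range, e.g. the VPC CIDR block.
        {"IpProtocol": "-1",
         "IpRanges": [{"CidrIp": cidr, "Description": "demo: all traffic"}]},
    ]
```

With boto3 this could be applied as boto3.client("ec2").authorize_security_group_ingress(GroupId=sg_id, IpPermissions=demo_ingress_rules(sg_id, "10.0.0.0/16")). A new security group already allows all outbound traffic by default.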

4. Upload the Debezium Connector JAR to an S3 Bucket

Download the Yugabyte Debezium connector JAR and upload it to an S3 bucket.
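If you prefer scripting the upload, here is a sketch assuming boto3 is installed with AWS credentials configured, and a hypothetical bucket, local path, and plugins/ key prefix. The pure helper computes the object key and bucket ARN that the plugin step will ask for.

```python
from pathlib import Path


def plugin_s3_location(bucket: str, jar_path: str, prefix: str = "plugins/") -> dict:
    """Object key, URI and bucket ARN under which the connector JAR is stored."""
    key = prefix + Path(jar_path).name
    return {
        "key": key,
        "uri": f"s3://{bucket}/{key}",
        "bucket_arn": f"arn:aws:s3:::{bucket}",  # standard "aws" partition assumed
    }


def upload_jar(bucket: str, jar_path: str, prefix: str = "plugins/") -> dict:
    """Upload the JAR with boto3 (imported lazily; needs AWS credentials)."""
    import boto3

    location = plugin_s3_location(bucket, jar_path, prefix)
    boto3.client("s3").upload_file(jar_path, bucket, location["key"])
    return location
```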

5. Configuration of AWS MSK Cluster

In this example, we’re creating the AWS MSK cluster in the same VPC as the Yugabyte cluster. Note that this is a generic configuration; it might differ based on your organization’s IT policy.

For this demo, we created the cluster in only two Availability Zones.

Under the Networking section, select the same VPC and private subnets as the Yugabyte cluster. Choose the security group created in step 3 from the drop-down list.

Enable logging on your cluster to ease debugging. In this demo, we use an S3 bucket to store the logs.
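The console choices in this step map onto the keyword arguments of boto3’s kafka client create_cluster call. A sketch under stated assumptions: the subnet IDs, security group, log bucket, and "msk-logs/" prefix are hypothetical, and the Kafka version and instance type are example values to adjust. MSK spreads brokers evenly across the given subnets, so the broker count is derived as a multiple of the zone count.

```python
from typing import Sequence


def msk_cluster_request(cluster_name: str, subnet_ids: Sequence[str],
                        security_group_id: str, log_bucket: str,
                        brokers_per_zone: int = 1,
                        instance_type: str = "kafka.m5.large",
                        kafka_version: str = "2.8.1") -> dict:
    """Keyword arguments for boto3's kafka create_cluster (demo values)."""
    if len(subnet_ids) not in (2, 3):
        raise ValueError("MSK expects client subnets in two or three Availability Zones")
    return {
        "ClusterName": cluster_name,
        "KafkaVersion": kafka_version,  # assumed example version
        # Broker count must be a multiple of the number of zones/subnets.
        "NumberOfBrokerNodes": brokers_per_zone * len(subnet_ids),
        "BrokerNodeGroupInfo": {
            "InstanceType": instance_type,
            "ClientSubnets": list(subnet_ids),  # same private subnets as Yugabyte
            "SecurityGroups": [security_group_id],
        },
        "LoggingInfo": {
            "BrokerLogs": {"S3": {"Enabled": True, "Bucket": log_bucket,
                                  "Prefix": "msk-logs/"}}
        },
    }
```

The result could be passed as boto3.client("kafka").create_cluster(**request).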

The cluster is now configured successfully.

6. Configuration of AWS MSK connector

Now that your AWS MSK cluster is ready, it’s time to create a connector to stream data from the Yugabyte database to the MSK cluster.

First, create a custom plugin that points to the JAR stored in the S3 bucket.
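The same plugin can be described as the keyword arguments of boto3’s kafkaconnect client create_custom_plugin call. A sketch with a hypothetical plugin name, bucket ARN, and object key:

```python
def custom_plugin_request(name: str, bucket_arn: str, jar_key: str) -> dict:
    """Keyword arguments for boto3's kafkaconnect create_custom_plugin call."""
    return {
        "name": name,
        "contentType": "JAR",  # a single JAR file; "ZIP" is used for archives
        "location": {"s3Location": {"bucketArn": bucket_arn, "fileKey": jar_key}},
    }
```

For example: boto3.client("kafkaconnect").create_custom_plugin(**custom_plugin_request("yb-debezium-plugin", "arn:aws:s3:::my-bucket", "plugins/debezium-connector-yugabytedb.jar")).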

Select the MSK cluster created in the earlier step.

Configure the connector properties. In this step, we provide the CDC stream ID and the database connection details.
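A hedged sketch of those properties. The property names follow the YugabyteDB Debezium connector documentation from around the 2.13 release and should be checked against the JAR you downloaded; the host addresses, table list, password, and the dbserver1 topic prefix are hypothetical. MSK Connect takes string values, and its console accepts them as key=value lines.

```python
from typing import Dict, Sequence


def yugabyte_connector_config(stream_id: str, master_addresses: Sequence[str],
                              hostname: str, tables: Sequence[str], password: str,
                              user: str = "yugabyte", dbname: str = "yugabyte",
                              server_name: str = "dbserver1") -> Dict[str, str]:
    """Connector properties for MSK Connect (all values are strings)."""
    if not tables:
        raise ValueError("list at least one schema.table to capture")
    return {
        "connector.class": "io.debezium.connector.yugabytedb.YugabyteDBConnector",
        "tasks.max": "1",
        "database.hostname": hostname,        # a node serving YSQL
        "database.port": "5433",              # default YSQL port
        "database.master.addresses": ",".join(master_addresses),  # host:7100,...
        "database.user": user,
        "database.password": password,
        "database.dbname": dbname,
        "database.server.name": server_name,  # prefix of the Kafka topic names
        "database.streamid": stream_id,       # ID printed by yb-admin in step 2
        "table.include.list": ",".join(tables),
    }


def to_properties(config: Dict[str, str]) -> str:
    """Render the config as key=value lines for the MSK Connect console."""
    return "\n".join(f"{key}={value}" for key, value in config.items())
```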

Select the role “yb_cdc_kafka_role” created earlier.

Your connector is now configured.

To read the messages from the MSK cluster, you can set up Apache Kafka on an EC2 client machine and consume the messages published to the test topic.
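Each consumed message value (for example, as printed by Kafka’s kafka-console-consumer.sh) is a Debezium change event. A sketch for decoding them in Python, assuming the JSON converter, Debezium’s default <server name>.<schema>.<table> topic naming, and a hypothetical topic prefix dbserver1. Column values inside "after" may be wrapped in connector-specific structures, so the helper returns them untouched.

```python
import json
from typing import Any, Optional, Tuple

# Debezium operation codes found in the "op" field of a change event.
OPERATIONS = {"c": "insert", "u": "update", "d": "delete", "r": "snapshot read"}


def topic_for(server_name: str, schema: str, table: str) -> str:
    """Debezium's default topic naming: <server name>.<schema>.<table>."""
    return f"{server_name}.{schema}.{table}"


def summarize_change_event(raw: Optional[bytes]) -> Tuple[str, Any]:
    """Return (operation, row after the change) for one JSON-encoded message."""
    if raw is None:  # tombstone emitted after a delete
        return "tombstone", None
    event = json.loads(raw)
    payload = event.get("payload", event) or {}  # with or without schema wrapper
    return OPERATIONS.get(payload.get("op"), "unknown"), payload.get("after")
```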

Features and Limitations:

  1. Currently, the CDC feature is available for YSQL tables only.
  2. Avoid dropping or truncating tables that have CDC enabled; doing so might crash the database or disrupt its functioning.
  3. All tables in the database should have a primary key defined, even those that are not listed in the MSK connector configuration.
  4. To add a new table to the CDC watch list, you need to re-create the CDC stream ID.
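Because of the primary-key requirement, it can help to list offending tables before enabling CDC. A sketch assuming YSQL’s PostgreSQL-compatible information_schema and any DB-API connection (e.g. psycopg2); the function name is hypothetical.

```python
# User tables that have no PRIMARY KEY constraint.
MISSING_PK_SQL = """
SELECT t.table_schema, t.table_name
FROM information_schema.tables AS t
LEFT JOIN information_schema.table_constraints AS c
       ON c.table_schema = t.table_schema
      AND c.table_name = t.table_name
      AND c.constraint_type = 'PRIMARY KEY'
WHERE t.table_type = 'BASE TABLE'
  AND t.table_schema NOT IN ('pg_catalog', 'information_schema')
  AND c.constraint_name IS NULL
ORDER BY 1, 2
"""


def tables_without_primary_key(conn) -> list:
    """Return "schema.table" names that would violate the CDC requirement."""
    cur = conn.cursor()
    try:
        cur.execute(MISSING_PK_SQL)
        return [f"{schema}.{table}" for schema, table in cur.fetchall()]
    finally:
        cur.close()
```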
