CDC-based Upserts with Debezium, Apache Kafka, and Apache Pinot

How to build a streaming data pipeline to capture MySQL database changes and stream them to Apache Pinot via Debezium and Kafka

Published in Tributary Data

9 min read · Jul 26, 2022

Upserting means inserting a record into a database if it does not already exist, or updating it if it does. An analytics database at the end of a streaming data pipeline can use upserts to keep its data consistent with the source database.
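The definition above can be sketched in a few lines of Python. This is a minimal illustration of upsert semantics only, assuming an in-memory dict keyed by a primary key column; the function `upsert`, the `orders` table, and the `id` key are hypothetical names, not part of any database API.

```python
from typing import Any, Dict

Row = Dict[str, Any]


def upsert(table: Dict[Any, Row], record: Row, key: str = "id") -> str:
    """Insert `record` if its primary key is new, otherwise overwrite the stored row.

    Returns "insert" or "update" so the caller can see which branch ran.
    (Hypothetical helper for illustration, not a real database API.)
    """
    pk = record[key]
    action = "update" if pk in table else "insert"
    table[pk] = dict(record)  # store a copy so later changes to `record` don't leak in
    return action


if __name__ == "__main__":
    orders: Dict[Any, Row] = {}
    print(upsert(orders, {"id": 1, "status": "PLACED"}))   # insert
    print(upsert(orders, {"id": 1, "status": "SHIPPED"}))  # update
    print(orders)  # {1: {'id': 1, 'status': 'SHIPPED'}}
```

Because every write is keyed, replaying the same change twice leaves the table in the same state, which is what lets the analytics side track the latest state of each source row instead of accumulating duplicates.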

This article explores a minimal viable setup for a streaming data pipeline that captures changes from MySQL and streams them to Apache Pinot via Debezium and Apache Kafka. Several videos already cover the same topic, but this article aims to give you a solid blueprint for building your own CDC pipeline at scale.

Why do we need upserts?

A real-time analytics system consists of several sub-systems working together to derive insights from events flowing through them.

Change data capture (CDC) tools such as Debezium capture changes in transactional databases, transform them into events, and stream them…
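To make the event flow concrete, here is a minimal sketch of how a downstream consumer could fold CDC events into a keyed view. It assumes a simplified form of the `payload` section of a Debezium change event, keeping only the `op` field ("c" create, "r" snapshot read, "u" update, "d" delete) and the `before`/`after` row images; real Debezium events also carry schema, `source`, and timestamp metadata, which are omitted here. The function `apply_change_event` and the sample rows are hypothetical, and this is not how Apache Pinot implements upserts internally; it only illustrates the semantics.

```python
from typing import Any, Dict

Row = Dict[str, Any]


def apply_change_event(view: Dict[Any, Row], event: Dict[str, Any], key: str = "id") -> None:
    """Apply one simplified Debezium-style change event to a keyed view.

    Assumption: `event` holds only `op`, `before`, and `after`
    (a trimmed-down Debezium payload). Hypothetical helper for illustration.
    """
    op = event["op"]
    if op in ("c", "r", "u"):
        after: Row = event["after"]
        view[after[key]] = after  # upsert the new row image
    elif op == "d":
        before: Row = event["before"]
        view.pop(before[key], None)  # remove the row; ignore keys we never saw
    else:
        raise ValueError(f"unsupported op: {op!r}")


if __name__ == "__main__":
    events = [
        {"op": "c", "before": None, "after": {"id": 7, "status": "PLACED"}},
        {"op": "u", "before": {"id": 7, "status": "PLACED"}, "after": {"id": 7, "status": "SHIPPED"}},
        {"op": "c", "before": None, "after": {"id": 8, "status": "PLACED"}},
        {"op": "d", "before": {"id": 8, "status": "PLACED"}, "after": None},
    ]
    view: Dict[Any, Row] = {}
    for e in events:
        apply_change_event(view, e)
    print(view)  # {7: {'id': 7, 'status': 'SHIPPED'}}
```

Without keyed upserts, an append-only store would keep all four events as separate rows, so queries would see both the PLACED and SHIPPED versions of order 7 and the already-deleted order 8.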

Written by Dunith Danushka