Redpanda ✕ Materialize ✕ dbt ✕ Debezium

Daniel Palma
6 min read · Feb 22, 2022

After attending the recent Hack Day held by the Materialize team, I decided to experiment with their product by building a CDC pipeline. It streams change events from a MySQL database's binlog through Debezium into Redpanda. From there, Materialize ingests the events, and we use dbt to define live materialized views which feed a Metabase dashboard.

If that sounds like a lot of weird words, it's because it is. Let's unpack these technologies and our pipeline step by step.

The code for the example can be found here: https://github.com/danthelion/redpanda-debezium-materialized-dbt

Data ingestion

source: https://redpanda.com/blog/redpanda-debezium/
  • MySQL: Our source database.
  • Redpanda: A new storage engine, optimized for streaming data. It is compatible with the Kafka API, so it takes the place of a Kafka cluster in this pipeline.
  • Kafka Connect: A component of Apache Kafka that works as a centralized data hub for simple data integration between databases, key-value stores, search indexes, and file systems.
  • Debezium: An open source distributed platform for change data capture. Here it runs as a Kafka Connect source connector.
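To make the wiring between these components concrete, here is a minimal sketch of how a Debezium MySQL connector could be registered through the Kafka Connect REST API. It is not the configuration from the example repository: the host names, credentials, connector name, server name, and database name are all placeholders, and the property names follow Debezium 1.x (Debezium 2.x renamed several of them, for example `database.server.name` became `topic.prefix`).

```python
import json
import urllib.request


def build_connector_request(connect_url: str, brokers: str) -> urllib.request.Request:
    """Build (but do not send) the HTTP request that registers the connector."""
    config = {
        "connector.class": "io.debezium.connector.mysql.MySqlConnector",
        # Placeholder connection details for the source MySQL instance.
        "database.hostname": "mysql",
        "database.port": "3306",
        "database.user": "debezium",
        "database.password": "dbz",
        # Must be unique among all clients reading the binlog.
        "database.server.id": "184054",
        # Logical name, used as the prefix of the topics Debezium writes to.
        "database.server.name": "dbserver1",
        # Placeholder database to capture changes from.
        "database.include.list": "shop",
        # Debezium keeps the schema history in a topic on the broker,
        # which in this pipeline is Redpanda.
        "database.history.kafka.bootstrap.servers": brokers,
        "database.history.kafka.topic": "schema-changes.shop",
    }
    body = json.dumps({"name": "mysql-connector", "config": config}).encode("utf-8")
    return urllib.request.Request(
        url=f"{connect_url}/connectors",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_connector_request("http://localhost:8083", "redpanda:9092")
print(request.get_method(), request.full_url)
# Uncomment to actually register the connector against a running Kafka Connect:
# urllib.request.urlopen(request)
```

Because Redpanda speaks the Kafka protocol, Kafka Connect and Debezium only need the broker address; nothing in the connector configuration is Redpanda-specific.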

What is CDC and why is it useful?

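Change Data Capture (CDC) is a well-suited approach for streaming data in real time (or close to it) from relational DBs (MySQL, PostgreSQL, …) into Data Warehouses (BigQuery, Snowflake, …). Instead of repeatedly querying the source tables, log-based CDC reads the database's change log (the binlog in MySQL's case) and emits every insert, update, and delete as an event.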
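As a rough illustration of what a consumer does with such events, the sketch below replays a few change events into an in-memory copy of a table. The events are hand-written and simplified: they keep only the `op`, `before`, and `after` fields of Debezium's event envelope (where `op` is `c` for create, `u` for update, `d` for delete, and `r` for a snapshot read), and the `users` rows and the `apply_change` helper are made up for this example.

```python
from typing import Any, Dict, List

Row = Dict[str, Any]


def apply_change(replica: Dict[int, Row], event: Dict[str, Any], key: str = "id") -> None:
    """Apply one simplified Debezium-style change event to an in-memory table copy."""
    op = event["op"]
    if op in ("c", "r", "u"):
        # Creates, snapshot reads, and updates all carry the new row state in "after".
        row = event["after"]
        replica[row[key]] = row
    elif op == "d":
        # Deletes carry only the old row state in "before".
        replica.pop(event["before"][key], None)
    else:
        raise ValueError(f"unsupported operation: {op!r}")


# Hypothetical change events for a `users` table, in binlog order.
events: List[Dict[str, Any]] = [
    {"op": "c", "before": None, "after": {"id": 1, "email": "old@example.com"}},
    {"op": "c", "before": None, "after": {"id": 2, "email": "bob@example.com"}},
    {"op": "u", "before": {"id": 1, "email": "old@example.com"},
     "after": {"id": 1, "email": "new@example.com"}},
    {"op": "d", "before": {"id": 2, "email": "bob@example.com"}, "after": None},
]

replica: Dict[int, Row] = {}
for event in events:
    apply_change(replica, event)

print(replica)
```

Replaying the events in order leaves the copy in the same state as the source table, which is the property a downstream warehouse or materialized view relies on.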
