Redpanda ✕ Materialize ✕ dbt ✕ Debezium
After attending the recent Hack Day held by the Materialize team, I decided to experiment with their product by building a CDC pipeline. It streams change events from a MySQL database's binlog through Debezium into Redpanda; from there, we use dbt to define the live materialized views in Materialize that feed a Metabase dashboard.
If that sounds like a lot of weird words, that's because it is. Let's unpack these technologies and our pipeline step by step.
The code for the example can be found here: https://github.com/danthelion/redpanda-debezium-materialized-dbt
Data ingestion
- MySQL — Our source Database.
- Redpanda — A new, Kafka API-compatible storage engine optimized for streaming data.
- Kafka Connect — A component of Apache Kafka that works as a centralized data hub for simple data integration between databases, key-value stores, search indexes, and file systems.
- Debezium — An open source distributed platform for change data capture.
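To make the roles of these pieces a bit more concrete, here is a minimal sketch of how a Debezium MySQL connector could be registered with Kafka Connect through its REST API. This is not taken from the example repository: the host names, credentials, connector name, and database name are all placeholders, and the property names follow Debezium 2.x (older 1.x releases use `database.server.name` and `database.history.*` instead of `topic.prefix` and `schema.history.internal.*`).

```python
import json
import urllib.request


def build_connector_request(name, mysql_host, redpanda_brokers, database):
    """Build the JSON body for POST /connectors on Kafka Connect.

    All values passed in (and the hard-coded credentials below) are
    illustrative assumptions, not values from the example repository.
    """
    return {
        "name": name,
        "config": {
            "connector.class": "io.debezium.connector.mysql.MySqlConnector",
            "database.hostname": mysql_host,
            "database.port": "3306",
            "database.user": "debezium",      # placeholder user
            "database.password": "dbz",       # placeholder password
            "database.server.id": "184054",   # any id unique among MySQL replicas
            "topic.prefix": "dbserver1",      # prefix for the change event topics
            "database.include.list": database,
            # Debezium stores the schema history in a topic on the broker,
            # which here is Redpanda speaking the Kafka protocol.
            "schema.history.internal.kafka.bootstrap.servers": redpanda_brokers,
            "schema.history.internal.kafka.topic": "schema-changes." + database,
        },
    }


def register_connector(connect_url, request_body):
    """POST the connector definition to a running Kafka Connect worker."""
    req = urllib.request.Request(
        connect_url.rstrip("/") + "/connectors",
        data=json.dumps(request_body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.loads(resp.read())


if __name__ == "__main__":
    body = build_connector_request(
        name="mysql-connector",          # hypothetical connector name
        mysql_host="mysql",              # hypothetical container host name
        redpanda_brokers="redpanda:9092",
        database="shop",                 # hypothetical database name
    )
    # Only print the payload here; call register_connector(...) against
    # your own Kafka Connect URL (port 8083 by default) to submit it.
    print(json.dumps(body, indent=2))
```

Keeping the payload builder separate from the HTTP call makes it easy to inspect the configuration before anything is sent to Kafka Connect.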
What is CDC and why is it useful?
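Change Data Capture (CDC) is a great fit for real-time (or close to it) data streaming from relational DBs (MySQL, PostgreSQL, …) into Data Warehouses (BigQuery, Snowflake, …). Instead of repeatedly querying the source tables, a log-based CDC tool such as Debezium reads the changes the database already records in its transaction log (the binlog, in MySQL's case) and publishes each one as an event.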
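As a rough illustration of what a consumer does with these events, the sketch below replays a few simplified Debezium-style change events onto an in-memory copy of a table. The `op`, `before`, and `after` fields are part of Debezium's event envelope (`c` for create, `u` for update, `d` for delete, `r` for a snapshot read); the `orders` rows and the `id` key column are made up for the example, and real events carry extra fields such as `source` and `ts_ms` that are left out here.

```python
def apply_change_event(table, event, key="id"):
    """Apply one simplified Debezium change event to a dict keyed by primary key."""
    op = event["op"]
    if op in ("c", "r", "u"):
        # Creates, snapshot reads, and updates all carry the new row in "after".
        row = event["after"]
        table[row[key]] = row
    elif op == "d":
        # Deletes carry the last known row state in "before".
        table.pop(event["before"][key], None)
    else:
        raise ValueError("unsupported op: %r" % (op,))
    return table


# Hypothetical change events for an `orders` table, in commit order.
events = [
    {"op": "c", "before": None, "after": {"id": 1, "status": "new"}},
    {"op": "c", "before": None, "after": {"id": 2, "status": "new"}},
    {"op": "u", "before": {"id": 1, "status": "new"},
     "after": {"id": 1, "status": "shipped"}},
    {"op": "d", "before": {"id": 2, "status": "new"}, "after": None},
]

orders = {}
for event in events:
    apply_change_event(orders, event)

print(orders)
```

After the loop, `orders` mirrors the current state of the source table without a single query against it, which is the idea a warehouse or streaming database builds on when it consumes a CDC stream.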