The Fast Track to Streaming Analytics with CSP Community Edition

Tim Spann
Published in Cloudera
6 min read, Apr 16, 2024

Apache Flink, Apache Kafka, Schema Registry, CDC, Debezium, Postgresql, Docker, Cloudera SQL Stream Builder, Cloudera Streams Messaging Manager, Kafka Connect, SQL, JSON, AVRO

The Cloudera Stream Processing Community Edition includes these capabilities of Cloudera Streams Messaging and Cloudera Streaming Analytics:

  • Apache Flink
  • Cloudera SQL Stream Builder
  • Apache Kafka & Kafka Connect
  • Cloudera Streams Messaging Manager
  • Cloudera Schema Registry

The first step is to follow the documentation, which walks you through downloading CSP Community Edition and getting it started.

Once you have completed that, come back and let’s go a little more in depth.

We are going to build a Change Data Capture application using the Postgresql database that comes with the Community Edition.

My Live Cloudera SQL Stream Builder Community Repo

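Debezium Tip (Postgresql Debugging)

To see the replication slots Debezium has created, and whether they are active, query Postgresql directly:

SELECT * FROM pg_replication_slots;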
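
A narrower version of that check can be run from the host. This is a minimal sketch, assuming the container name `downloads-postgresql-1` used elsewhere in this post; the `PG_USER` and `PG_DB` defaults are placeholders rather than the Community Edition's actual credentials, and the function names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: inspect Debezium's logical replication slots from the host.
# PG_CONTAINER, PG_USER and PG_DB are assumptions; adjust them to your setup.
PG_CONTAINER="${PG_CONTAINER:-downloads-postgresql-1}"
PG_USER="${PG_USER:-postgres}"
PG_DB="${PG_DB:-postgres}"

# Print the query: only the columns most useful when a connector stalls.
slot_query() {
  echo "SELECT slot_name, plugin, slot_type, active, restart_lsn, confirmed_flush_lsn FROM pg_replication_slots;"
}

# Run the query inside the Postgresql container with psql.
show_slots() {
  docker exec -i "$PG_CONTAINER" psql -U "$PG_USER" -d "$PG_DB" -c "$(slot_query)"
}
```

Call `show_slots` once the containers are up. A slot whose `active` column is `f` has no consumer attached, and Postgresql keeps retaining WAL for it until a consumer reconnects or the slot is dropped.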

Access The Systems in Docker

Schema Registry
http://localhost:7788/swagger
http://localhost:7788/ui/#/

SQL StreamBuilder
http://localhost:18121/ui/login
http://localhost:18121/swagger-ui/index.html?configUrl=/swagger/api-docs/swagger-config

SMM/Kafka
http://localhost:9991/
http://localhost:8585/swagger
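
To confirm that all of these are reachable after startup, something like the following could be used. It is a rough sketch: the URLs are the ones listed above, while the labels and function names are hypothetical, and it assumes `curl` and `bash` are available on the host.

```shell
#!/usr/bin/env bash
# Sketch: probe the local CSP Community Edition endpoints listed in this post.

# One "label URL" pair per line. Labels are illustrative only.
csp_endpoints() {
  cat <<'EOF'
schema-registry-swagger http://localhost:7788/swagger
schema-registry-ui http://localhost:7788/ui/#/
ssb-login http://localhost:18121/ui/login
ssb-swagger http://localhost:18121/swagger-ui/index.html?configUrl=/swagger/api-docs/swagger-config
smm-ui http://localhost:9991/
smm-swagger http://localhost:8585/swagger
EOF
}

# Print each endpoint with the HTTP status code curl receives
# (000 means no connection could be made).
check_csp_endpoints() {
  local label url code
  while read -r label url; do
    code="$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' "$url")"
    printf '%-24s %s %s\n' "$label" "$code" "$url"
  done < <(csp_endpoints)
}
```

Run `check_csp_endpoints` after `docker ps` shows the containers as up; a service that is still starting will typically report `000` for a short while.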

Docker Statements

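# List the Kafka and Postgresql containers (ID and name)
docker ps -a --format '{{.ID}}\t{{.Names}}' --filter "name=kafka.(\d)" --filter "name=postgres"
# List the Kafka Connect container
docker ps -a --format '{{.ID}}\t{{.Names}}' --filter "name=kafka-connect"

# Open a shell inside the Postgresql or Kafka container.
# The "downloads-" prefix typically comes from the directory the compose file was started in; yours may differ.
docker exec -it downloads-postgresql-1 /bin/bash
docker exec -it downloads-kafka-1 /bin/bash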

[Screenshots in the original post: Apache Flink Dashboards, SQL StreamBuilder IDE, SMM, Kafka Connect, SMM Alerts, Kafka Examples, Logs Example, CDC Example, Schema Registry, Docker Metrics]

Github Project

Use Case: Change Data Capture (CDC) with Flink SQL from Postgresql
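
As a starting point, a CDC source table in Flink SQL might look like the sketch below, which prints a DDL statement you can paste into the SQL StreamBuilder editor. Everything specific here is an assumption for illustration: the table name `orders_cdc`, its columns, the `public.orders` source table, the `postgresql` hostname, the slot name and all credentials are placeholders, not values shipped with the Community Edition. The option keys are those of the Flink `postgres-cdc` connector.

```shell
#!/usr/bin/env bash
# Sketch: generate a Flink SQL CDC source table definition for Postgresql.
# All defaults below are hypothetical placeholders; override them via the environment.
PG_HOST="${PG_HOST:-postgresql}"
PG_PORT="${PG_PORT:-5432}"
PG_USER="${PG_USER:-postgres}"
PG_PASSWORD="${PG_PASSWORD:-changeme}"
PG_DB="${PG_DB:-postgres}"
SLOT_NAME="${SLOT_NAME:-flink_orders}"

# Print the DDL. The primary key is declared NOT ENFORCED, as Flink requires.
cdc_table_ddl() {
  cat <<EOF
CREATE TABLE orders_cdc (
  id INT,
  customer STRING,
  amount DOUBLE,
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'postgres-cdc',
  'hostname' = '${PG_HOST}',
  'port' = '${PG_PORT}',
  'username' = '${PG_USER}',
  'password' = '${PG_PASSWORD}',
  'database-name' = '${PG_DB}',
  'schema-name' = 'public',
  'table-name' = 'orders',
  'slot.name' = '${SLOT_NAME}',
  'decoding.plugin.name' = 'pgoutput'
);
EOF
}
```

Running `cdc_table_ddl > orders_cdc.sql` writes the statement to a file. Once the table exists, `SELECT * FROM orders_cdc;` should stream changes, and the slot named in `slot.name` should appear in `pg_replication_slots`.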

For an additional example: https://github.com/tspannhw/FLaNK-CDC

CUSTOM UDFs in JavaScript or Java

RESOURCES

Tim Spann
Cloudera

Principal Developer Advocate, Zilliz. Milvus, Attu, Towhee, GenAI, Big Data, IoT, Deep Learning, Streaming, Machine Learning. https://www.datainmotion.dev/