Tools for Event-Driven DevOps with Apache Kafka
Update 2024 — Jan.
Testing Kafka is hard. We have taken the well-used, somewhat old Kafka datagen tool and are updating it for SpecMesh testing. Find updates here:
https://github.com/specmesh/kafka-random-generator
Update 2023.
This post is now three years old. A lot has changed since then. I left Confluent, transitioned into freelancing, worked at a hedge fund, and have been in and out of more than 20 organizations that use Kafka, MSK, and CCloud. Regrettably, JulieOps, which had been gaining traction, is no longer in active development. Building on the experiences from large strategic engagements, platform build-outs, and other interesting projects, I, along with two others, Sion and Andy, started working on specmesh.io (available on GitHub). With complete bias, I believe it’s the right approach for adopting Apache Kafka and avoiding the common mistakes that make this industry somewhat challenging. Please take a look and feel free to reach out.
Original Post
DevOps is a long road; it is a journey. Are you Dev or Ops or both? This is the challenge I have struggled with as DevOps has become a thing. I’m part-way through writing a Kafka DevOps blog post for Confluent and this thought is always at the top of my mind. However, while I grapple with it, I figured I would provide a compiled list of Kafka DevOps tools we see in the wild.
An important point, much like the DevOps definition below, is that Kafka DevOps is not clearly defined; it is what you make of it. Each organization will have its own predispositions and DevOps culture. However, to properly embrace the event-driven architecture, there is a fundamental upheaval of existing concepts. Teams and functionality need to be structured differently, aligned to deliver business functionality, based upon dataflows and related bounded contexts. When the key benefits include a 24x7 runtime and dynamic path execution (dataflow routing), and a new feature can be enabled simply by switching on a feature flag, then we are winning. We can avoid the dreaded big bang release that fills you with fear of having to spend the rest of the weekend doing a rollback.
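To make the feature-flag idea concrete, here is a minimal sketch of flag-driven dataflow routing. It is an illustration only: the `FlagRouter` class, the `pricing-v2` flag name, and the topic names are all hypothetical, and a real deployment would read flags from a config topic or a flag service rather than an in-memory dict.

```python
from dataclasses import dataclass, field
from typing import Dict

Event = Dict[str, object]


@dataclass
class FlagRouter:
    """Chooses a destination topic per event based on a runtime flag (hypothetical helper)."""

    flags: Dict[str, bool] = field(default_factory=dict)

    def set_flag(self, name: str, enabled: bool) -> None:
        # Assumption: flags live in memory here; in practice they would be pushed at runtime.
        self.flags[name] = enabled

    def route(self, event: Event) -> str:
        # The new path is used only while the flag is on; the old path stays live otherwise.
        if self.flags.get("pricing-v2", False):
            return "orders.priced.v2"
        return "orders.priced.v1"


router = FlagRouter()
before = router.route({"order_id": 1})       # flag off: old path
router.set_flag("pricing-v2", True)
after = router.route({"order_id": 2})        # flag on: new path
router.set_flag("pricing-v2", False)
rolled_back = router.route({"order_id": 3})  # flag off again: rollback without a redeploy
print(before, after, rolled_back)
```

The point of the sketch is that rollback becomes a flag flip rather than a release, which is what removes the big bang weekend.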
What is DevOps Not? It’s Not NoOps. It’s Not (Just) Tools. It’s Not (Just) Culture. It’s Not (Just) Devs and Ops. It’s Not (Just) A Job Title. It’s Not Everything.
https://theagileadmin.com/what-is-devops/
Viktor Gamov is also working on this — he just doesn’t know it yet ;)
Please leave your preferred tools and opinions in the comments section!
Kafka Specific
- Kafka Streams topology test driver (Java, Scala), plus the MockedStreams classes for the Processor API (Streams only) (Yeva provides a great introduction)
- TestContainers (Java) for running broker infrastructure: ZooKeeper, brokers, and Schema Registry. Beware of the performance hit when spinning up containers.
- Jackdaw (Clojure) — from Funding Circle — written by some very smart people. A great article here
- Goka (Golang): Kafka Streams implementation
- Nodefluent Kafka Streams (Node): Kafka Streams implementation
- ZeroCode Kafka Testing to produce and consume data (Java)
- Gradle Confluent plugin
KSQL
- KSQL-REST API
- KSQL-CLI worked example with std-in-out
- KSQL testing tool (runs EmbeddedKafka with the topology test driver; fine for unit tests but won’t support broader testing scenarios)
Infrastructure
- EmbeddedSingleNodeKafka (Java or Scala unit test)
- Debezium Kafka Cluster utils (Java or Scala)
- Confluent docker-compose for broker-infra
- Confluent Ansible
- Confluent Operator
- CNCF Strimzi
- DuckTape: a Python-based system test driver for Kafka
- MuckRake — system tests on top of DuckTape
- Connector testing using Coyote
- Kafka unit testing by Landoop
- Autoscaling streams (might fit into later stages of a build pipeline)
Schema Registry integration
- Schema Registry plugin for build tools
- Confluent’s Maven Plugin https://docs.confluent.io/current/schema-registry/develop/maven-plugin.html
- SR Plugin for Gradle https://github.com/ImFlog/schema-registry-plugin
DataOps/generation
- KafkaCat generic non-JVM producer and consumer for Apache Kafka; think netcat for Kafka — by Magnus
- Kaf (like KafkaCat — Kafka CLI inspired by kubectl & docker)
- KSQL (Community license)
- Lenses SQL
- EventSim for generating simulated data
- Kafka Avro datagen simulation tool
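For a feel of what the data generation tools above do, here is a minimal sketch of seeded, repeatable event generation. It is not the API of EventSim or the Avro datagen tool; the `generate_events` function and the field names are hypothetical, and it emits JSON rather than Avro to stay self-contained.

```python
import json
import random
from typing import Dict, Iterator


def generate_events(count: int, seed: int = 42) -> Iterator[Dict[str, object]]:
    """Yield `count` simulated events; the same seed always yields the same events."""
    rng = random.Random(seed)  # private RNG so tests stay repeatable
    users = ["alice", "bob", "carol"]  # hypothetical sample values
    for i in range(count):
        yield {
            "event_id": i,
            "user": rng.choice(users),
            "amount": round(rng.uniform(1.0, 100.0), 2),
        }


events = list(generate_events(3))
for event in events:
    # Each line could be piped to a producer such as KafkaCat.
    print(json.dumps(event))
```

Seeding the generator matters for testing: a failing assertion can be reproduced with exactly the same input data.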
General
- Unit testing shell scripts with shunit2.sh: https://www.leadingagile.com/2018/10/unit-testing-shell-scriptspart-two/ (part 1 is interesting)
- Shunit2 with Maven : https://leelevett.wordpress.com/2015/07/21/bash-script-project-with-tests-and-maven/
- Shunit2 with TravisCI : https://www.sysorchestra.com/travisshunit2/
- Sample github project that uses TravisCI to test a .sh script : https://github.com/martinseener/example-shunit2-travisci AND https://github.com/soulseekah/test-shunit2-travis
- TravisCI support for Docker : https://blog.travis-ci.com/2015-08-19-using-docker-on-travis-ci/
What’s missing
Quite a lot of stuff is missing, but that’s because most environments will be geared towards a particular deployment and automation runtime.
- Data tooling — topic provisioning, topic pre-population (from source or flat file), topic-assertion-validation
- Infrastructure automation tools (Ansible, Helm, Helm charts, Operators)
- Schema/quota/governance automation via APIs and assertions
- Security/ACL automation (Very important)
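As an illustration of the topic-assertion-validation gap listed above, here is a minimal sketch that compares a desired topic specification against what a cluster reports. Everything here is an assumption for illustration: `TopicSpec` and `validate_topics` are hypothetical names, and the `actual` dict stands in for whatever an admin client would return from a real cluster.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class TopicSpec:
    partitions: int
    replication_factor: int


def validate_topics(desired: Dict[str, TopicSpec], actual: Dict[str, TopicSpec]) -> List[str]:
    """Return human-readable problems; an empty list means the cluster matches the spec."""
    problems: List[str] = []
    for name, spec in sorted(desired.items()):
        found = actual.get(name)
        if found is None:
            problems.append(f"missing topic: {name}")
        elif found != spec:
            problems.append(f"mismatch on {name}: expected {spec}, found {found}")
    # Topics present on the cluster but absent from the spec are flagged too.
    for name in sorted(set(actual) - set(desired)):
        problems.append(f"unexpected topic: {name}")
    return problems


desired = {"orders": TopicSpec(6, 3), "payments": TopicSpec(3, 3)}
# Assumption: in practice this would come from the cluster, not a literal.
actual = {"orders": TopicSpec(6, 3), "payments": TopicSpec(1, 3), "scratch": TopicSpec(1, 1)}

problems = validate_topics(desired, actual)
for problem in problems:
    print(problem)
```

A check like this could run as a pipeline gate, failing the build whenever the list of problems is non-empty.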
Resources
- DZone: A quick and practical example of Kafka testing.
- Fluent Kafka Streams
- Deploying Kafka Streams with KSQL and Gradle — UDFs and more
Want to learn more?