How can I manage my Kafka Artefacts?

Published in

Marionete

May 10, 2022

When building an automated Kafka platform, sooner or later the question “How can I manage my Kafka Artefacts?” comes up.

In this article, we talk about a solution we are currently using in several projects; in future articles we intend to go into more detail about some of its features and their implementation.

Whether you are just starting with Kafka or already have a stable solution, at some point you will need to experiment, or requests will start to arrive: create new topics (sometimes with specific configurations), register different schemas, deploy connectors for different integrations, run multiple ksqlDB queries, and so on. Are you going to deploy them manually via the console every time, or try to use an API? How are you going to keep up with changes? There must be a way to keep it all organized, automated and all those other good things.

The answer we got is … JulieOps! Ta daaa!

What is JulieOps?

Formerly known as Kafka Topology Builder, the tool now goes by the name JulieOps (in honour of its author's mother; if you want to know more about how it ended up with that name, click here). It is an open-source tool that helps you automate, in a centralized way, the configuration of topics, connectors, access controls, Confluent Schema Registry and other components of the Kafka ecosystem (check the official documentation for the full list of supported features).

Why is this solution an answer?

If you want to take advantage of versioning and “everything-as-code” to automate and keep control over the state of your cluster, JulieOps is a perfect fit: it was designed with a GitOps philosophy in mind.

What is GitOps, you ask?

Here is a taste, but you should definitely look into it more deeply.

GitOps is a paradigm or a set of practices that empowers developers to perform tasks which typically fall under the purview of IT operations. GitOps requires us to describe and observe systems with declarative specifications that eventually form the basis of continuous everything.
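To make the idea of declarative specifications a little more concrete, here is a minimal, generic Python sketch of the reconcile step at the heart of GitOps: compare the desired state (what is committed to Git) with the observed state (what the cluster reports) and derive the actions needed to converge. It illustrates the pattern only and is not JulieOps' implementation; the plain-dict state and the topic names are hypothetical, and a real tool would query the cluster instead of reading a dictionary.

```python
from typing import Dict, List, Tuple

# Hypothetical representation of state: topic name -> topic configuration.
TopicState = Dict[str, Dict[str, str]]


def reconcile(desired: TopicState, actual: TopicState) -> List[Tuple[str, str]]:
    """Return the (action, topic) pairs that would move `actual` towards `desired`."""
    actions: List[Tuple[str, str]] = []
    for name in sorted(desired):
        if name not in actual:
            actions.append(("create", name))
        elif desired[name] != actual[name]:
            actions.append(("update", name))
    for name in sorted(actual):
        if name not in desired:
            # Destructive: a real pipeline would normally gate this behind an explicit opt-in.
            actions.append(("delete", name))
    return actions


desired = {  # what is committed to Git
    "orders": {"num.partitions": "3"},
    "payments": {"num.partitions": "1"},
}
actual = {  # what the cluster currently reports
    "orders": {"num.partitions": "1"},
    "legacy": {"num.partitions": "1"},
}
print(reconcile(desired, actual))
# [('update', 'orders'), ('create', 'payments'), ('delete', 'legacy')]
```

Because the desired state is just data in a repository, every change to it is versioned and reviewable, and running the same reconcile twice produces no further actions.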

JulieOps, started by Pere Urbón-Bayes, Senior Solutions Architect at Confluent, is an active project that keeps adding new features and welcomes feature requests and contributions.

It is not only compatible with Apache Kafka; it also integrates with Confluent-specific features such as role-based access control (RBAC) and Schema Registry (SR).

The CLI tool is available in several formats: ready-to-use Docker images, packages for different Linux distributions and a fat JAR. You can also build it locally yourself.

The necessary files to run this CLI tool are:

a properties file, generally called the topology builder properties file, that contains all relevant configuration for the tool and for the connection to the Kafka cluster
descriptor files (one or more, in YAML or JSON format) that contain, as the name suggests, the description of our objects (see the example below)
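Since the properties file usually holds connection credentials, one option is to commit only a template and render the real file in the pipeline. The Python sketch below assumes that approach: the keys are standard Kafka client settings (SASL/PLAIN over SSL is just an example), the `${...}` placeholder names are hypothetical, and any JulieOps-specific keys your setup needs should be taken from the project's documentation.

```python
from string import Template
from typing import Mapping

# Hypothetical template kept in Git: secrets appear only as ${PLACEHOLDERS}.
# The keys are standard Kafka client settings; JulieOps-specific keys are not shown.
PROPERTIES_TEMPLATE = Template(
    "bootstrap.servers=${BOOTSTRAP_SERVERS}\n"
    "security.protocol=SASL_SSL\n"
    "sasl.mechanism=PLAIN\n"
    "sasl.jaas.config=org.apache.kafka.common.security.plain.PlainLoginModule required "
    'username="${KAFKA_USER}" password="${KAFKA_PASSWORD}";\n'
)


def render_properties(values: Mapping[str, str]) -> str:
    """Fill in the placeholders; Template.substitute raises KeyError if one is missing."""
    return PROPERTIES_TEMPLATE.substitute(values)


# In a pipeline you would pass os.environ (populated from the CI secret store)
# and write the result to the file handed to JulieOps; here we use dummy values.
print(render_properties({
    "BOOTSTRAP_SERVERS": "localhost:9092",
    "KAFKA_USER": "demo-user",
    "KAFKA_PASSWORD": "demo-password",
}))
```

Failing loudly on a missing placeholder is deliberate: a half-rendered file would only fail later, against the cluster. If a secret can contain quotes or backslashes, it needs escaping before substitution.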

Example Topology File: an example of a descriptor file. You can access more examples here.
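As an illustration, the Python sketch below builds a small descriptor and prints it as JSON (one of the two accepted formats). Its shape (a `context`, an extra attribute such as `source`, and `projects` with principals and topics) is modelled on the JulieOps example descriptors, but all names and values are hypothetical; check the documentation for the full set of supported keys. The helper `declared_topics` is our own addition for reporting what a file declares.

```python
import json
from typing import Dict, List, Tuple

# Shape modelled on the JulieOps example descriptors; all values are hypothetical.
descriptor = {
    "context": "acme",
    "source": "billing",
    "projects": [
        {
            "name": "invoices",
            "consumers": [{"principal": "User:invoice-reader"}],
            "producers": [{"principal": "User:invoice-writer"}],
            "topics": [
                {
                    "name": "created",
                    "config": {"replication.factor": "3", "num.partitions": "6"},
                },
                {
                    "name": "paid",
                    "config": {"replication.factor": "3", "num.partitions": "3"},
                },
            ],
        }
    ],
}


def declared_topics(spec: Dict) -> List[Tuple[str, str]]:
    """List (project, topic) pairs so a pipeline step can report what a file declares."""
    return [
        (project["name"], topic["name"])
        for project in spec.get("projects", [])
        for topic in project.get("topics", [])
    ]


# Descriptor files may be YAML or JSON; json.dumps produces a JSON descriptor.
print(json.dumps(descriptor, indent=2))
print(declared_topics(descriptor))
# [('invoices', 'created'), ('invoices', 'paid')]
```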

So now we know what the tool needs in order to run, and we have a declarative, human-readable approach to our configuration. Now what?

Next, we can try it out ad hoc, with our own custom configuration.

Example of how to run the tool as a Docker image
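A Python sketch of assembling such a run follows. The flags (`--brokers`, `--clientConfig`, `--topology`, `--dryRun`) are taken from the usage printed by the JulieOps CLI, but the executable name, the image name and the mount path are assumptions: confirm them with `--help` for the version and distribution you install.

```python
import shlex
from typing import List

# Assumption: executable name as shipped in the JulieOps packages; it may differ in your image.
JULIE_OPS = "julie-ops-cli.sh"


def julie_ops_command(brokers: str, client_config: str, topology: str,
                      dry_run: bool = True) -> List[str]:
    """Build the argument list for a single JulieOps run."""
    cmd = [JULIE_OPS, "--brokers", brokers,
           "--clientConfig", client_config,
           "--topology", topology]
    if dry_run:
        cmd.append("--dryRun")  # print the execution plan without changing anything
    return cmd


def in_docker(cmd: List[str], image: str, workdir: str) -> List[str]:
    """Wrap a command in `docker run`, mounting `workdir` so the files are visible."""
    return ["docker", "run", "--rm", "-v", f"{workdir}:/work", "-w", "/work", image, *cmd]


cmd = julie_ops_command("broker:9092", "topology-builder.properties", "descriptor.yaml")
# "my-registry/julie-ops:latest" is a placeholder; use the image you actually pull.
print(shlex.join(in_docker(cmd, "my-registry/julie-ops:latest", "/path/to/repo")))
```

Passing the list to `subprocess.run` executes it. Defaulting to `dry_run=True` means a run only changes the cluster when the caller explicitly asks for it; the broker address must also be reachable from inside the container.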

Is JulieOps for me? What should I think about beforehand?

JulieOps is easy to use as a modular, auxiliary tool: we can add it to our pipelines to apply configuration changes, automating the process.

There are different paths you can take here, which gives you some flexibility to adapt to your context.

You can choose to run JulieOps in short-lived containers (to save resources, since each run is simply a one-time operation) or in persistent, long-lived instances that continually apply configurations as they change.
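The difference between the two modes can be sketched in Python: a short-lived container calls the apply step once and exits, whereas a long-lived instance keeps polling and re-applies only when the descriptor files actually change. This is a hypothetical sketch; `apply` stands in for one JulieOps run, and the polling interval and the `*.yaml` pattern are assumptions.

```python
import hashlib
import tempfile
import time
from pathlib import Path
from typing import Callable, Iterable, Optional


def fingerprint(paths: Iterable[Path]) -> str:
    """Hash the names and contents of the given files in a stable order."""
    digest = hashlib.sha256()
    for path in sorted(paths):
        digest.update(str(path).encode("utf-8") + b"\0")
        digest.update(path.read_bytes() + b"\0")
    return digest.hexdigest()


def watch(directory: Path, apply: Callable[[], None],
          interval_s: float = 30.0, max_polls: Optional[int] = None) -> None:
    """Long-lived mode: poll `directory` and call `apply` whenever a descriptor changes.

    A short-lived container would instead call `apply()` once and exit.
    """
    last_seen: Optional[str] = None
    polls = 0
    while max_polls is None or polls < max_polls:
        current = fingerprint(directory.rglob("*.yaml"))
        if current != last_seen:
            apply()  # e.g. one JulieOps run against the current files
            last_seen = current
        polls += 1
        time.sleep(interval_s)


# Demo: three polls over an unchanged directory trigger exactly one apply.
with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "team-a.yaml").write_text("context: demo\n")
    runs = []
    watch(root, lambda: runs.append("applied"), interval_s=0, max_polls=3)
    print(runs)  # ['applied']
```

In practice a long-lived instance would also pull the repository (or react to a webhook) before fingerprinting; the trade-off is a process that is always running versus one that starts only when a pipeline triggers it.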

Some other questions that are pertinent when scaling out your configurations:

Will you gather all information in one file or split it into several?
Will you have one configuration file per team? Per topic or connector, or along some other dimension?
How will you approach the file structure within your repo? Will you have only one repo or several?
Will you separate files by project? Teams? Use cases?
You can also choose between several descriptor files and a single file as the source of truth (the original idea, as you can see in the repository, but one that can be difficult to manage and error-prone in larger, more complex projects).
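One risk that appears when the configuration is split across several files is the same topic being declared twice. The Python sketch below assumes the descriptors have already been parsed into dictionaries (for example with `json.load` or a YAML parser) and reports every (context, project, topic) that appears in more than one file; the file names and the check itself are hypothetical, not a JulieOps feature.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

TopicKey = Tuple[str, str, str]  # (context, project, topic)


def find_duplicate_topics(descriptors: Dict[str, dict]) -> Dict[TopicKey, List[str]]:
    """Return every (context, project, topic) declared in more than one file."""
    seen: Dict[TopicKey, List[str]] = defaultdict(list)
    for filename, descriptor in sorted(descriptors.items()):
        context = descriptor.get("context", "")
        for project in descriptor.get("projects", []):
            for topic in project.get("topics", []):
                seen[(context, project["name"], topic["name"])].append(filename)
    return {key: files for key, files in seen.items() if len(files) > 1}


# Hypothetical split: two teams' files both declare the same topic.
team_a = {"context": "acme",
          "projects": [{"name": "billing", "topics": [{"name": "invoices"}]}]}
team_b = {"context": "acme",
          "projects": [{"name": "billing", "topics": [{"name": "invoices"}, {"name": "refunds"}]}]}
print(find_duplicate_topics({"team-a.json": team_a, "team-b.json": team_b}))
# {('acme', 'billing', 'invoices'): ['team-a.json', 'team-b.json']}
```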

The strategy you choose to manage your files (the topology builder properties file, the descriptor file(s), or both) can affect performance and costs. How much you care about that will depend on your use cases.

Will you need additional steps in your pipelines, for example to flatten your files if you have several folders or directory levels?
Will you copy all the files into your container, or will you attach a dedicated shared volume where your repo is already cloned or the files are ready to use?
Will you have secrets in your configuration file? How will they be handled within your pipelines?
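If your pipeline does need a flat set of files (for example, to copy into a container), a flattening step can be as small as the Python sketch below. It is a hypothetical helper: it copies every `.yaml`, `.yml` or `.json` file from a nested tree into one directory and encodes the original path in the file name, so that two `topics.yaml` files from different folders do not overwrite each other. The target directory should live outside the source tree.

```python
import shutil
import tempfile
from pathlib import Path
from typing import List

DESCRIPTOR_SUFFIXES = {".yaml", ".yml", ".json"}


def flatten_descriptors(source: Path, target: Path) -> List[Path]:
    """Copy every descriptor under `source` into the single directory `target`.

    Nested paths are encoded in the file name (team-a/topics.yaml ->
    team-a__topics.yaml) so files with the same name never overwrite each other.
    """
    target.mkdir(parents=True, exist_ok=True)
    copied = []
    for path in sorted(source.rglob("*")):
        if path.is_file() and path.suffix in DESCRIPTOR_SUFFIXES:
            flat_name = "__".join(path.relative_to(source).parts)
            destination = target / flat_name
            shutil.copyfile(path, destination)
            copied.append(destination)
    return copied


# Demo with a throwaway tree: two teams, plus a README that is ignored.
with tempfile.TemporaryDirectory() as tmp:
    repo, flat = Path(tmp, "repo"), Path(tmp, "flat")
    for rel in ("team-a/topics.yaml", "team-b/topics.yaml", "README.md"):
        file = repo / rel
        file.parent.mkdir(parents=True, exist_ok=True)
        file.write_text("# placeholder\n")
    print(sorted(p.name for p in flatten_descriptors(repo, flat)))
    # ['team-a__topics.yaml', 'team-b__topics.yaml']
```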

All of this and more should be considered.

How can we leverage JulieOps to help us solve problems?

“OK. I now have a well-identified, structured plan for my files. I have a well-oiled, automated pipeline that will identify, deploy and apply all the changes in my source of truth.”

With JulieOps, we also get automated naming control by design.
It is also possible to do dry runs and apply the default validations, as well as add your own custom validations. This can be very valuable for testing steps in your pipelines.
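As an illustration of a naming check that could run as an extra, pre-merge pipeline step, here is a Python sketch. It assumes full topic names follow the dot-separated context.source.project.topic layout seen in the JulieOps examples (the real format is configurable, so check the documentation), and it applies a hypothetical team convention: lowercase kebab-case segments. It is a standalone check, not a JulieOps validation plugin.

```python
import re
from typing import Dict, List

# Hypothetical team convention: every dot-separated segment is lowercase kebab-case.
SEGMENT = re.compile(r"[a-z][a-z0-9-]*")


def full_topic_names(descriptor: Dict) -> List[str]:
    """Build context.source.project.topic names.

    Assumption: this mirrors the dot-separated layout of the JulieOps examples;
    the real naming format is configurable, so check the docs for your setup.
    """
    prefix = [descriptor["context"]]
    if "source" in descriptor:
        prefix.append(descriptor["source"])
    return [
        ".".join(prefix + [project["name"], topic["name"]])
        for project in descriptor.get("projects", [])
        for topic in project.get("topics", [])
    ]


def naming_violations(descriptor: Dict) -> List[str]:
    """Return the full names that contain a segment breaking the convention."""
    return [
        name for name in full_topic_names(descriptor)
        if not all(SEGMENT.fullmatch(part) for part in name.split("."))
    ]


descriptor = {
    "context": "acme",
    "source": "billing",
    "projects": [{"name": "invoices",
                  "topics": [{"name": "created"}, {"name": "Updated_V2"}]}],
}
print(full_topic_names(descriptor))
# ['acme.billing.invoices.created', 'acme.billing.invoices.Updated_V2']
print(naming_violations(descriptor))
# ['acme.billing.invoices.Updated_V2']
```

In a pipeline, the step would exit with a non-zero status whenever `naming_violations` returns anything, before the dry run even starts; it complements, rather than replaces, the validations JulieOps itself can run.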

Using JulieOps as a tool and building a robust pipeline around it, we can easily configure new topics or change existing ones, making this kind of operation as simple as committing to a repository. We can add new connectors, schemas, ksqlDB queries, ACLs, and more.

One of the project's motivations is to give teams autonomy while keeping operational control.

But with great power comes great responsibility: strong policies should be applied to your repositories so that you remain in control and prevent errors.

So adding several levels of security and access control to your repositories, branches, PRs, commits, etc. is something you should keep firmly in mind.
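Access control belongs first and foremost in your Git platform (protected branches, required reviews, code owners), but a pipeline can add one more layer. The Python sketch below is a hypothetical check that a team's pull request only touches descriptor files under that team's own folder; the ownership map and folder layout are assumptions, and the list of changed files could come from something like `git diff --name-only origin/main...HEAD`.

```python
from pathlib import PurePosixPath
from typing import Dict, Iterable, List

# Hypothetical ownership map: each team may only change files under its folder.
OWNERSHIP: Dict[str, str] = {
    "payments": "topologies/payments",
    "logistics": "topologies/logistics",
}


def unauthorised_changes(team: str, changed_files: Iterable[str],
                         ownership: Dict[str, str] = OWNERSHIP) -> List[str]:
    """Return the changed paths that fall outside the team's own folder."""
    files = list(changed_files)
    root = ownership.get(team)
    if root is None:
        return files  # unknown team: nothing is allowed
    allowed = PurePosixPath(root)
    # Compare path components, not string prefixes, so that
    # "topologies/payments-extra" is not mistaken for "topologies/payments".
    return [path for path in files if allowed not in PurePosixPath(path).parents]


print(unauthorised_changes("payments", [
    "topologies/payments/topics.yaml",
    "topologies/payments-extra/topics.yaml",
    "topologies/logistics/topics.yaml",
]))
# ['topologies/payments-extra/topics.yaml', 'topologies/logistics/topics.yaml']
```

This keeps the autonomy described above (each team edits its own files freely) while changes to anyone else's configuration fail the pipeline and need a human decision.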

As you can see, tools like JulieOps work great alongside your Kafka cluster and can really help automate operational tasks and keep control, provided you have good practices around them.

Do you think this kind of tool has its own place, or room to grow?

How to implement this with more detailed examples is a topic for a whole new article (follow us and keep an eye out for future articles that will explain more comprehensively how to implement some of these configurations), but there is a lot more you can explore and have fun with by yourself ☺!

Links

Julie Ops Repository

GitHub - kafka-ops/julie: A solution to help you build automation and gitops in your Apache Kafka…

Julie Ops Official Documentation

Welcome to JulieOps's documentation! - Julie Ops documentation

Medium Article by the Author — Pere Urbón-Bayes

Automated Kafka ops with Kafka Topology Builder and Gitops

Podcast with the Author — Pere Urbón-Bayes

Automating DevOps for Apache Kafka and Confluent ft. Pere Urbón-Bayes

Written by Bruno Costa