Kafka Deployment Options in Practice

In this post, I will give you an overview of how people deploy Kafka in real projects. By the way, if you are thinking of setting up a server by simply downloading ZooKeeper and a broker, installing them on one machine, and running a "hello world" as I did, then the next step is to try a couple of machines for different instances, or, a bit more advanced, to use Docker. You will encounter some challenges, but you will at least get enough hands-on experience to count as a Kafka beginner.

In a real project, many factors decide how Kafka should be deployed: for example, the cloud platform your project's infrastructure relies on, or the container-orchestration system, such as Kubernetes. In this post, I will list several practical options that can generally be adopted in real projects.

Strimzi — Apache Kafka on Kubernetes

I will explain how Strimzi comes into the picture and what it newly offers, feature-wise, compared to plain Apache Kafka.

In a nutshell, Strimzi facilitates the task of moving your "hello world" system to the Kubernetes platform (or OpenShift, an enterprise version of Kubernetes). Without Strimzi and without any prior hands-on Kubernetes experience, you would need to do another "hello world" exercise with Kubernetes first, and I don't think even that exercise would be enough. Without Strimzi but with some Kubernetes experience, you would start to do it yourself and fix issues as they come. In my case, I usually give up at this step once the issues keep coming.

Assuming that you already have an overview of some Kubernetes concepts, you can imagine how to run an application on a Kubernetes "machine" (a pod running containers, that is) and can do it yourself, just not smoothly, because you don't do it every day. This is where the Strimzi folks come into the picture: they provide the scripts and resource definitions to do it for you. Those take care of the silly issues, like installing the wrong Java version or missing some configuration in zookeeper.properties or server.properties. All you have to do is copy and paste the commands from Strimzi's "Getting Started" page and enjoy your "hello world" Kafka on your local Kubernetes.
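Once the cluster is up, a quick way to verify it is to produce and consume a message with a plain Kafka client. Here is a minimal sketch using the kafka-python package, assuming you have exposed the cluster's bootstrap service at localhost:9092 (for example via kubectl port-forward); the topic name is a placeholder of mine:

```python
from kafka import KafkaProducer, KafkaConsumer

# Assumes the Strimzi cluster's bootstrap service is reachable at
# localhost:9092, e.g. via `kubectl port-forward`.
producer = KafkaProducer(bootstrap_servers="localhost:9092")
producer.send("hello-world", b"Hello from Kafka on Kubernetes!")
producer.flush()

consumer = KafkaConsumer(
    "hello-world",
    bootstrap_servers="localhost:9092",
    auto_offset_reset="earliest",
    consumer_timeout_ms=5000,  # stop iterating after 5s of silence
)
for message in consumer:
    print(message.value)
```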

IMHO, the genuinely new thing in Strimzi compared to existing Kafka concepts is the Kafka Bridge, as you can see in the following figure.

Image source: https://strimzi.io/docs/operators/latest/images/overview/kafka-concepts-supporting-components.png

Kafka clients normally publish/subscribe to the cluster over Kafka's own binary protocol on top of TCP, which is certainly not RESTful. For a microservice architecture in which services typically communicate via REST APIs, the Kafka Bridge looks like the missing piece of the puzzle. On Kubernetes especially, exposing a service via any protocol other than HTTP/HTTPS is not always a preferable choice. And if you are not the person responsible for the Kafka cluster in your team but are working on some apps that use Kafka, then instead of pulling in client libraries to publish/subscribe messages to/from Kafka, you only call the APIs exposed by the Kafka Bridge. Isn't that worth trying? I think yes.
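As a sketch of what that looks like, assuming a Bridge exposed at http://my-bridge:8080 and a topic named orders (both placeholders of mine, not from the Strimzi docs), producing a record becomes an ordinary HTTP POST:

```python
import requests

# Placeholder address: wherever your Kafka Bridge service is exposed.
BRIDGE = "http://my-bridge:8080"

# Produce one JSON record to the "orders" topic over plain HTTP.
resp = requests.post(
    f"{BRIDGE}/topics/orders",
    json={"records": [{"key": "order-1", "value": {"item": "book", "qty": 2}}]},
    headers={"Content-Type": "application/vnd.kafka.json.v2+json"},
)
resp.raise_for_status()
print(resp.json())  # partition/offset of the produced record
```

No Kafka client library, no binary protocol: any service that can speak HTTP can produce and consume.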

Confluent’s Kafka

I mention Confluent because I recently saw its ads on my Facebook feed and read their well-written book about event-driven architecture. In fact, there are many solutions that provide a cloud-native deployment of Kafka; you could even build one on your local Kubernetes and name it whatever you want. Typically, you can find all the basic Kafka features in any of them. The difference between the solutions lies in their business models in terms of support, maturity, and feature sets, which is not the focus of this post.

Let me give you a brief description; then, if you are interested, you can go to the Confluent documentation website to find out more. In the context of Kafka and the cloud, you might focus on the bottom part of the Confluent component architecture figure that I outlined.

Image source: https://docs.confluent.io/_images/confluentPlatform.png

Basically, the Confluent company provides you with a platform designed to be easy to deploy on any cloud platform, or on Confluent Cloud. The platform's core is Apache Kafka, with additional features on top that wrap Kafka's functions and provide interfaces (APIs, clients, and so on) to different applications. Those interfaces make things easier, up to some extent, for the developers of those applications: understandable syntax, simple input parameters, sensible default configuration settings, and so on. And, thinking of it theoretically as a fallback: because Apache Kafka is at the core, in the very worst case, if those external applications need something not yet available, there can be a workaround that gives them direct access to the core Kafka.
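To illustrate the "interfaces" point, here is a sketch using Confluent's own Python client, confluent-kafka; the broker address and topic name are placeholders of mine:

```python
from confluent_kafka import Producer

# Placeholder address; point this at your Confluent (or plain Kafka) brokers.
producer = Producer({"bootstrap.servers": "localhost:9092"})

def on_delivery(err, msg):
    # Invoked asynchronously once the broker acknowledges (or rejects) the record.
    if err is not None:
        print(f"Delivery failed: {err}")
    else:
        print(f"Delivered to {msg.topic()}[{msg.partition()}] at offset {msg.offset()}")

producer.produce("events", key="user-42", value=b'{"action": "signup"}',
                 on_delivery=on_delivery)
producer.flush()  # block until all queued messages are delivered
```

Notice that the same client works against plain Apache Kafka, which is exactly the "Kafka at the core" point above.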

At this time, the noticeable features/concepts from my side include the Confluent Operator, which leverages the Kubernetes Operator pattern for stateful deployments of Kafka as a backend service, and the Confluent REST Proxy. Others might come when I have a chance to play with them.
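The REST Proxy plays the same role as Strimzi's Kafka Bridge above; as a hedged sketch (the proxy address and topic name are again placeholders of mine), producing over HTTP looks nearly identical:

```python
import requests

# Placeholder address for a Confluent REST Proxy instance.
PROXY = "http://localhost:8082"

resp = requests.post(
    f"{PROXY}/topics/events",
    json={"records": [{"value": {"action": "signup"}}]},
    headers={"Content-Type": "application/vnd.kafka.json.v2+json"},
)
resp.raise_for_status()
print(resp.json())
```

That the two HTTP payloads are practically interchangeable is a nice side effect: an app written against one proxy can be pointed at the other with minimal changes.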

Banzai Cloud's Kafka

I haven't had a chance to play with the Banzai Cloud Supertubes tool due to its high resource requirements. From its overall architecture, I think the idea is the same as Confluent's: set up a system with Kafka at the core, deploy several components installed and managed by Kubernetes operators, and enable integration with popular systems such as Prometheus for monitoring and Grafana for dashboards. Such integration increases the chance that the Banzai architecture gets adopted into a team's existing infrastructure.

Image source: https://banzaicloud.com/docs/supertubes/supertubes-arch.png

Google Cloud Pub/Sub

If you are not willing to learn Kubernetes, or, like me, just think of it as something that lets you CRUD containers, and you simply want to use Kafka without any concern about scalability, robustness, maintenance, and so on, then you might take a glance at GCP. I mention GCP because the interface its Pub/Sub service offers is quite similar to what Kafka provides. In addition, GCP provides a push mechanism via REST APIs for subscriptions, as well as a batch-processing capability. I don't know whether or not Kafka is used in the backend.

From the source code of their Python library, I can see that requests are sent to the GCP backend using the gRPC protocol.
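For comparison with the Kafka snippets above, here is a minimal publish-and-pull sketch with the google-cloud-pubsub library; the project, topic, and subscription names are placeholders of mine, and the topic and subscription are assumed to already exist:

```python
from google.cloud import pubsub_v1

PROJECT = "my-gcp-project"  # placeholder project ID

# Publish a message; the client batches it and sends it over gRPC.
publisher = pubsub_v1.PublisherClient()
topic_path = publisher.topic_path(PROJECT, "orders")
future = publisher.publish(topic_path, b"Hello Pub/Sub!", origin="blog-demo")
print("Published message ID:", future.result())

# Pull messages synchronously from an existing subscription and ack them.
subscriber = pubsub_v1.SubscriberClient()
sub_path = subscriber.subscription_path(PROJECT, "orders-sub")
response = subscriber.pull(subscription=sub_path, max_messages=10)
for received in response.received_messages:
    print(received.message.data)
    subscriber.acknowledge(subscription=sub_path, ack_ids=[received.ack_id])
```

The shape is familiar: topics, publishers, subscriptions; the scaling and operations are GCP's problem rather than yours.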

How about you, what system are you using or aware of? I would really appreciate it if you could leave a comment so that I can take a look.
