In the world of event-driven applications, one key concern is making sure that the messages published to a topic or a queue can be understood by all the players, i.e. the producers and the consumers.
This means that having a defined data structure, or schema, is highly recommended.
The benefits of a defined schema in your event-driven ecosystem are a clear data structure, with explicit types and meaning, and more efficient data encoding.
In this tutorial, we are going to add Schema Registry to our Kafka environment, which we built as part of a previous tutorial (linked at the end of this article).
It is highly recommended that you complete that tutorial before commencing with this one.
Why Schema Registry?
We all understand that our Kafka producers publish messages to Kafka topics and our Kafka consumers read the messages from those topics. Schema Registry acts as a third party that ensures the messages published by the producers can be read by the consumers.
Over time, our schemas may evolve: fields may be added or removed, and the data types of certain fields may change. Schema Registry performs compatibility checks on each new schema version so that consumers can still read the messages. It ensures that the producer-consumer contract stays valid.
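For example, Schema Registry’s default compatibility mode is BACKWARD, under which a new field may only be added if it declares a default value, so that consumers using the new schema can still read older messages. A hypothetical field addition that would pass that check:

```json
{ "name": "priority", "type": "string", "default": "none" }
```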
Add Schema Registry and Control Center Images to docker-compose File
We are going to add the Docker images for Schema Registry and Control Center to our existing docker-compose.yml file.
The images that will be added are the official Confluent ones:
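- confluentinc/cp-schema-registry
- confluentinc/cp-enterprise-control-center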
Control Center is awesome. It provides a simple UI for us to create topics, see the messages within a topic, view a topic’s schemas, and much more.
Anyway, let’s go ahead and add the images to the compose file.
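Here is a minimal sketch of the two new services. The broker service name (broker) and its internal listener port (29092) are assumptions that must match your existing docker-compose.yml from the previous tutorial, and the version tag is only an example:

```yaml
  schema-registry:
    image: confluentinc/cp-schema-registry:7.3.0
    hostname: schema-registry
    depends_on:
      - broker
    ports:
      - "8081:8081"
    environment:
      SCHEMA_REGISTRY_HOST_NAME: schema-registry
      SCHEMA_REGISTRY_KAFKASTORE_BOOTSTRAP_SERVERS: broker:29092
      SCHEMA_REGISTRY_LISTENERS: http://0.0.0.0:8081

  control-center:
    image: confluentinc/cp-enterprise-control-center:7.3.0
    hostname: control-center
    depends_on:
      - broker
      - schema-registry
    ports:
      - "9021:9021"
    environment:
      CONTROL_CENTER_BOOTSTRAP_SERVERS: broker:29092
      CONTROL_CENTER_SCHEMA_REGISTRY_URL: http://schema-registry:8081
      CONTROL_CENTER_REPLICATION_FACTOR: 1
```

Control Center needs to know where both the broker and Schema Registry live, which is why it depends on both services.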
Spin Up Docker Containers
Go to your terminal and change into the directory that contains the docker-compose.yml file. Let’s start up all the containers so we can play around with them.
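A sketch of the commands (this assumes the standalone docker-compose binary; with newer Docker installations, docker compose works the same way):

```sh
# start all services in the background
docker-compose up -d

# list the containers and their state
docker-compose ps
```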
If all the containers come up and stay running, we’re good. If not, feel free to post any errors or logs in the comments below and we can debug and discuss the solution together.
Create a Topic
In the previous tutorial mentioned in the introduction, we created the topic via the Kafka CLI tool. This time, we are going to use Confluent Control Center to create the topic, as it is much easier and quicker.
Go to your browser and navigate to http://localhost:9021. You should see the Control Center UI. Then, navigate to Cluster 1 on the left-hand side and go to Topics. See screenshot below.
Now, go ahead and click on Create topic. You should see the screen below. Let’s create a topic named to-do-list and use the default settings. Go ahead and click on Create with defaults.
Awesome. We can see that our topic was created. Notice that on the topic’s view, we have Messages and Schema tabs.
These tabs allow us to view the messages that are in the topic and the Avro schema of the message’s key and value. Great stuff, right?
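As an aside, if you ever want to script this instead of clicking through the UI, the topic can also be created with the Kafka CLI from inside the broker container. A sketch, assuming the broker service is named broker (the partition count here is just an example):

```sh
docker exec -it broker kafka-topics --create \
  --topic to-do-list \
  --partitions 6 \
  --replication-factor 1 \
  --bootstrap-server localhost:9092
```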
Define the Topic Schema
Let’s start by defining our message’s key schema. Generally, we want our key to be unique to benefit the most from Kafka’s topic partitioning. Remember that it is the producer’s partitioner, not the broker, that assigns each message to a partition based on the message’s key.
If multiple messages share the same key value, those messages will go to the same partition. We want this behavior when the order of processing matters; otherwise, we want the messages distributed evenly across the available partitions.
For our to-do-list topic, order does not matter, as the items in the list need to be done regardless. In other words, there’s no priority, per se.
Taking this assumption into account, we will use a UUID as the message’s key. This means the Avro schema of the message’s key will be of type string.
The Confluent Control Center provides us with the ability to set a schema directly on the UI. So, let’s make use of it. Now, go ahead and navigate to the Schema tab followed by the Key tab as shown in the picture below.
As discussed earlier, we will set the key schema to simply be "string", as shown below. Go ahead and Save it.
That’s it! We have just set the schema for our message’s key. Pretty easy, huh?
Now, we will define the schema for the message’s value. This one is more interesting because we will see the beauty of the Avro schema.
Before we do anything, let’s discuss what should be in the messages contained in the to-do-list topic. What would be useful to know about an item in a to-do list?
Here are a few things:
- The task itself.
- The name of the assignee or who is responsible for getting it done.
- The estimated duration to complete the task.
- The timestamp at which the task was added to the to-do list.
Looks pretty good, doesn’t it? Well, let’s start there.
In Avro schema, this is how the above data would be represented.
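Here is one way to write it (the record name ToDoItem is just an example; what matters are the field names and types discussed next):

```json
{
  "type": "record",
  "name": "ToDoItem",
  "fields": [
    { "name": "task", "type": "string" },
    { "name": "assignee", "type": "string" },
    { "name": "duration", "type": "int" },
    {
      "name": "creationTimestamp",
      "type": { "type": "long", "logicalType": "timestamp-millis" }
    }
  ]
}
```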
The task and assignee are going to be of string type, whereas the duration will be in seconds, hence the int type. Lastly, the creationTimestamp will be in milliseconds, and that’s why it’s the long type.
Note that adding the creationTimestamp to a Kafka message can be useful, especially when we want to do some logic around time, e.g. tasks that have been sitting around for a while should probably be done first. It is also critical when our consumer is interested in the order of the messages based on timestamp.
Anyway, let’s continue and set the schema for our message’s value. On the same Schema tab, click on the Value tab.
Go ahead and click on Set a schema. Copy the above Avro schema, paste it into the editor and, finally, click Save. You should see that your Avro schema has been registered.
We now have a full Kafka environment set up and ready to go for our applications to use in an event-driven style.
By this point, we have gained a basic understanding of how Kafka and Schema Registry work hand in hand. We have also seen how convenient it is to use Confluent Control Center to manage our Kafka cluster and Schema Registry.
Now that we have a full Kafka environment ready, we can write some producer code and publish a message based on the schema we defined above. We can also write the consumer code to read the messages. There are many Kafka client libraries that you can choose from as listed in the Confluent docs.
I encourage you to pick the Kafka client library based on the programming language you are most familiar with. This way, you can focus on learning how the API works and are able to try it out straight away.
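To make that concrete, here is a minimal producer sketch using the confluent-kafka Python client. It assumes the broker is reachable on localhost:9092 and Schema Registry on localhost:8081 (adjust these to match your compose file), and it embeds the same value schema we registered above:

```python
# Requires: pip install "confluent-kafka[avro]"
import time
import uuid

from confluent_kafka import SerializingProducer
from confluent_kafka.schema_registry import SchemaRegistryClient
from confluent_kafka.schema_registry.avro import AvroSerializer
from confluent_kafka.serialization import StringSerializer

# Same schema as the one registered in Control Center.
VALUE_SCHEMA = """
{
  "type": "record",
  "name": "ToDoItem",
  "fields": [
    {"name": "task", "type": "string"},
    {"name": "assignee", "type": "string"},
    {"name": "duration", "type": "int"},
    {"name": "creationTimestamp",
     "type": {"type": "long", "logicalType": "timestamp-millis"}}
  ]
}
"""

schema_registry = SchemaRegistryClient({"url": "http://localhost:8081"})

producer = SerializingProducer({
    "bootstrap.servers": "localhost:9092",
    "key.serializer": StringSerializer("utf_8"),
    "value.serializer": AvroSerializer(schema_registry, VALUE_SCHEMA),
})

producer.produce(
    topic="to-do-list",
    key=str(uuid.uuid4()),  # random UUID key, as discussed above
    value={
        "task": "Write the consumer code",
        "assignee": "Billy",
        "duration": 3600,                              # seconds
        "creationTimestamp": int(time.time() * 1000),  # milliseconds
    },
)
producer.flush()
```

The consumer side mirrors this: a DeserializingConsumer configured with an AvroDeserializer fetches the registered schema and hands each message back as a plain dict.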
For a tutorial on how to set up a local Kafka Docker container, go to this article.