Kafka is a very powerful piece of technology. It can provide incredible throughput and has become the standard way to communicate between backend services.
The only real negative is that getting insight into what is happening inside Kafka can be a bit of a pain. Sure, Kafka ships with some scripts to manage and view details about topics, but they feel a bit clunky sometimes… They’re also capable of creating and destroying the entire universe, which makes them risky to use.
I highly recommend installing kafkacat with yajl (for Homebrew, run brew install kafkacat --with-yajl). If you include yajl you’ll have a -J flag for any output-producing kafkacat command that stuffs everything nicely into a JSON envelope. Our examples will assume you don’t have this option… but it gives you some more cool stuff to play with.
I’m going to lay out a few quick scenarios that I use kafkacat for almost every day:
- I want to see what topics exist in my Kafka cluster.
- I’ve written the world’s greatest consumer of a Kafka topic but I want to run some integration tests on very specific messages.
- I’ve written the world’s greatest Kafka publisher and I want to verify that my messages are going to the correct topic and partition.
- I want to copy/steal a bunch of messages from one Kafka topic (maybe on another cluster) and publish them to another Kafka topic.
- I want to view very specific messages in a given Kafka topic.
Getting a list of topics with kafkacat
One of the first hurdles of working with Kafka is just knowing what topics are available in the first place. This is super simple with kafkacat.
kafkacat -L -b kafka
That’s all it takes. The arguments are -L for listing mode and -b for which Kafka broker to talk to. You’ll get the cluster metadata back: the brokers, every single topic on the cluster, and the partitions of each topic.
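If all you want is the topic names, the listing can be filtered. This is a minimal sketch, not something kafkacat provides: topic_names is a hypothetical helper, and it assumes the text output prints one line per topic shaped like `topic "awesome-topic" with 10 partitions:`. Check that against your version before relying on it.

```shell
# Hypothetical helper: read `kafkacat -L` text output on stdin and
# print one topic name per line.
# Assumption: topic lines look like
#   topic "awesome-topic" with 10 partitions:
topic_names() {
  # Split each matching line on double quotes; field 2 is the topic name.
  awk -F'"' '/^[ \t]*topic "/ { print $2 }'
}

# Usage (needs a reachable broker, so it is left commented out):
# kafkacat -L -b kafka | topic_names
```

If you installed with yajl, adding -J and piping to jq is probably the sturdier route, since it does not depend on the text layout.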
Publishing messages with kafkacat
This is absolutely my most common task when working with kafkacat. I’ve written a consumer that is thirsty for data but the data doesn’t exist. I depend on another application to provide me that data but I don’t want to run that app just to test drive my own. Fret not, kafkacat can help. Run the following in your terminal of choice:
kafkacat -P -b kafka -t awesome-topic
That’s it! Let’s break it down. -P puts kafkacat into producer mode, -b points to a Kafka broker called kafka, and -t points us to the awesome-topic topic. Now you’re publishing messages! Don’t get confused by the cursor hanging out below the command you issued… just type (or paste) your message into the terminal and hit enter. Each line you enter is sent as its own message. Rinse and repeat. Once you’re done, just hit control+C to exit (control+D, which signals end of input, works too).
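Typing messages by hand gets old fast in integration tests, so here is a rough sketch of doing the same thing non-interactively. publish_messages and the KAFKA_BROKER variable are names made up for this example; the only kafkacat behavior it leans on is that producer mode reads one message per line from stdin.

```shell
# Hypothetical helper: publish every argument after the topic as its own message.
# KAFKA_BROKER is an assumed environment variable; it falls back to the
# broker name used throughout this post.
publish_messages() {
  local topic="$1"
  shift
  # printf repeats its format for each remaining argument,
  # which gives kafkacat one line (one message) per argument.
  printf '%s\n' "$@" | kafkacat -P -b "${KAFKA_BROKER:-kafka}" -t "$topic"
}

# Usage (needs a reachable broker, so it is left commented out):
# publish_messages awesome-topic 'first message' 'second message'
```

Messages that contain newlines would be split into several messages with this approach, so it only suits single-line payloads.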
Consuming messages with kafkacat
Let’s say you have an app that publishes a message but does not consume that same message. You want to make sure everything lands exactly where you expect it to. You need to consume a topic in order to ensure everything is in order.
Luckily, consuming messages with kafkacat is extremely simple. Type the following in your terminal:
kafkacat -C -b kafka -t awesome-topic
Literally the only difference between a basic publisher and a basic consumer is the -C option. This tells kafkacat to consume messages. Obviously as a consumer you can’t give any input… but you’ll see a whole bunch of fun output. In this case, you’ll see the messages we published in the first example.
By default kafkacat will tell you when it reaches the end of a partition. I have Kafka configured to create topics with 10 partitions (0–9) so we see a lot of extra information with our messages. If we don’t care about the partition/offset information we can use the -q option to only show the messages. That looks like this:
kafkacat -C -b kafka -t awesome-topic -q
Looking at our first example we can see that the partitions don’t report in any particular order and the messages are just kind of floating in space. If I cared which partition a message came from, I couldn’t really tell… it would be great to be able to see the partition and offset of each message when we use kafkacat. Well, that’s also pretty simple. Kafkacat supports custom formatting with -f and some tokens. If you just type kafkacat -f you’ll get some usage info… including the following format tokens:
Format string tokens:
  %s            Message payload
  %S            Message payload length (or -1 for NULL)
  %R            Message payload length (or -1 for NULL) serialized
                as a binary big endian 32-bit signed integer
  %k            Message key
  %K            Message key length (or -1 for NULL)
  %t            Topic
  %p            Partition
  %o            Message offset
  \n \r \t      Newlines, tab
  \xXX \xNNN    Any ASCII character
Example:
  -f 'Topic %t [%p] at offset %o: key %k: %s\n'
So let’s do it:
kafkacat -C -b kafka -t awesome-topic -f 'Topic %t [%p] at offset %o: key %k: %s\n'
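Since partitions report in no particular order, one option is to print the partition and offset in front of each payload and let sort put things back in order. This is only a sketch: dump_sorted and KAFKA_BROKER are hypothetical names, and it assumes single-line payloads.

```shell
# Hypothetical helper: dump a topic as "partition<TAB>offset<TAB>payload",
# ordered by partition and then by offset.
dump_sorted() {
  local topic="$1"
  # -q drops the end-of-partition notices, -e exits at the end of the topic.
  # kafkacat itself expands \t and \n inside the -f format string.
  kafkacat -C -b "${KAFKA_BROKER:-kafka}" -t "$topic" -q -e \
    -f '%p\t%o\t%s\n' |
    sort -t "$(printf '\t')" -k1,1n -k2,2n
}

# Usage (needs a reachable broker, so it is left commented out):
# dump_sorted awesome-topic
```

Sorting numerically on both columns matters here: a plain text sort would put offset 10 ahead of offset 2.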
Transferring messages between topics or clusters
Let’s say you want to move some messages from one topic to another without copy/pasting them between consumer and publisher windows. Another VERY easy thing to do with kafkacat.
kafkacat -C -b kafka -t awesome-topic -e | kafkacat -P -b kafka -t awesome-topic2
So what does this do? Let’s break it down. There are two commands separated by a |. Our first command is a basic kafkacat consumer on the awesome-topic topic. The -e flag is new for us but just means “terminate when you get to the end of the topic.” The second command just reads from stdin and publishes messages to awesome-topic2. That’s it!
If we want to do this with a file in the middle it would look like this:
kafkacat -C -b kafka -t awesome-topic -e > awesome-messages.txt
cat awesome-messages.txt | kafkacat -P -b kafka -t awesome-topic2
All we have to do is redirect the messages into a file. Now any time we want to dump those messages into Kafka we just pipe the file contents into kafkacat.
Wanna transfer messages from one cluster to another? Try this:
kafkacat -C -b kafka2 -t awesome-topic -e | kafkacat -P -b kafka -t awesome-topic
Notice our consumer is connecting to the broker kafka2 and our producer is connecting to kafka. As long as we can connect to the broker we can use kafkacat to move messages.
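One thing the plain pipe does not carry over is the message key, since only the payload is printed. A possible variant, sketched here under assumptions, uses the -K key delimiter flag on both sides so each line travels as key, tab, payload. copy_topic is a made-up name, the tab delimiter is my choice, and it will misbehave if keys or payloads contain tabs or newlines. How NULL keys round-trip is worth checking on your version.

```shell
# Hypothetical helper: copy a topic, keys included, between brokers.
# Arguments: source broker, source topic, destination broker, destination topic.
copy_topic() {
  local src_broker="$1" src_topic="$2" dst_broker="$3" dst_topic="$4"
  # Consumer side: -K '\t' prints "key<TAB>payload" for every message.
  # Producer side: -K '\t' splits each input line back into key and payload.
  kafkacat -C -b "$src_broker" -t "$src_topic" -e -q -K '\t' |
    kafkacat -P -b "$dst_broker" -t "$dst_topic" -K '\t'
}

# Usage (needs reachable brokers, so it is left commented out):
# copy_topic kafka2 awesome-topic kafka awesome-topic
```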
Get the messages you want
For these examples let’s seed a new topic with some values. Run the following:
seq 1 100 | kafkacat -P -b kafka -t superduper-topic
That’s gonna throw the numbers 1–100 into superduper-topic for us.
Sometimes we just want to see a sample of the messages. Maybe the last 5 that were written, or the last 5 messages written to a particular partition… we can do both of those:
kafkacat -C -b kafka -t superduper-topic -o -5 -e
This command uses the -o flag, which means “read from this offset.” When we feed it -5 it means “start 5 messages from the end” (sending 5 would start from offset 5). -e exits when the last message is read.
You’ll get the last 5 messages from each of your partitions. I have 10 partitions so I get 50 messages.
If we wanted to focus on one partition… we could do this:
kafkacat -C -b kafka -t superduper-topic -o -5 -e -p 5
This command behaves just like the one above it but the -p 5 tells kafkacat to only read messages from partition 5.
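If you find yourself typing these two variants a lot, they fold nicely into one small function. This is just a convenience sketch: tail_topic and KAFKA_BROKER are invented names, and the flags are the same -o, -e and -p shown above, plus -q to keep the output to messages only.

```shell
# Hypothetical helper: show the last N messages of each partition,
# or of a single partition when a third argument is given.
tail_topic() {
  local topic="$1" count="$2" partition="${3:-}"
  # Rebuild the positional parameters as the kafkacat argument list.
  set -- -C -b "${KAFKA_BROKER:-kafka}" -t "$topic" -o "-$count" -e -q
  if [ -n "$partition" ]; then
    set -- "$@" -p "$partition"
  fi
  kafkacat "$@"
}

# Usage (needs a reachable broker, so it is left commented out):
# tail_topic superduper-topic 5      # last 5 from every partition
# tail_topic superduper-topic 5 5    # last 5 from partition 5 only
```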
You can do a lot of stuff with kafkacat. There is tons more than I’ve covered here… you can specify a message key for use in the partitioning strategy… the -J flag is awesome (just throw it on the end of anything you want and then start piping stuff to jq)… you can join or create consumer-groups…
The possibilities are almost limitless. It’s an incredibly useful tool so make the most of it!