Building a Kafka that doesn’t depend on ZooKeeper
Can you build a Kafka that doesn’t depend on ZooKeeper? This was the question that led me to develop a Kafka implementation in Go called Jocko.
Recently, I used Serf and Raft to build service discovery and consensus into Jocko, removing the dependency on ZooKeeper and its burden on users while keeping the same guarantees. My takeaways from the project were:
- What Kafka uses ZooKeeper for
- Advantages of built-in service discovery and consensus
- Serf and Raft are awesome tools for building distributed services
- How to build service discovery and consensus into a distributed service using Serf and Raft
I’ve written about these here and hope that people who use Kafka, are interested in Jocko, or want to build distributed services will find this essay useful.
In this essay I say “build in,” and by that I mean making the feature a part of Jocko rather than depending on another service to do it. When I discuss Raft and Serf, I’m talking about the implementations rather than the protocols.
What does Kafka use ZooKeeper for? Kafka uses ZooKeeper (ZK) for service discovery (the process of learning the location of services) and consensus (the process of getting several nodes to agree on something). ZK is commonly used by projects in the Hadoop ecosystem, Kafka included.
In a Kafka cluster, service discovery helps the brokers find each other and know who’s in the cluster; and consensus helps the brokers elect a cluster controller, know what partitions exist, where they are, if they’re a leader of a partition or if they’re a follower and need to replicate, and so on.
ZooKeeper was one of the very few options for coordination when it was released in 2007. Today, distributed service makers have better options.
First, services like Consul (and etcd) that tend to differentiate themselves from ZooKeeper by:
- Easier setup and deployment. Single-binary installs, less config (you probably just need to pass args to the CLI).
- Productive APIs that are specific and high-level. With ZK, you work with directories and files, and you build service discovery, cluster membership, and leader election on top of its primitive K/V store. Kafka has 2700+ lines of code that are ZooKeeper related, plus another 700+ for docs. (I got this number by grepping (zookeeper|zk) in its repo, so it doesn’t even include lines that are ZK specific but don’t happen to contain those strings.) Consul provides a high-level, opinionated framework for these features and removes the time you’d spend learning and developing your own system.
- Improved tech under the hood. Consul is built on Serf, a lightweight and efficient decentralized tool providing service discovery, cluster membership, failure detection, and user events. Serf uses a gossip protocol to distribute health checks and failure detection that scales to clusters of any size. ZK, by contrast, does work linear in the number of nodes, its failure detection takes at least as long as the timeout, and its clients need to be “thick”: they maintain active connections to ZK, are difficult to write, and often suffer from bugs.
So far I’ve compared ZK to Consul and talked about the latter’s benefits. Two libraries, Serf and Raft, form Consul’s heart and can be used standalone to solve the same problems. And as a distributed service maker, if you use them to build in coordination, you remove the burden on your users to set up a dependent service. That leads to the next option.
Second, libraries like Serf and Raft, used by Nomad, InfluxDB, Consul, etcd, rqlite, and so on. Serf is a tool for service discovery and Raft is a tool for consensus and leader election. Since they provide much of what makes Consul good, the points above could be listed here as well: solid APIs, solid tech, and so on. I want to focus on the unique benefits of using these libraries inside your own service:
- Remove the burden of dependent services from users. Since the service builds in coordination with the libraries, users don’t need to set up ZK (or a similar service) anymore.
- Piggyback on their work with events. Serf and Raft expose events that you can piggyback on for your own features. Serf has a channel of membership change events that Jocko uses to know what brokers are in the cluster, and Raft has a channel of leadership change events that Jocko piggybacks on to elect the cluster’s controller (see the sketch after this list).
- A framework for cluster communication. Raft manages a replicated log and can be used with a finite-state machine (FSM) to build replicated state machines. Much in the way that Kafka, a log service, can be used as a messaging system whose consumers run arbitrary actions on the logs, Raft’s replicated log can drive arbitrary actions on every node through its FSM.
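Here’s a minimal sketch of what piggybacking on those event channels can look like with the hashicorp/serf and hashicorp/raft libraries. This isn’t Jocko’s actual code; the function and variable names are mine:

package cluster

import (
	"log"

	"github.com/hashicorp/raft"
	"github.com/hashicorp/serf/serf"
)

// watchEvents consumes serf's membership events and raft's leadership
// events in one loop, the way a broker can piggyback on both libraries.
func watchEvents(serfEvents <-chan serf.Event, r *raft.Raft) {
	for {
		select {
		case e := <-serfEvents:
			// Serf delivers member joins, leaves, and failures on this channel.
			if me, ok := e.(serf.MemberEvent); ok {
				for _, m := range me.Members {
					log.Printf("serf %s: %s", me.EventType(), m.Name)
					if me.EventType() == serf.EventMemberJoin {
						// e.g. add the new member as a raft peer / broker here
					}
				}
			}
		case isLeader := <-r.LeaderCh():
			// Raft signals leadership changes; the leader acts as the controller.
			if isLeader {
				log.Println("this broker is now the controller")
			}
		}
	}
}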
Of the two, I think building coordination in with these libraries is the best option for distributed services, because it removes the two biggest costs: 1. the time (and money) it takes users to run ZK, and 2. the effort it takes you to build in coordination, since you now have good tools for it.
How Jocko builds on Serf and Raft to coordinate its cluster. Jocko broker instances are backed by Serf and Raft instances (I call them “the serf” and “the raft” from here on) created automatically on boot. A Jocko cluster is then made up of 3 groups; each group has its own jobs and builds on what’s below it: 1. Serf members, 2. Raft peers, 3. Jocko brokers.
- Serf members. The 1st broker you boot up is the 1st serf member in the cluster. For each subsequent broker you want to join in, you pass a subset of current members to contact, they gossip, and serf soon discovers every member in the cluster. On setup, serf tags the broker with metadata: its ID, its address, and the ports its raft and API are listening on, and it shares these tags with the other members. That’s how the brokers learn each other’s IDs, addresses, and so on (see the sketch after this list). After a broker’s serf hears the gossip and adds the member, it makes an event available to the rest of the system saying that it’s done so.
- Raft peers. Jocko listens for serf’s event saying it found a member and then tells raft to add that member as a peer. If this is the 1st broker you’ve booted up, it starts as the leader and remains so until the cluster goes through a leader election. Otherwise, the raft becomes a follower, replicates logs from the leader, and applies them with its FSM. Raft exposes leadership changes with an event channel.
- Jocko brokers. Jocko uses serf’s and raft’s events to know what brokers are in the cluster and who’s the controller. The controller can then assign partitions to the brokers. All this cluster metadata (the brokers, the partition assignments, and so on) is held as state on each broker, shared with clients via the Metadata API so they know which broker their requests should go to, and used by the brokers to handle requests properly.
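To make the serf-members step concrete, here’s a minimal sketch of booting a broker’s serf with tags and joining the cluster, again with the hashicorp/serf library. The names (startSerf, the tag keys, and so on) are hypothetical, not Jocko’s actual code:

package cluster

import (
	"fmt"

	"github.com/hashicorp/serf/serf"
)

// startSerf boots a broker's serf instance, tags it with the broker's ID and
// listener addresses, and joins the cluster through a few known members.
func startSerf(brokerID int32, raftAddr, apiAddr string, join []string) (*serf.Serf, chan serf.Event, error) {
	events := make(chan serf.Event, 256)

	conf := serf.DefaultConfig()
	conf.NodeName = fmt.Sprintf("broker-%d", brokerID)
	conf.EventCh = events
	// Tags are gossiped to every member, so each broker learns the others'
	// IDs and the addresses their raft and API listeners are on.
	conf.Tags = map[string]string{
		"id":        fmt.Sprintf("%d", brokerID),
		"raft_addr": raftAddr,
		"api_addr":  apiAddr,
	}

	s, err := serf.Create(conf)
	if err != nil {
		return nil, nil, err
	}

	// The 1st broker has nothing to join; later brokers contact a subset of
	// current members and gossip discovers the rest.
	if len(join) > 0 {
		if _, err := s.Join(join, true); err != nil {
			return nil, nil, err
		}
	}
	return s, events, nil
}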
How Jocko handles state-changing requests. Requests that change cluster state work like this: say you send a request to create a topic. That request is handled by the cluster’s controller, which is the raft leader. For each partition in the topic, the controller makes a log that looks like this:
add partition(
  id,       // this partition's ID
  topic,    // topic being created
  replicas, // brokers the partition's replicas are assigned to
  leader    // the broker the leader replica is assigned to
)
Raft then replicates and applies that log throughout the cluster. The brokers handle the log differently based on whether they were assigned a replica: if not, they do nothing other than update their metadata; if they were but aren’t the leader, they start replicating from the leader; and if they’re the leader, they start handling fetch and produce requests.
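Here’s a minimal sketch of that round trip with the hashicorp/raft library. The types and field names are hypothetical, not Jocko’s actual code, and the FSM’s Snapshot and Restore methods are omitted:

package cluster

import (
	"encoding/json"
	"time"

	"github.com/hashicorp/raft"
)

// addPartition is the command the controller replicates, one per partition.
type addPartition struct {
	ID       int32   `json:"id"`       // this partition's ID
	Topic    string  `json:"topic"`    // topic being created
	Replicas []int32 `json:"replicas"` // brokers the partition's replicas are assigned to
	Leader   int32   `json:"leader"`   // broker the leader replica is assigned to
}

// On the controller: append the command to the raft log so every broker applies it.
func createPartition(r *raft.Raft, cmd addPartition) error {
	data, err := json.Marshal(cmd)
	if err != nil {
		return err
	}
	return r.Apply(data, 10*time.Second).Error()
}

// fsm sketches the raft FSM's Apply method, which runs on every broker as
// committed logs are applied.
type fsm struct {
	brokerID int32
}

func (f *fsm) Apply(l *raft.Log) interface{} {
	var cmd addPartition
	if err := json.Unmarshal(l.Data, &cmd); err != nil {
		return err
	}
	switch {
	case cmd.Leader == f.brokerID:
		// assigned the leader replica: start handling fetch and produce requests
	case contains(cmd.Replicas, f.brokerID):
		// assigned a follower replica: start replicating from the leader
	default:
		// not assigned: just update local cluster metadata
	}
	return nil
}

func contains(ids []int32, id int32) bool {
	for _, i := range ids {
		if i == id {
			return true
		}
	}
	return false
}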
All state-changing requests that need consensus are handled similarly. If you send a cluster-writing request to a broker that isn’t the controller, the request is forwarded to the controller (Kafka doesn’t support this yet).
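That forwarding can be as simple as checking whether the local raft is the leader. Again, a hypothetical sketch rather than Jocko’s actual code, with the forwarding and apply steps stubbed out:

package cluster

import (
	"github.com/hashicorp/raft"
)

type createTopicRequest struct {
	Topic      string
	Partitions int32
	Replicas   int16
}

type broker struct {
	raft           *raft.Raft
	controllerAddr string // learned from the cluster metadata kept via serf/raft
}

// handleCreateTopic forwards cluster-writing requests to the controller when
// this broker isn't it; otherwise it goes through the raft Apply path above.
func (b *broker) handleCreateTopic(req createTopicRequest) error {
	if b.raft.State() != raft.Leader {
		return b.forward(b.controllerAddr, req)
	}
	return b.applyCreateTopic(req)
}

// Stubs standing in for the network call and the raft Apply path.
func (b *broker) forward(addr string, req createTopicRequest) error { return nil }
func (b *broker) applyCreateTopic(req createTopicRequest) error     { return nil }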
Requests like fetch and produce, which are broker-specific and only read cluster state, work the same as in Kafka. First, the client uses the Metadata API to get information on the partition it’s interested in: its assigned broker, its latest offset, and so on. Then it sends the protocol-encoded request (fetch, for instance) to that broker, which does the work and responds. How the storage internals work is the interesting part.
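From a client’s point of view, that two-step flow looks roughly like this. The sketch uses the sarama Go client against a Kafka-protocol broker; the address and topic are made up:

package main

import (
	"log"

	"github.com/Shopify/sarama"
)

func main() {
	client, err := sarama.NewClient([]string{"localhost:9092"}, sarama.NewConfig())
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	// Step 1: ask the Metadata API which broker leads the partition and
	// what its latest offset is.
	leader, err := client.Leader("my-topic", 0)
	if err != nil {
		log.Fatal(err)
	}
	offset, err := client.GetOffset("my-topic", 0, sarama.OffsetNewest)
	if err != nil {
		log.Fatal(err)
	}
	log.Printf("partition 0 is led by broker %d, latest offset %d", leader.ID(), offset)

	// Step 2: send the protocol-encoded fetch to that broker (the consumer
	// routes the request there for us).
	consumer, err := sarama.NewConsumerFromClient(client)
	if err != nil {
		log.Fatal(err)
	}
	pc, err := consumer.ConsumePartition("my-topic", 0, sarama.OffsetOldest)
	if err != nil {
		log.Fatal(err)
	}
	for msg := range pc.Messages() {
		log.Printf("offset %d: %s", msg.Offset, msg.Value)
	}
}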
In closing. Building in service discovery and consensus is awesome because it saves users time and money. You’d think it would create more work for the person making the service than using a dedicated coordination service would, but Serf and Raft can make it less work. I haven’t done the FSM justice for how useful it is; I’d like to dig into that in a future essay.
Questions or feedback? Find me at @travisjeffery on Twitter.
Further reading:
- Consul, Nomad — Hashicorp projects that use Serf and Raft, reading their code was a huge help.
- Serf vs. ZooKeeper, doozerd, etcd — Hashicorp’s comparison of Serf to ZK and others.
- Distributed Consensus Reloaded: Apache ZooKeeper and Replication in Apache Kafka — thorough treatment by the creator of ZK on how consensus and replication work.
- Serf Internals — goes over some of the internals of Serf, such as the gossip protocol, ordering of messages via Lamport clocks, etc.
- Curator — Apache project that tries to provide the common stuff you use ZK for that ZK doesn’t do itself.