Contrasting Kafka with Akka
On the similarities and differences between two industry heavyweights
My software architecture consulting gigs take me to places ranging from startups to more established organisations, all with one thing in common: they are looking to solve a problem using a distributed systems approach. Organisations are quick to embrace the benefits of microservices and event-driven architecture: scalability, support for business change, a polyglot culture, and deployment flexibility.
These architectural paradigms aren’t without their drawbacks, however. Distributed systems are trickier to debug. Consistency and distributed state don’t play well together. Concurrency is a lot more difficult to master. To address the latter, actor systems and related forms of command serialization are becoming more prominent in the contemporary architecture landscape.
Perhaps unsurprisingly, then, I am often inundated with questions about Kafka and Akka (how the two relate, and whether we should be using one over the other), almost as if they were in some way interchangeable. I feel it’s time to dispel some misconceptions and objectively delineate their commonalities and differences.
TL;DR: Akka and Kafka belong in two distinct technology camps that have only circumstantial similarities. Kafka is an event streaming platform, loosely residing in the Message-Oriented Middleware (MoM) space. Akka is an implementation of the actor model: a mechanism for concurrent computation based on the concepts of agents, immutability, and message passing. The similarities are in the message part: both technologies use messaging as an operative concept.
Note 1: To clear the air, and to preempt my unreserved Kafka apologist friends, there is no suggestion that the uses of Kafka are in any way limited to messaging; however, messaging is one of its prominent use cases. Kafka is an event streaming platform, and an event record is a type of message—speaking from a purely taxonomical viewpoint.
Note 2: This post generally applies to any actor system and any event streaming platform. Feel free to substitute Kafka with one of its alternatives, such as Pulsar or NATS Streaming. Equivalently, you may substitute Akka with Orleans or Indigo. Minor idiosyncrasies aside, the key concepts are largely the same.
From an architectural standpoint, Kafka is a low-level building block that assists in the shipping and retention of events within a distributed application landscape. It embodies the dumb-pipes, smart-endpoints metaphor, essentially acting as a highly scalable persistent transport that enables parallelism at the consumer level through partially ordered topics, each comprising totally ordered partitions. Kafka is not a programming model, and is completely application and programming language agnostic. In other words, you can implement Kafka producers and consumers in a variety of languages, provided you can obtain a client library written in that language. (And most languages do have a Kafka library.) The behavioural contracts between producers and consumers in the Kafka world are very rudimentary: producers append records to a topic, while consumers read records in a non-destructive manner that resembles a cursor in database parlance. Crucially, consuming a message does not delete it from the topic.
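To make the cursor analogy concrete, here is a minimal in-memory sketch in Python. It is not the Kafka client API: `ToyTopic`, `ToyConsumer`, `publish` and `poll` are hypothetical names invented for illustration, and a real broker adds persistence, replication, consumer groups and much more. It only shows that each consumer advances its own offset and that reading removes nothing.

```python
from collections import defaultdict


class ToyTopic:
    """In-memory stand-in for a topic: a list of append-only partitions."""

    def __init__(self, num_partitions):
        self.partitions = [[] for _ in range(num_partitions)]

    def publish(self, partition, value):
        """Append a record to the chosen partition and return its offset."""
        log = self.partitions[partition]
        log.append(value)
        return len(log) - 1


class ToyConsumer:
    """Reads records by advancing its own per-partition cursor (offset)."""

    def __init__(self, topic):
        self.topic = topic
        self.offsets = defaultdict(int)  # partition -> next offset to read

    def poll(self, partition):
        """Return the unread records of a partition without removing them."""
        log = self.topic.partitions[partition]
        start = self.offsets[partition]
        self.offsets[partition] = len(log)
        return log[start:]


topic = ToyTopic(num_partitions=2)
topic.publish(0, "order-created")
topic.publish(0, "order-paid")
topic.publish(1, "user-signed-up")

billing = ToyConsumer(topic)
audit = ToyConsumer(topic)

billing_first = billing.poll(0)   # both records in partition 0
audit_first = audit.poll(0)       # the same records; nothing was deleted
billing_second = billing.poll(0)  # empty: billing's cursor is at the end
```

The two consumers know nothing about each other, and the log is the same length after both have read it. That is the whole contract in miniature.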
Akka is a high-level application framework that obviates the need for traditional lock-based concurrency control primitives by employing the concept of actors: active entities that may be either durable or ephemeral, depending on the application context. Under the hood, an actor system maintains a set of inboxes (inbound message queues) and a scheduler for efficiently multiplexing a relatively small number of OS-level threads onto a much larger number of actors (possibly on the order of millions). Actors may form a range of topologies (hierarchical, pipeline or ad hoc) and communicate by passing messages. Although multiple actors may be operating concurrently, any given actor may only operate on one message at any given time, and the message is removed from its inbox once it has been processed. Provided an actor has sole ownership of some contentious resource, an actor framework like Akka eliminates the need for exclusive and shared locks by ensuring that only one actor can operate on (read from or write to) any given resource at a time. Furthermore, an actor framework is typically bound to a specific programming language or runtime; Akka, for instance, runs on the JVM.
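The one-message-at-a-time rule can be sketched with nothing more than a queue and a thread. This is a toy, not Akka: `CounterActor`, `tell` and `stop` are hypothetical names, and the sketch dedicates a whole thread to a single actor, whereas a real actor system multiplexes many actors over a few threads. The point is only that the counter is mutated by one loop, so the application code needs no lock of its own.

```python
import queue
import threading


class CounterActor:
    """Toy actor: a private counter mutated only by its own mailbox loop."""

    _STOP = object()  # sentinel message that ends the loop

    def __init__(self):
        self._mailbox = queue.Queue()
        self._count = 0  # owned solely by the actor; no explicit lock
        self._thread = threading.Thread(target=self._run)
        self._thread.start()

    def tell(self, message):
        """Enqueue a message and return immediately (fire-and-forget)."""
        self._mailbox.put(message)

    def _run(self):
        while True:
            message = self._mailbox.get()  # dequeuing removes the message
            if message is self._STOP:
                break
            if message == "increment":
                self._count += 1  # only ever one message in flight

    def stop(self):
        """Let the actor drain its mailbox, then return the final count."""
        self._mailbox.put(self._STOP)
        self._thread.join()
        return self._count


def send_many(actor, n):
    """Send n increment messages from the calling thread."""
    for _ in range(n):
        actor.tell("increment")


actor = CounterActor()
senders = [
    threading.Thread(target=send_many, args=(actor, 1000)) for _ in range(4)
]
for sender in senders:
    sender.start()
for sender in senders:
    sender.join()

final_count = actor.stop()  # 4 senders x 1000 messages each
```

Four threads send concurrently, yet no increment is lost, because the senders only ever touch the mailbox and never the counter itself. Note the contrast with the Kafka sketch: here a message is gone once it has been dequeued.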
Why not both?
Granted, the similarities between Kafka and Akka are superficial. However, Kafka may be used as the underlying transport for a distributed actor model. Generally, one does not cross-shop between the two; they are not mutually exclusive, and both may be used effectively within one application in different roles. The main points of comparison are:
- Both employ messages as an operating concept, although in Kafka they are called ‘records’.
- In both technologies, messages are ordered. Additionally, Kafka supports a further level of sharding, where the messages in a topic follow only a partial order. That is, records within the same partition are mutually ordered, while records in different partitions may be arbitrarily ordered relative to one another.
- Kafka records are generally consumer-agnostic, in that they don’t target specific consumers, largely decoupling the producer and consumer ecosystems. Akka messages tend to target specific actors; however, they may also follow a broadcast topology.
- Kafka records are not removed when consumed. An Akka message will be removed after an actor has dealt with it.
- Kafka is designed for use in distributed systems. Akka, by default, operates within the confines of a single process, but it may be applied in a distributed context by using a cluster, which effectively aggregates multiple actor systems into a single actor address space.
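The partial ordering mentioned in the list above is easiest to see with keyed records. The following Python sketch is again a toy under stated assumptions: `partition_for`, `publish_all` and `values_for` are hypothetical helpers, and CRC32 merely stands in for whatever hash function a real client library uses. Records sharing a key always land in the same partition and so keep their relative order; nothing is promised about the order of records across partitions.

```python
import zlib

NUM_PARTITIONS = 3


def partition_for(key, num_partitions=NUM_PARTITIONS):
    """Map a key to a partition with a stable hash (CRC32, for illustration)."""
    return zlib.crc32(key.encode("utf-8")) % num_partitions


def publish_all(records, num_partitions=NUM_PARTITIONS):
    """Append (key, value) records to the partitions chosen by their keys."""
    partitions = [[] for _ in range(num_partitions)]
    for key, value in records:
        partitions[partition_for(key, num_partitions)].append((key, value))
    return partitions


records = [
    ("alice", "created"),
    ("bob", "created"),
    ("alice", "paid"),
    ("bob", "cancelled"),
    ("alice", "shipped"),
]
partitions = publish_all(records)


def values_for(key):
    """Collect a key's values in the order they sit in its partition."""
    log = partitions[partition_for(key)]
    return [value for k, value in log if k == key]


alice_events = values_for("alice")  # publication order preserved for alice
bob_events = values_for("bob")      # likewise for bob
```

Whether a consumer sees bob's cancellation before or after alice's payment depends on how the partitions are read, which is exactly the sense in which a topic is only partially ordered.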