What keeps Apache Kafka from eating the world?

There’s one truly compelling thing about Kafka. It goes beyond the use cases for which it’s famous. And it’s highly unlikely this point was a design goal when it was open-sourced seven years ago. Here’s what’s emerged: Kafka is nirvana for distributed systems.

Nirvana is a heady word, and while we mean to invoke its ethereal sense (not Teen Spirit), this is a real, actionable position, which some fortunate enterprises already wield. Very often, what we’re talking about is materialized in one of those long team whiteboarding sessions where someone snaps a picture of the diagram, Kafka at center.

In a moment, we’ll talk about why it’s often hard to render those marker strokes into reality.

Whether labeled accordingly or not, the architecture on that whiteboard is the “central nervous system” Jay Kreps touts. Kafka organizes, unifies and conveys distributed information. And when Kafka’s purview is most or all of that information, it centralizes and commands it in a way familiar to those of us with a brain.

Confluent views the CNS as the zenith of the Kafka journey

But too often we overlook a critical detail here. The CNS is an amazing conception for the mechanisms of the system, yet we can’t lose sight of the information, the data, that imbues it with value. Distributed systems nirvana also means a single, shared source of truth. And truth is founded on much more than current events; it’s about memory.

Kafka’s promise is immense enough to describe it with cerebral language. So why have relatively few organizations fully capitalized?

A temporal tradeoff

Kafka’s semantics operate upon immutable event streams of indefinite length. In turn, Kafka’s implementation effectively provides these streams, its topics, with a conduit where the event records are stored. Theoretically, the bounds of that conduit are indefinite, like the data that flows through it. In practice, the operational traits of Kafka yield some natural tradeoffs related to how long events are stored. Because Kafka is generally fantastic, and because we’re conditioned to accept technical tradeoffs, it’s super easy to couch this as a minor constraint, particularly for new adopters.

Here are the workarounds:

  • Forget old data
  • Beef up hardware
  • Ship older data out of Kafka

But this isn’t a mere speed bump. This can be where the design on the whiteboard falls apart. It is the stressor that stands most stubbornly between an organization and Kafka’s nirvana.

It’s critical to call it what it is.


The memory cliff

Streaming data’s advantages vanish when we remove log-oriented semantics. The memory cliff is the point where we settle to some degree for Kafka’s implementation constraints. This term casts the workarounds more starkly:

  • Forget old data / how much future value are we tossing away?
  • Beef up hardware / how much are we willing to pay?
  • Ship older data out of Kafka / how much complexity are we willing to accept by bifurcating the source of truth across place and access pattern?
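To put rough numbers on the first two questions, here is a minimal back-of-envelope sketch. It assumes disk needs grow linearly with ingest rate, retention window and replication factor, and it ignores compression, index overhead and headroom. The ingest rate and the retention windows are hypothetical figures chosen for illustration, and the function names are ours; `retention.ms` is the topic-level setting Kafka uses to bound how long records are kept.

```python
# Back-of-envelope sketch only: every figure below is a hypothetical assumption.

SECONDS_PER_DAY = 86_400


def retained_bytes(ingest_bytes_per_sec: float,
                   retention_days: float,
                   replication_factor: int = 3) -> float:
    """Approximate cluster-wide disk needed to hold `retention_days` of events.

    Assumes uncompressed, steady ingest and ignores index and headroom overhead.
    """
    return ingest_bytes_per_sec * SECONDS_PER_DAY * retention_days * replication_factor


def retention_ms(retention_days: float) -> int:
    """Convert a retention window in days to a value for the retention.ms setting."""
    return int(retention_days * SECONDS_PER_DAY * 1000)


if __name__ == "__main__":
    ingest = 10 * 1024**2  # assumed steady ingest of 10 MiB/s
    for days in (7, 90, 365):
        tib = retained_bytes(ingest, days) / 1024**4
        print(f"{days:>3} days -> retention.ms={retention_ms(days)}, ~{tib:.1f} TiB on disk")
```

Under these assumptions the cost of remembering scales directly with how far back we want to reach, which is why a short retention window tends to look like the painless default.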

Importantly, there is also a temporal coupling associated with any tradeoff we accept. Can we accurately predict how future changes to business requirements will affect the impact of the present choice? Can we discount for opportunity lost when we decide to forgo Kafka’s advantages for some portion of our data?

Speaking to our aspiration toward nirvana: To the extent that an organization values truth as a combination of present and past information, how much history is our business willing to part with? What amount of work is reasonable to dredge it back up?

Or what if we never had this tradeoff in the first place? What could make it possible to eliminate the memory cliff?