What keeps Apache Kafka from eating the world?

Colin Hicks
May 4, 2018

There’s one truly compelling thing about Kafka. It goes beyond the use cases for which it’s famous. And it’s highly unlikely this point was a design goal when it was open-sourced seven years ago. Here’s what’s emerged: Kafka is nirvana for distributed systems.

Nirvana is a heady word, and while we mean to invoke its ethereal sense (not Teen Spirit), this is a real, actionable position, which some fortunate enterprises already wield. Very often, what we’re talking about is materialized in one of those long team whiteboarding sessions where someone snaps a picture of the diagram, Kafka at center.

In a moment, we’ll talk about why it’s often hard to render those marker strokes into reality.

Whether labeled accordingly or not, the architecture on that whiteboard is the “central nervous system” Jay Kreps touts. Kafka organizes, unifies and conveys distributed information. And when Kafka’s purview is most or all of that information, it centralizes and commands it in a way familiar to those of us with a brain.

Confluent views the CNS as the zenith of the Kafka journey

But too often we overlook a critical detail here. The CNS is an amazing conception for the mechanisms of the system, yet we can’t lose sight of the information, the data, that imbues it with value. Distributed systems nirvana also means a single, shared source of truth. And truth is founded on much more than current events; it’s about memory.

Kafka’s promise is immense enough to describe it with cerebral language. So why have relatively few organizations fully capitalized on it?


A temporal tradeoff

Kafka’s semantics operate upon immutable event streams of indefinite length. In turn, Kafka’s implementation provides these streams, as topics, with a conduit where the event records are stored. Theoretically, the bounds of that conduit are indefinite, like the data that flows through it. In practice, the operational traits of Kafka yield some natural tradeoffs related to event storage duration. Because Kafka is generally fantastic, and because we’re conditioned to accept technical tradeoffs, it’s super easy to couch this as a minor constraint, particularly for new adopters.

Here are the workarounds:

  • Forget old data

But this isn’t a mere speed bump. This can be where the design on the whiteboard falls apart. It is the stressor which stands most stubbornly in front of an organization’s ability to realize Kafka’s nirvana.

It’s critical to call it what it is.

Photo by Tyler Milligan on Unsplash

The memory cliff

Streaming data’s advantages vanish when we remove log-oriented semantics. The memory cliff is the point where we settle to some degree for Kafka’s implementation constraints. This term casts the workarounds more starkly:

  • Forget old data / how much future value are we tossing away?
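
To make the cliff concrete, here is a minimal Python sketch. It is not Kafka’s implementation: it assumes a simplified in-memory log and a time-based policy modeled loosely on Kafka’s topic-level `retention.ms` setting, where -1 disables the time limit. The names `Record` and `apply_retention` are hypothetical, and real brokers delete whole log segments rather than individual records.

```python
from dataclasses import dataclass

DAY_MS = 24 * 60 * 60 * 1000


@dataclass(frozen=True)
class Record:
    offset: int
    timestamp_ms: int
    value: str


def apply_retention(log, now_ms, retention_ms):
    """Return the records a time-based retention policy would keep.

    Assumption: retention_ms mirrors the meaning of Kafka's `retention.ms`,
    with -1 meaning "no time limit". Deletion here is per record, which is
    a simplification of segment-based deletion.
    """
    if retention_ms < 0:
        return list(log)
    cutoff = now_ms - retention_ms
    return [r for r in log if r.timestamp_ms >= cutoff]


# Thirty days of history, one event per day.
log = [Record(offset=i, timestamp_ms=i * DAY_MS, value=f"event-{i}") for i in range(30)]
now_ms = 30 * DAY_MS

# With a seven-day window, everything older falls off the cliff.
kept = apply_retention(log, now_ms, 7 * DAY_MS)
forgotten = len(log) - len(kept)
print(f"replayable from offset {kept[0].offset}; {forgotten} records forgotten")

# With no time limit, the full history stays replayable.
unbounded = apply_retention(log, now_ms, -1)
print(f"unbounded retention keeps {len(unbounded)} records")
```

A consumer that shows up after the window has closed can only replay from the earliest surviving offset. Whatever those forgotten records could have told us is the future value we tossed away.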

Importantly, there is also a temporal coupling associated with any tradeoff we accept. Can we accurately predict how future changes to business requirements will affect the impact of the present choice? Can we discount for opportunity lost when we decide to forgo Kafka’s advantages for some portion of our data?

Speaking to our aspiration toward nirvana: To the extent that an organization values truth as a combination of present and past information, how much history is our business willing to part with? What amount of work is reasonable to dredge it back up?

Or what if we never had this tradeoff in the first place? What could make it possible to eliminate the memory cliff?
