Kafka KSQL is NOT SQL. Here’s a better way to achieve Kafka Analytics

Mark Palmer
Techno Sapien

--

How to democratize access to real-time data with Kafka for business users

Recently, this video about Kafka analytics drew a dramatic response, ranging from accusations of spreading FUD (fear, uncertainty, and doubt) about KSQL to great interest from business analysts who see a need for something more. I found a voice of reason in Jesse Anderson's Why I recommend my clients NOT use KSQL and Kafka Streams, where he writes:

“Kafka isn’t a database. It is a great messaging system, but saying it is a database is a gross overstatement.”

Jesse does a nice job explaining why Kafka isn’t a database and KSQL’s strengths and weaknesses. But why is this misunderstanding so dangerous? Who cares if developers conflate Confluent KSQL and database concepts?

Here’s the problem: as the use of Kafka grows, the business increasingly can’t see what’s going on. A new way of thinking, and a solution that enhances Kafka, is needed.

Full disclosure: I lead analytics development at TIBCO, so I have a vested interest in this argument. I believe our solutions advance the larger Kafka cause for good, and I believe that Spotfire is the first and only such solution available. That said, I expect continued criticism from Confluent under the banner of an open-source purity test that they cannot pass. And I believe Confluent simply does not see this issue from the business user’s point of view.

The problem: how can we achieve real-time Kafka analytics?

Let’s start with the business problem: millions of messages flow on Kafka every day: customer orders, connected vehicle location updates, drone imagery, social media messages, and more. Business users have no easy way to understand what’s in these messages: they’re flying blind in real time.

Many Kafka developers try to “address” this problem by storing Kafka messages in a database for exploration with traditional business intelligence tools. But by the time these messages land in a database, they’ve been digested, summarized, ETL’d, batched, federated, augmented, tuned, and tweaked. Not only does this processing take minutes, hours, days, weeks or even months, but after the digestion, the resulting data hardly resembles what happened in the first place.

For some applications, that’s okay. But for operational applications, where it’s essential to act in the moment, this architecture is fundamentally broken. We call it the “too late architecture” because by the time you figure out what really happened, it’s too late. It’s like looking in your rear-view mirror to drive your car. That is, who cares if you find out tomorrow what you could have done today to make a sale or stop a security breach? It’s too late.

Confluent renamed KSQL to ksqlDB, which further confuses the issue. ksqlDB provides an API that makes streams look like tables, but the confusion still stands: Kafka is NOT a database, and APIs don’t help business users. Indeed, they only deepen the stranglehold Confluent has on its customers, and they fail to provide visibility.
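To make the developer-centric nature of that API concrete, here is a rough sketch of what ksqlDB’s stream-as-table abstraction looks like. The topic, stream, and column names are hypothetical, chosen only for illustration:

```sql
-- Hypothetical topic and column names, for illustration only.
-- Register a Kafka topic as a ksqlDB stream:
CREATE STREAM orders (customer_id VARCHAR, amount DOUBLE)
  WITH (KAFKA_TOPIC = 'orders', VALUE_FORMAT = 'JSON');

-- Materialize a continuously updated "table" over that stream:
CREATE TABLE customer_totals AS
  SELECT customer_id, SUM(amount) AS total
  FROM orders
  GROUP BY customer_id;

-- A push query: results keep flowing as new events arrive.
SELECT * FROM customer_totals EMIT CHANGES;
```

The syntax is SQL-like, but the semantics are not a database’s, and running it requires a developer at a ksqlDB CLI or client library. A business analyst with a BI tool still has no way in, which is exactly the gap this article is about.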

A new and better way: self-service analytics for Kafka

A better solution gives business users real-time visibility into Kafka events in minutes. You simply connect Kafka topics to a new kind of in-memory database engine that we call TIBCO Data Streams, open a table in Spotfire, and go.

To business users, Kafka topics look like a database, but updates are continuously live: whenever data on Kafka changes a query’s result set, the new data is pushed to business users, and visualizations are kept continually up to date.
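The push-based pattern described above can be sketched in a few lines of Python. This is a toy illustration of the idea, not TIBCO’s implementation; the class, topic data, and callback names are invented for the example:

```python
# Toy sketch of a push-based "live table" over a stream of events.
# Subscribers register a callback once; every event that changes the
# query's result set triggers a push, so nothing ever polls.

from collections import defaultdict


class LiveAggregate:
    """Maintains SUM(amount) GROUP BY key and pushes changed rows to subscribers."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.subscribers = []

    def subscribe(self, callback):
        # A dashboard or visualization registers here instead of polling.
        self.subscribers.append(callback)

    def on_event(self, key, amount):
        # Called for each message consumed from a (hypothetical) Kafka topic.
        self.totals[key] += amount
        for push in self.subscribers:
            push(key, self.totals[key])  # push only the changed row


# Usage: a "dashboard" that just records the latest total per key.
dashboard = {}
agg = LiveAggregate()
agg.subscribe(lambda key, total: dashboard.update({key: total}))

for key, amount in [("ACME", 100.0), ("GLOBEX", 50.0), ("ACME", 25.0)]:
    agg.on_event(key, amount)

print(dashboard)  # {'ACME': 125.0, 'GLOBEX': 50.0}
```

The design choice that matters here is the inversion of control: instead of the client asking “what changed?” on a schedule, the engine tells every subscriber the moment a result-set row changes, which is what keeps visualizations continuously current.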

And Spotfire continuously compares current data with history. For example, in the dashboard below, data streaming from a Formula One car is displayed on a course map at top left. The current speed, brake temperature, lap pace, and distance traveled are compared to the best historical performance. A race analyst can constantly see how the present compares to the past and make adjustments accordingly.

Real-time data compared with historical data: red means we’re behind our best lap, green means we’re ahead.

The Spotfire client was redesigned several years ago to accept push-based updates from the Data Streams engine. I’m not aware of any other BI tool on the market that can do this today, so although new ways to gain insight are coming, they aren’t yet common.

The democratization of Kafka

Why has streaming business intelligence drawn such ire? I chalk it up to different points of view. To developers, KSQL or ksqlDB plus a database is a simple solution; to business users, it’s not good enough.

Then there’s the commercial issue. I’ve met CIOs who pay over $15 million a year for “open source” support. Yet even Kafka fans admit that they prefer to just keep building custom dashboards. But that’s not necessary: there are great analytics tools that can do the job. They just needed a direct connection to Kafka, and now there is one.

Streaming business intelligence delivers a solution that business users need at a fraction of the cost and it empowers business users instead of shutting them out. We hope the Kafka community celebrates the democratization of Kafka analytics because it helps make Kafka’s impact bigger and better. And indeed, that rising tide should lift all boats.

For more, watch this video on Kafka Analytics or visit www.tibco.com/kafka to explore TIBCO’s commercial offerings. Or, if you’re a business person, watch this on the use cases for Kafka analytics. For information about related technologies and innovation, read Why you should learn about streaming data science and How to query the future.

Mark Palmer
Techno Sapien

Board Advisor for Correlation One, Data Visualization Society, and Talkmap | World Economic Forum Tech Pioneer | Data Science for All Mentor