Kafka & Confluent at the Edge: Interview with Kai Waehner

Dasha Korotkykh · Published in Hivecell · Mar 22, 2021 · 9 min read

We talked with Kai Waehner, Field CTO and Kafka Evangelist at Confluent, to dig a little deeper into edge computing and how Kafka and Confluent contribute to its growth. Hello and let’s roll!

There is one question we’ve always wanted to ask. Kai, you are well known for being an enthusiastic evangelist of Kafka and Confluent.
How, when, and why did you become one?

KW: Actually, a long time ago. Around 12 years ago, after university, I started with data processing and integration — it has been my professional life ever since. I worked first for open-source vendors like Talend, and then I moved to TIBCO, which was more proprietary but also involved messaging, integration, stream processing, and custom connectors. At that time, many people didn’t know Kafka yet, but I already really liked it.

It was pretty funny, because at that time nobody knew Confluent either, and when I interviewed with Confluent’s CEO, Jay Kreps, he was still doing the first interviews with tech people himself to convince them that Kafka is great. I learned that Kafka is much more than just messaging. I also realized that I wanted to go back to open source.

So you pursued a combination of a novel technology and an open-source project to give back to the community, right?

KW: Yes, exactly. Today we see that things are getting more and more open, both at the edge and in industrial IoT. And this doesn’t mean everything is free — that’s a common misunderstanding. The core, Apache Kafka, is free to run in production, and that’s totally OK.

We typically compare Apache Kafka to a car engine: many customers really want the full, completely preconfigured car, including all the security and operations tools. Hence, they buy Confluent. Or, for example, with Kafka on Hivecell hardware, you can ship a box to a retail store or factory with the middleware already on it. So there is an open foundation, but if you want to run mission-critical workloads or use a serverless cloud offering, then a vendor like Confluent comes into play.

Can you share a bit about what makes Confluent unique and how its offering is suited for edge applications?

KW: Sure. In summary, why are people coming to Confluent? Because we do only Kafka, but we do it very well, from a tooling, expertise, and support perspective.

But I think even more important is the full ecosystem we have built, because Kafka alone is good, but it really gets interesting when you can take one architecture and leverage all its benefits on both sides. In the cloud you don’t even have to manage it, you just use it. And at the edge, we give you the same experience and tools. We have built Confluent Operator, which runs on Kubernetes, so that you can run Kafka on Kubernetes at the edge more or less out of the box. It takes most of the operations burden off you.

A great example is Hivecell:

you can have Confluent Platform installed on a Kubernetes cluster across three Hivecell boxes, and then you just ship it to the customer and it runs out of the box — because Kubernetes also handles failover easily, and you can do rolling upgrades.
This is a great advantage.

Can you remember when you first heard about the edge as a concept, and what you thought about it?

KW: It’s always really important to define the term “edge”, because everybody has a different understanding of it. For me, “at the edge” means outside the data center. With that in mind, I came across the edge, under one name or another, a long time ago. But I think the real difference now (and we will discuss this today, I think) is that the edge is getting much smarter. We process much more data at the edge. What may be even more important, it is also connected to the data center or cloud. And I think this is the key difference when we talk about the edge today, compared to the edge of 10 or 20 years ago. When I look at customers, most of them are still at the stage where the edge is defined as “not having any Internet at all”. And that is really just the start.

What about the intelligent edge — is it any different from what you just described? And if so, what benefits does it bring?

KW: Well, I think “intelligent” is also one of those buzzwords that everybody defines differently. I have seen solutions marketed as AI edge solutions that actually only had a few business rules implemented, hidden under the AI label, right? Often it’s just marketing.

I think intelligence at the edge is evolving. Just having a server at the edge that collects data, handles pre-filtering and pre-aggregation, and sends only critical information to the cloud instead of replicating all of the data — that’s already intelligent. But if you go further, real machine learning and deep learning come into play.
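
To make that concrete, here is a minimal sketch of such edge pre-filtering as a Kafka Streams application. The topic names, the temperature threshold, and the broker address are illustrative assumptions, not details from the interview:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;
import org.apache.kafka.streams.kstream.KStream;

public class EdgeFilterApp {
    public static void main(String[] args) {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "edge-prefilter");
        // A local broker on the edge box; hypothetical address.
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.Double().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        KStream<String, Double> readings = builder.stream("sensor-readings");

        // Forward only critical readings; the bulk of the data stays at the edge.
        readings.filter((sensorId, temperature) -> temperature > 100.0)
                .to("critical-events"); // only this topic would be replicated to the cloud

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```

Everything runs in a single JVM next to the broker on the edge box; only the filtered topic would be replicated upstream.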

We see customers who train a model in the cloud on big datasets, but then deploy the model itself to the edge to do things like predictive maintenance or quality assurance, embedding it directly into the edge application for real-time scoring in milliseconds. That’s a truly intelligent edge.
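
To illustrate what that embedding can look like, here is the same Kafka Streams pattern with scoring done in-process as events flow through. The Model interface is a hypothetical stand-in for whatever the trained artifact exposes (for example, a TensorFlow or ONNX runtime session loaded from a file), and the topic names are again assumptions:

```java
import java.util.Properties;
import org.apache.kafka.common.serialization.Serdes;
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StreamsBuilder;
import org.apache.kafka.streams.StreamsConfig;

public class EdgeScoringApp {

    // Hypothetical stand-in for a model trained in the cloud and shipped to the edge.
    interface Model {
        double score(double vibration);
    }

    public static void main(String[] args) {
        // In practice the model would be loaded from a file baked into the deployment;
        // a trivial threshold model keeps this sketch self-contained.
        Model model = vibration -> vibration > 0.8 ? 1.0 : 0.0;

        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "edge-scoring");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        props.put(StreamsConfig.DEFAULT_KEY_SERDE_CLASS_CONFIG, Serdes.String().getClass());
        props.put(StreamsConfig.DEFAULT_VALUE_SERDE_CLASS_CONFIG, Serdes.Double().getClass());

        StreamsBuilder builder = new StreamsBuilder();
        builder.<String, Double>stream("machine-telemetry")
               // Score in-process: no network hop to a model server, so latency
               // stays in the millisecond range even without connectivity.
               .mapValues(model::score)
               .to("maintenance-scores");

        KafkaStreams streams = new KafkaStreams(builder.build(), props);
        streams.start();
        Runtime.getRuntime().addShutdownHook(new Thread(streams::close));
    }
}
```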

Does this correlate with intelligent IoT somehow? Do you see these concepts going hand in hand?

KW: Yes, because devices and machines are also getting more and more software. The best example is the car. A Tesla is basically not a car anymore; it’s a few tires with software on top. I think this is a trend we see everywhere in industrial and consumer IoT: every kind of device gets smarter and smarter. A few years ago we only had smartphones, but today everything you buy is somehow connected. Whether it’s your watch or your headphones, they are all digitally connected and get more and more software. IoT is getting smarter, and that intelligence is embedded directly at the edge, because it’s expensive and doesn’t make sense to do all the analytics in the cloud. A lot of it has to be done at the edge, wherever the devices are.

So a smartwatch is a kind of personal edge device?

KW: Yeah, it is!

Great metaphor. So what does Apache Kafka mean for the edge? How do you see Kafka influencing the growth of edge computing?

KW: Over the last 10 years, Kafka was adopted for more and more use cases in the data center and the cloud, and also for replication between the two. But now people need and want to deploy more use cases closer to the edge.

It’s important to understand that Kafka is not just a messaging and storage system. It also has additional components: Kafka Connect for data integration, and Kafka Streams and ksqlDB for stream processing. In the cloud, it doesn’t matter that much if you combine five different products. You still have to work with five vendors and integrate them, but at least the hardware is not a constraint. At the edge, you can deploy all of this as one solution using Apache Kafka: messaging, storage, caching, integration, and data processing. So you don’t need to combine five other solutions where one will do. And that, I think, is the key differentiator and why people use Kafka so much at the edge — instead of gluing together a lot of other solutions, each of which already needs all the memory on its own.
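
One way to see the “storage and caching” part in practice: a compacted topic keeps the latest value per key indefinitely, so the broker itself can act as a durable key-value cache on the edge box. Here is a minimal sketch using Kafka’s AdminClient, with an assumed broker address and topic name:

```java
import java.util.List;
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.AdminClient;
import org.apache.kafka.clients.admin.NewTopic;
import org.apache.kafka.common.config.TopicConfig;

public class CompactedTopicSetup {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092"); // hypothetical edge broker

        try (AdminClient admin = AdminClient.create(props)) {
            // cleanup.policy=compact keeps the newest record per key forever,
            // so the topic doubles as a durable key-value cache on the edge box.
            NewTopic deviceState = new NewTopic("device-state", 1, (short) 1)
                    .configs(Map.of(TopicConfig.CLEANUP_POLICY_CONFIG,
                                    TopicConfig.CLEANUP_POLICY_COMPACT));
            admin.createTopics(List.of(deviceState)).all().get();
        }
    }
}
```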

What edge + Kafka success story comes to mind?

The deck of a Royal Caribbean cruise ship has dozens of data endpoints

KW: I can talk about one specific example, which is really also a great edge use case — Royal Caribbean.

It’s a cruise line, and when the ships are on tour they need to do all the processing on board — at sea, Internet connectivity is very bad and very expensive. At the same time, these companies are very data-driven these days. Each customer downloads an app that handles things like restaurant coupons, cinema seats, reservations, and everything around them: upselling, cross-selling, point-of-sale transactions.
And all of that has to work without the Internet.

Therefore, Royal Caribbean is running Kafka on every ship, with the same back-pressure handling — because not everything is real-time. Sometimes the mobile app is disconnected, and once it is back online it uploads its updates, stats, and so on. All of the transactional data and analytics live right there on the ship.

But when the ship gets back to the harbor after three days, it has a good Internet connection for three hours. So they replicate everything that happened on the ship into the cloud, to a corporate Kafka cluster that collects the data from all the different cruise ships. Then they can put their machine learning and data scientists on that dataset to find out what worked well on one cruise and what didn’t work on another: “how can we sell even better?”, or “how can we improve the customer experience?” It is one of the best examples of Kafka running autonomously at the edge while also replicating to the cloud for aggregation and analytics.
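
In production, this ship-to-cloud replication would typically be handled by a dedicated tool such as MirrorMaker 2 or Confluent Replicator. The stripped-down Java loop below only illustrates the mechanics, with hypothetical cluster addresses and topic name: consume from the on-ship cluster and produce to the corporate cluster while the link is up.

```java
import java.time.Duration;
import java.util.List;
import java.util.Properties;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.clients.consumer.KafkaConsumer;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;

public class ShipToCloudReplicator {
    public static void main(String[] args) {
        Properties consumerProps = new Properties();
        consumerProps.put("bootstrap.servers", "ship-kafka:9092");  // hypothetical on-ship cluster
        consumerProps.put("group.id", "ship-to-cloud-replicator");
        consumerProps.put("key.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("value.deserializer", "org.apache.kafka.common.serialization.StringDeserializer");
        consumerProps.put("auto.offset.reset", "earliest");

        Properties producerProps = new Properties();
        producerProps.put("bootstrap.servers", "cloud-kafka:9092"); // hypothetical corporate cluster
        producerProps.put("key.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("value.serializer", "org.apache.kafka.common.serialization.StringSerializer");
        producerProps.put("enable.idempotence", "true");            // avoid duplicates on flaky links

        try (KafkaConsumer<String, String> consumer = new KafkaConsumer<>(consumerProps);
             KafkaProducer<String, String> producer = new KafkaProducer<>(producerProps)) {
            consumer.subscribe(List.of("pos-transactions"));
            // Run only while the ship is in the harbor: Kafka's retained log means
            // days of buffered events are still on disk waiting to be read.
            while (true) {
                for (ConsumerRecord<String, String> rec : consumer.poll(Duration.ofSeconds(1))) {
                    producer.send(new ProducerRecord<>("pos-transactions", rec.key(), rec.value()));
                }
                consumer.commitSync();
            }
        }
    }
}
```

Because the committed offsets persist alongside the retained log, the loop simply picks up where the last harbor visit left off.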

Speaking of integrations, I just read your article on Kafka and blockchain technology. Could this be a starting point to use blockchain with edge platforms to build super-secure and scalable infrastructure?

KW: Actually, I don’t think so. First, let me clarify: when I talk about blockchain, I don’t mean just cryptocurrencies such as Bitcoin. It’s really more about building applications on top of a blockchain. The only situation outside of cryptocurrency where I see blockchain succeeding these days is the supply chain, because it is a complex process with many untrusted partners: you want to track goods across parties in other countries whom you don’t trust. That is maybe where blockchain would work. But for most other cases, it always comes down to weighing the added value against the complexity, risk, and cost.

I have talked to a lot of customers about blockchain and/or Kafka. Most of them are evaluating blockchain, but they couldn’t make an argument for why they really need it.

OK, maybe blockchain is just not the right way to frame it, because it is such a buzzword. What I really meant is immutability and fault tolerance, which is what you get with Kafka and which is also crucial for the edge.

KW: Yes, fault tolerance is built into Kafka. And while Kafka’s log is append-only, in theory someone could still change the data in the log on disk. If you want, or have to, make the data tamper-proof on disk, an additional tool can be used. That is still much less complex than a blockchain framework such as Hyperledger, and for most real-world cases it’s probably good enough.
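
As one sketch of what such an additional tool could do (our illustration, not a specific product named in the interview): compute a hash chain over the record values, where each hash depends on everything before it, so any later modification of the data on disk becomes detectable when the chain is recomputed.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HexFormat;
import java.util.List;

public class LogHashChain {

    // Hash of (previous hash || record value): changing any earlier record
    // changes every hash after it, making tampering detectable.
    static List<String> chain(List<String> recordValues) throws NoSuchAlgorithmException {
        MessageDigest sha256 = MessageDigest.getInstance("SHA-256");
        List<String> hashes = new ArrayList<>();
        byte[] previous = new byte[32]; // genesis link: all zeros
        for (String value : recordValues) {
            sha256.update(previous);
            sha256.update(value.getBytes(StandardCharsets.UTF_8));
            previous = sha256.digest();
            hashes.add(HexFormat.of().formatHex(previous));
        }
        return hashes;
    }

    public static void main(String[] args) throws NoSuchAlgorithmException {
        List<String> records = List.of("order:42", "payment:42", "shipment:42");
        // Store these hashes in a separate trusted location (or a second topic);
        // re-computing the chain later verifies the log was not modified.
        chain(records).forEach(System.out::println);
    }
}
```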

More details on the blockchain vs. Kafka discussion can be found here: https://www.kai-waehner.de/blog/2020/07/17/apache-kafka-blockchain-dlt-comparison-kafka-native-vs-hyperledger-ethereum-ripple-iota-libra/

Kai Waehner regularly writes about modern advanced analytics, cloud and hybrid architectures, stream processing, and IoT use cases on his blog, which we very much recommend checking out.

If you made it to the bottom of this interview with a question or a comment on Kafka applications at the edge, feel free to share it in the comments or post it on LinkedIn, tagging @Kai Waehner and @Hivecell. We are eager to discuss the approaches and challenges from your perspective.
