Moving developers up the stack with Apache Pinot

Kenny Bastani
Apache Pinot Developer Blog
Jun 23, 2020

Once upon a time, an internet company named LinkedIn faced the challenge of having petabytes of connected data with no way to analyze it in real time. Because the problem was the first of its kind, there was only one solution: the company put together a talented team of engineers and tasked them with building the right tool for the job. Today, that tool goes by the name of Apache Pinot.

Pinot joins a storied legacy of innovations that have emerged from one of the world’s largest online social networks. Over nearly two decades, the Silicon Valley tech giant has helped hundreds of millions of people around the world navigate their careers. Now, as a Microsoft company, LinkedIn has endured and continues to keep colleagues connected through the inevitable successes and failures that come with having a great career.

It might sound strange at first to think that a feature called “Who Viewed My Profile” could lead to some of the most popular open source tools changing the way companies build and operate software.

Pinot is the latest Apache Incubator project to follow in the footsteps of giants such as Kafka, Helix, and Samza, the first of which is quickly becoming a pillar of cloud-native applications. If software is indeed eating the world, then Apache Kafka may well be responsible for eating the private data center.

Before Kafka jumped onto the scene as the industry-standard message broker, big companies with hard software problems had little choice but to operate their own hardware. But today, with the help of open source tools like Kafka, developers can bridge the gap between virtual machines in the data center and cloud-native applications.

Kafka’s secret? Turn streams of events collected from many different disconnected systems into topics that can be queried like a database without ever turning into one.

While Kafka represents a major step forward as a kind of “portable data warehouse”, most application developers still struggle to transform event streams into complex query models without learning the ins and outs of Kafka Streams.

Now, Pinot aims to take things one step further by moving up the stack from Kafka and giving developers the familiarity of a database that turns event streams into queryable data models. The end result is that developers have one less thing to worry about when it comes to building and operating their applications. They reap all of the benefits of Kafka while focusing only on writing the code that is most valuable to users.

Since Kafka is both a message broker and a transaction log, it often ends up being used as a database, which puts the burden on developers to maintain projections of the data sourced from event streams sitting in topics.
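
To make that burden a little more concrete, here is a minimal sketch of what maintaining a projection by hand can look like. It is an illustration only: the `events` list stands in for records read from a topic, and the field names (`viewer`, `viewed`) and the function `project_profile_viewers` are hypothetical, not part of Kafka or Pinot.

```python
from collections import defaultdict

# Hypothetical events, standing in for records consumed from a topic.
events = [
    {"viewer": "alice", "viewed": "bob"},
    {"viewer": "carol", "viewed": "bob"},
    {"viewer": "alice", "viewed": "carol"},
    {"viewer": "alice", "viewed": "bob"},
]


def project_profile_viewers(stream):
    """Fold an event stream into a projection: viewed -> {viewer: view count}."""
    projection = defaultdict(lambda: defaultdict(int))
    for event in stream:
        projection[event["viewed"]][event["viewer"]] += 1
    return projection


viewers = project_profile_viewers(events)
print(dict(viewers["bob"]))  # {'alice': 2, 'carol': 1}
```

Even in this toy form, the developer owns the projection: every new question asked of the data means another fold like this one to write, store, and keep up to date as events arrive.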

What Pinot offers is a tool that eliminates the need for developers to worry about using Kafka topics to build and maintain queryable projections from event streams. As it turns out, worrying about event streams is just one more thing that slows down the development process, and it is the price developers pay for keeping data portable.

Kenny Bastani

Passionate technology evangelist and open source software advocate. International speaker & author of O’Reilly’s Cloud Native Java.