Software Engineer @WalmartLabs | Previously @Olacabs | Committer @ApachePinot | Ping me on kharekartik@gmail.com for queries or any writing opportunities.


Photo by Shahadat Rahman on Unsplash

Apache Pinot is a realtime distributed OLAP datastore that can answer hundreds of thousands of queries with millisecond latencies. You can head over to https://pinot.apache.org/ to get started with Apache Pinot.

While using any database, we can come across a scenario where a function required for a query is not supported out of the box. In such cases, we have to resort to raising a pull request for a new function or finding a tedious workaround.

Pinot aims to solve this particular pain point by giving users the power to add their own functions with almost zero lines of code. …


Photo by Feelfarbig Magazine on Unsplash

One of the primary advantages of using Pinot is its pluggable architecture. The plugins make it easy to add support for any third-party system, whether it is an execution framework, a filesystem, or an input format.

In this tutorial, we will use three such plugins to easily ingest data and push it to our Pinot cluster. The plugins we will be using are -

  • pinot-batch-ingestion-spark
  • pinot-s3
  • pinot-parquet

You can check out Batch Ingestion, File systems, and Input formats for all the available plugins.

Setup

We are using the following tools and frameworks for this tutorial -


Photo by Franki Chamaki on Unsplash

The overflow of data in the world has opened up a multitude of opportunities to learn about and analyze the behavior of people all around the world. Most analyses require at least a few days of data, if not more, which creates a need for a fast, queryable storage engine. OLAP databases exist to serve exactly this purpose: making huge amounts of data easily queryable with minimal latency.

To minimize latency, databases create indices on the data. An index is generally a tree-based structure such as a B-tree or an R-tree. …
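
To make the idea of an index concrete, here is a minimal sketch in Python. It is not how Pinot or any particular database implements its indices: a sorted list searched with binary search stands in for a B-tree, and the rows, column names, and helper functions are all made up for illustration.

```python
import bisect

# Hypothetical data: (user_id, city) rows in arrival order, not sorted by key.
rows = [(42, "Pune"), (7, "Delhi"), (19, "Mumbai"), (88, "Chennai"), (3, "Kolkata")]


def scan_lookup(rows, user_id):
    """Without an index: inspect rows one by one until the key matches (O(n))."""
    for row in rows:
        if row[0] == user_id:
            return row
    return None


def build_index(rows):
    """Build a sorted list of (key, row position) pairs.

    This flat sorted list is a simplified stand-in for a tree-based index.
    """
    return sorted((row[0], pos) for pos, row in enumerate(rows))


def index_lookup(rows, index, user_id):
    """With an index: binary search on the sorted keys (O(log n)), then one row fetch."""
    i = bisect.bisect_left(index, (user_id, -1))
    if i < len(index) and index[i][0] == user_id:
        return rows[index[i][1]]
    return None


index = build_index(rows)
print(index_lookup(rows, index, 19))
print(index_lookup(rows, index, 50))
```

Both lookups return the same row for a given key; the difference is that the indexed lookup touches only a logarithmic number of keys instead of every row, which is where the latency savings come from as the data grows.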
