Apache Pinot is a realtime distributed OLAP datastore that can answer hundreds of thousands of queries with millisecond latencies. You can head over to the Apache Pinot documentation to get started.

When using any database, we can run into a scenario where a function required for a query is not supported out of the box. In such cases, we have to resort to raising a pull request for a new function or finding a tedious workaround.

Pinot aims to solve this particular pain point by giving users the power to add their own functions with almost zero lines of code. …

One of the primary advantages of using Pinot is its pluggable architecture. Plugins make it easy to add support for any third-party system, whether that is an execution framework, a filesystem, or an input format.

In this tutorial, we will use three such plugins to easily ingest data and push it to our Pinot cluster. The plugins we will be using are -

  • pinot-batch-ingestion-spark
  • pinot-s3
  • pinot-parquet

You can check out Batch Ingestion, File systems, and Input formats for all the available plugins.


We are using the following tools and frameworks for this tutorial -

The abundance of data in the world has opened up a multitude of opportunities to learn about and analyze the behavior of people everywhere. Most analyses require at least a few days of data, if not more, which creates a need for a fast, queryable storage engine. OLAP databases exist for exactly this purpose: to make huge amounts of data easily queryable with minimal latency.

To minimize latency, databases typically create indices on the data. An index is generally a tree-based structure such as a B-tree or an R-tree. …
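To illustrate why an index reduces latency, here is a minimal sketch in Python. It is not how Pinot builds its indices: it uses a sorted list with binary search as a simple stand-in for a tree-based structure, and the function names and sample rows are hypothetical, chosen only for illustration.

```python
import bisect


def build_sorted_index(rows, column):
    """Build a toy index: column values in sorted order plus matching row ids."""
    pairs = sorted((row[column], i) for i, row in enumerate(rows))
    keys = [key for key, _ in pairs]
    row_ids = [row_id for _, row_id in pairs]
    return keys, row_ids


def lookup(index, value):
    """Find matching row ids with binary search instead of reading every row."""
    keys, row_ids = index
    lo = bisect.bisect_left(keys, value)
    hi = bisect.bisect_right(keys, value)
    return sorted(row_ids[lo:hi])


def scan(rows, column, value):
    """Baseline without an index: inspect every row."""
    return [i for i, row in enumerate(rows) if row[column] == value]


# Hypothetical sample data.
rows = [
    {"city": "Pune", "clicks": 3},
    {"city": "Oslo", "clicks": 7},
    {"city": "Pune", "clicks": 1},
    {"city": "Lima", "clicks": 4},
]
city_index = build_sorted_index(rows, "city")
```

Both `lookup(city_index, "Pune")` and `scan(rows, "city", "Pune")` return the same row ids, but the indexed lookup needs only a logarithmic number of comparisons to locate the matching range, while the scan touches every row.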

