Marrying Presto with Pinot and visualizing data on Superset.

Sachin Tripathi
Jan 25, 2020 · 4 min read


Recently, Haibo Wang of Uber shared a wonderful article on how they use Presto with Pinot.

From the article:

We engineered a solution that allows Presto’s engine to query Pinot’s data stores in real time, optimized for low query latency. Our new system utilizes the versatile Presto query syntax to allow joins, geo-spatial queries, and nested queries, among other requests. In addition, it enables queries of data in Pinot with a freshness of seconds. With this solution, we further optimized query performance by enabling aggregate pushdown, predicates pushdown, and limit pushdown, which reduces unnecessary data transfer and improves query latency by more than 10x.

This solution enabled greater analytical capabilities for operations teams across Uber. Now, users can fully utilize the flexibility of SQL to represent more complex business metrics, and render query results into a dashboard using in-house tools. This capability has improved our operations efficiency and reduced operations cost.

So I decided to get hands-on and explore Pinot’s basic functionality.

After this, you will be able to:

  • Set up a cluster on GKE
  • Deploy Pinot, which consumes data from a real-time stream such as Kafka
  • Deploy Presto
  • Deploy Superset, which connects to Pinot via Presto
  • Visualize the data in Superset

Prerequisites:

Use Helm to deploy Pinot:

Helm is a tool for managing Charts. Charts are packages of pre-configured Kubernetes resources.

To be able to use Helm 2, its server-side component, Tiller, needs to be installed on your cluster.

helm init --service-account tiller
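On an RBAC-enabled cluster such as GKE, helm init assumes the tiller service account already exists. A minimal sketch of creating it first (binding to cluster-admin is a common quickstart shortcut, not a production recommendation):

```shell
# Create the tiller service account in kube-system (assumes an RBAC-enabled cluster).
kubectl create serviceaccount tiller --namespace kube-system

# Bind it to cluster-admin. Quickstart shortcut only; too broad for production.
kubectl create clusterrolebinding tiller-cluster-rule \
  --clusterrole=cluster-admin \
  --serviceaccount=kube-system:tiller

# Now install Tiller using that service account.
helm init --service-account tiller
```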

Deploy the Pinot cluster (run this from the Pinot Helm chart directory, since the chart path below is “.”):

helm install --namespace "pinot-quickstart" --name "pinot" .

Check deployment status:

kubectl get all -n pinot-quickstart

Bring up a Kafka cluster for real-time data ingestion:

helm repo add incubator http://storage.googleapis.com/kubernetes-charts-incubator
helm install --namespace "pinot-quickstart" --name kafka incubator/kafka

After this step, you should see the required service, statefulset.apps, and pod resources.

You can verify this with:

kubectl get all -n pinot-quickstart

Create Kafka topic:

kubectl -n pinot-quickstart exec kafka-0 -- kafka-topics --zookeeper kafka-zookeeper:2181 --topic flights-realtime --create --partitions 1 --replication-factor 1
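To confirm the topic was created, you can list or describe it against the same Zookeeper address:

```shell
# List all topics registered in Zookeeper; flights-realtime should appear.
kubectl -n pinot-quickstart exec kafka-0 -- \
  kafka-topics --zookeeper kafka-zookeeper:2181 --list

# Show partition and replication details for the new topic.
kubectl -n pinot-quickstart exec kafka-0 -- \
  kafka-topics --zookeeper kafka-zookeeper:2181 --describe --topic flights-realtime
```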

Load data into Kafka and create the Pinot schema and table:

kubectl apply -f pinot-realtime-quickstart.yml

Query Pinot data:

Port-forward and open the Pinot query console in your web browser:

./query-pinot-data.sh

Create a StorageClass:

Update the storageClassName field in presto-coordinator.yaml and superset.yaml to reference it.

A StorageClass provides a way for administrators to describe the “classes” of storage they offer.
Each StorageClass contains the fields provisioner, parameters, and reclaimPolicy, which are used when a PersistentVolume belonging to the class needs to be dynamically provisioned.
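As a sketch, a StorageClass for GKE’s persistent-disk provisioner might look like the following. The name pinot-ssd and the pd-ssd disk type are assumptions; use whatever name you then reference from storageClassName:

```shell
# Apply an SSD-backed StorageClass using GKE's GCE persistent-disk provisioner.
kubectl apply -f - <<EOF
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: pinot-ssd        # hypothetical name; match storageClassName in your YAMLs
provisioner: kubernetes.io/gce-pd
parameters:
  type: pd-ssd           # SSD persistent disks; use pd-standard for HDD
reclaimPolicy: Delete    # delete the underlying disk when the PVC is removed
EOF
```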

Deploy Presto with the Pinot plugin:

kubectl apply -f presto-coordinator.yaml

Presto uses the Pinot connector, with Pinot acting as the data store.

Port-forward Presto to run queries:

./presto.sh
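Once the port-forward is up, you can query Pinot through Presto’s pinot catalog. A sketch, assuming the quickstart’s airlinestats table, column names from the quickstart flight dataset, and that the script forwards the coordinator to localhost:8080:

```shell
# Count rows in the Pinot table; the aggregation is pushed down to Pinot.
./presto-cli --server localhost:8080 --catalog pinot --schema default \
  --execute "SELECT COUNT(*) FROM airlinestats"

# Predicate and limit pushdown: only the matching rows leave Pinot.
./presto-cli --server localhost:8080 --catalog pinot --schema default \
  --execute "SELECT FlightNum, Origin, Dest FROM airlinestats WHERE Origin = 'SFO' LIMIT 10"
```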

Deploy Superset:

kubectl apply -f superset.yaml

Set up the admin account (first time only):

kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'export FLASK_APP=superset:app && flask fab create-admin'
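The command above prompts for the account details interactively; flask-appbuilder’s create-admin also accepts them as flags if you prefer a one-liner (the credentials below are placeholders):

```shell
# Non-interactive admin creation; replace the placeholder credentials.
kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c \
  'export FLASK_APP=superset:app && flask fab create-admin \
     --username admin --firstname Admin --lastname User \
     --email admin@example.com --password admin'
```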

Initialize Superset (first time only):

kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'superset db upgrade'
kubectl exec -it pod/superset-0 -n pinot-quickstart -- bash -c 'superset init'

Access the Superset UI:

./open-superset-ui.sh

Add a Presto source in Superset

Under Sources → Databases, add a new database:
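Superset connects to Presto through a SQLAlchemy URI. A sketch of what to enter, assuming the Presto coordinator is exposed as a service named presto-coordinator in the pinot-quickstart namespace on port 8080 (adjust host and port to your deployment):

```
presto://presto-coordinator.pinot-quickstart:8080/pinot/default
```

The path segments after the host are the Presto catalog (pinot) and schema (default).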

[Screenshot: database source added]

Add the airlinestats table from this database:


Create a visualization of the table:

[Chart: count of flights]

Hopefully this overview has helped you get started with Pinot and Presto.

Special thanks to Xiang Fu and Kishore Gopalakrishna for their help.

If you found this helpful please share it on your favorite social media so other people can find it, too. 👏

I write about distributed systems, Python, Docker, data science, life lessons, and more. If any of that is of interest to you, read more here and follow me on LinkedIn and YouTube.

