Real-time Analytics with Presto and Apache Pinot — Part II

Xiang Fu
Apache Pinot Developer Blog
2 min read · Feb 2, 2021

This blog post is the second part of a two-part series on using Presto with Apache Pinot. The first part covers how analytics systems make trade-offs between latency and flexibility.

Achieve the best of both: Presto Pinot Connector

Continuing from the first part of this series, we now focus on the Presto Pinot Connector. The diagram below shows the latency versus flexibility trade-off between Presto and Pinot.

Latency vs. Flexibility vs. Throughput

Presto gives users excellent flexibility: full SQL support and the ability to do multi-way JOINs. However, depending on the data volume, a query may take seconds to minutes to return.

On the other side, Pinot users can store anything from raw data to pre-joined, pre-aggregated, or even pre-cubed data, and use advanced indexing techniques to speed up queries. The Pinot query engine is optimized for analytical query patterns such as aggregations and group-bys, so users can slice and dice data while keeping query latency low. However, Pinot is less flexible than Presto because it lacks full SQL support.

The need to accelerate Presto queries and the need to support more functionality for Pinot users are a perfect match, and this was the primary motivation for building the Presto Pinot connector. The combined system covers the entire analytics landscape, drawing on the best parts of both Presto and Pinot. At Uber, this solution enables operations teams with basic SQL knowledge to build dashboards for quick analysis and to report aggregated data without spending extra time working with engineers on data modeling or building data pipelines, leading to efficiency gains and resource savings across the company.

Since then, the Presto and Pinot communities have contributed many features to make the solution more flexible and scalable, for example:

  • Array type support and function pushdown
  • Timestamp/date type inference and predicate pushdown
  • Support for the Pinot gRPC server for segment-level queries
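
To make the idea of "pushdown" in the list above concrete, here is a minimal, self-contained Python sketch. It is not connector code: the in-memory `ROWS` table and the helper names (`scan`, `scan_with_aggregation`, `engine_sum_by_city`) are hypothetical stand-ins for a Pinot table and the two execution paths. It only illustrates that pushing an aggregation into the store returns the same answer while shipping far fewer rows to the query engine.

```python
from collections import defaultdict

# Hypothetical in-memory stand-in for a Pinot table; each row is (city, fare).
ROWS = [("SF", 10.0), ("SF", 14.0), ("NYC", 8.0), ("NYC", 12.0), ("LA", 9.0)]


def scan(rows):
    """No pushdown: the store ships every raw row to the query engine."""
    return list(rows)


def scan_with_aggregation(rows):
    """Pushdown: the store computes SUM(fare) GROUP BY city itself
    and ships only one (city, total) row per group."""
    totals = defaultdict(float)
    for city, fare in rows:
        totals[city] += fare
    return sorted(totals.items())


def engine_sum_by_city(rows):
    """Aggregation performed inside the engine on the rows it received."""
    totals = defaultdict(float)
    for city, fare in rows:
        totals[city] += fare
    return dict(totals)


# Path 1: the engine pulls all raw rows and aggregates them itself.
shipped_raw = scan(ROWS)
no_pushdown = engine_sum_by_city(shipped_raw)

# Path 2: the store aggregates, and the engine only relays the result.
shipped_agg = scan_with_aggregation(ROWS)
with_pushdown = dict(shipped_agg)

# Both paths give the same answer; only the amount of shipped data differs.
assert no_pushdown == with_pushdown
print(f"rows shipped without pushdown: {len(shipped_raw)}")
print(f"rows shipped with pushdown:    {len(shipped_agg)}")
```

In this toy example the pushdown path ships one row per group instead of one row per record. The same reasoning applies to predicate pushdown, where filtering in the store avoids shipping rows the engine would discard anyway.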

Chasing the light: Aggregation pushdown

Xiang Fu
Apache Pinot Developer Blog

Co-founder of StarTree, Apache Pinot Founding Member and PPMC Member