Announcing Apache Pinot 0.6.0

Jialiang Jack Li
Apache Pinot Developer Blog
Dec 4, 2020
Apache Pinot, a modern OLAP platform for event-driven data warehousing

We are excited to announce that Apache Pinot 0.6.0 was released in November 2020. Apache Pinot is a real-time distributed datastore designed to answer OLAP queries with low latency. This release introduces several new features, including upsert, tiered storage, pinot-spark-connector, support for the HAVING clause, more validations on table config and schema, array transform functions, a segment-metadata-only push job type, and new APIs such as updating instance tags and a new health check endpoint. In this post, we highlight some of the significant features in this release.

Upsert Feature

Upsert (update and insert) is a frequently requested feature across many use cases, such as data correction. For example, an UberEats operator may want to display the current status of Eats deliveries. Previously, the query “SELECT order_uuid, current_status, secondsSinceEpoch FROM eats_delivery_state” returned multiple records per order (identified by order_uuid), because current_status can change from CREATED to ASSIGNED to PICKEDUP_COMPLETED and so on, as shown in the following diagram:

With upsert, this query returns one record per order, with the latest status.
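The difference can be sketched in plain Python. This only illustrates the query-time semantics, not how Pinot implements upsert internally. The sample rows and the helper name `latest_per_key` are made up for the example, and it assumes that order_uuid acts as the primary key and that secondsSinceEpoch decides which record is the latest.

```python
from typing import Dict, List

# Hypothetical sample events for the eats_delivery_state table.
events: List[dict] = [
    {"order_uuid": "a1", "current_status": "CREATED", "secondsSinceEpoch": 100},
    {"order_uuid": "b2", "current_status": "CREATED", "secondsSinceEpoch": 105},
    {"order_uuid": "a1", "current_status": "ASSIGNED", "secondsSinceEpoch": 110},
    {"order_uuid": "a1", "current_status": "PICKEDUP_COMPLETED", "secondsSinceEpoch": 130},
]


def latest_per_key(rows: List[dict], key: str, time_col: str) -> List[dict]:
    """Keep only the most recent row for each primary-key value."""
    latest: Dict[str, dict] = {}
    for row in rows:
        current = latest.get(row[key])
        # A newer (or equally recent, later-arriving) row replaces the stored one.
        if current is None or row[time_col] >= current[time_col]:
            latest[row[key]] = row
    return list(latest.values())


without_upsert = events  # every state change is returned
with_upsert = latest_per_key(events, "order_uuid", "secondsSinceEpoch")
```

Here `without_upsert` holds one row per state change, while `with_upsert` holds a single row per order carrying its most recent status.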

Upsert has been requested and discussed within the Pinot community for over a year, and an initial version of the design was drafted in the past. However, that earlier design faced some challenges. In Pinot 0.6.0, the upsert design has been fully revamped and simplified, and it is ready to be used in production. Please check these instructions on how to use the upsert feature in Pinot.

Pinot Spark Connector

In this release, we also introduce a Pinot connector for Apache Spark. With this connector, Pinot’s efficient indexing can be combined with Spark’s computation power. The implementation supports distributed and parallel scans, column and filter pushdown, schema discovery, and more. Details on this feature can be found here.
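To make column and filter pushdown more concrete, here is a small Python sketch of the general idea: rather than fetching whole rows and filtering them in Spark, only the needed columns and predicates go into the query sent to the datastore. The function `build_pushdown_query`, the table `airlineStats` and its columns are assumptions for illustration; this is not the connector's actual code or API.

```python
from typing import List, Sequence, Tuple

# A filter is (column, operator, value); only a few operators are handled here.
Filter = Tuple[str, str, object]


def _literal(value: object) -> str:
    """Render a Python value as a SQL literal (strings are single-quoted)."""
    if isinstance(value, str):
        return "'" + value.replace("'", "''") + "'"
    return str(value)


def build_pushdown_query(table: str, columns: Sequence[str], filters: List[Filter]) -> str:
    """Build a query that asks the datastore for only the needed columns and rows."""
    allowed_ops = {"=", "<", ">", "<=", ">="}
    select_list = ", ".join(columns) if columns else "*"
    query = f"SELECT {select_list} FROM {table}"
    predicates = []
    for column, op, value in filters:
        if op not in allowed_ops:
            raise ValueError(f"unsupported operator: {op}")
        predicates.append(f"{column} {op} {_literal(value)}")
    if predicates:
        query += " WHERE " + " AND ".join(predicates)
    return query


query = build_pushdown_query(
    "airlineStats",
    ["Carrier", "ArrDelay"],
    [("DestStateName", "=", "Florida"), ("ArrDelay", ">", 30)],
)
```

The point of the sketch is that the selection and the predicates travel with the query, so less data has to leave the datastore before Spark takes over.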

Improvements of the Cluster Management UI

Pinot 0.5.0 introduced a brand new cluster management UI. Pinot 0.6.0 enhances it with an autocomplete feature in the query console’s SQL editor (triggered by typing the @ symbol), support for Update and Delete in the Zookeeper browser, and clickable counts of tenants, instances, tables, and segments on the home page. Here are some screenshots of the new changes in the UI.

new UI for cluster management
autocomplete in query console

Tiered Storage

A common pattern is for use cases to have data with long retention, where recent data is queried far more frequently than older, historical data. Storing the data for the entire retention period on expensive SSDs is not cost effective. The Tiered Storage design improves storage cost efficiency: more frequently queried data can be stored on faster storage (SSDs), whereas less frequently queried data can be stored on more cost-effective storage such as HDDs, or even the deep store. In the Pinot 0.6.0 release, we implemented phase 1 of this feature.
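As a rough illustration of the idea, the Python sketch below assigns segments to a storage tier by age. The tier names, the age thresholds and the `pick_tier` helper are hypothetical and chosen for the example; they are not Pinot's configuration keys or its actual relocation logic.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Tier:
    name: str
    min_age_days: int  # segments at least this old qualify for the tier


# Hypothetical tiers: newest data on SSD, older data on cheaper storage.
TIERS: List[Tier] = [
    Tier("hot_ssd", 0),
    Tier("warm_hdd", 30),
    Tier("cold_deep_store", 180),
]


def pick_tier(segment_age_days: int, tiers: List[Tier] = TIERS) -> str:
    """Return the coldest tier whose age threshold the segment has reached."""
    if segment_age_days < 0:
        raise ValueError("segment age cannot be negative")
    eligible = [t for t in tiers if segment_age_days >= t.min_age_days]
    return max(eligible, key=lambda t: t.min_age_days).name


placements = {age: pick_tier(age) for age in (1, 30, 45, 400)}
```

With these made-up thresholds, a one-day-old segment stays on the SSD tier, segments 30 and 45 days old land on the HDD tier, and a 400-day-old segment goes to the coldest tier.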

Having and Post-Aggregation Support

In this release, we also added support for the HAVING clause and for post-aggregation expression evaluation. Support for the HAVING clause lets users filter groups within Pinot, instead of fetching all groups from Pinot and filtering them on the client side. Post-aggregation expression evaluation makes it possible to compute expressions such as sum(A) / sum(B) within Pinot, which would otherwise have to be computed on the client side.
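The sketch below mimics in plain Python what such a query computes, to show the work that no longer has to happen on the client. The table `ad_stats`, its columns and the sample rows are invented for illustration, and the SQL string is only an example of the query shape described above; the Python code is not how Pinot evaluates it.

```python
from collections import defaultdict

# Example query shape (table and column names are hypothetical):
QUERY = """
SELECT campaign, SUM(clicks) / SUM(impressions)
FROM ad_stats
GROUP BY campaign
HAVING SUM(impressions) > 1000
"""

rows = [
    {"campaign": "spring", "clicks": 30, "impressions": 800},
    {"campaign": "spring", "clicks": 20, "impressions": 700},
    {"campaign": "summer", "clicks": 5, "impressions": 400},
]

# Aggregate per group: SUM(clicks) and SUM(impressions).
sums = defaultdict(lambda: {"clicks": 0, "impressions": 0})
for row in rows:
    sums[row["campaign"]]["clicks"] += row["clicks"]
    sums[row["campaign"]]["impressions"] += row["impressions"]

# HAVING filters whole groups; the ratio is a post-aggregation expression.
result = {
    campaign: agg["clicks"] / agg["impressions"]
    for campaign, agg in sums.items()
    if agg["impressions"] > 1000
}
```

Only the "spring" group passes the HAVING condition in this sample, and its ratio is computed from the two sums rather than from individual rows.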

Array Transformation Functions

Array functions are powerful transform operations in Presto that help aggregate the elements of array columns. Inspired by the Presto array functions, these transform functions are also being brought to Pinot. In the Pinot 0.6.0 release, some of the array transform functions are implemented. Array transform functions can now even be passed from Presto to Pinot through presto-pinot-connector. Please check out this GitHub issue to track the status of the implementations.
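As a sketch of what an array transform does, the Python below applies a per-row reduction to a multi-value column, in the spirit of Presto's array functions. The helper names, the `latencies_ms` column and the data are hypothetical; the functions actually available in Pinot 0.6.0 are the ones tracked in the GitHub issue mentioned above.

```python
from statistics import mean
from typing import Callable, Dict, List

# Each row holds a multi-value (array) column; names and data are made up.
rows: List[Dict[str, list]] = [
    {"latencies_ms": [10, 20, 30]},
    {"latencies_ms": [5, 15]},
]

# Illustrative per-row reductions, similar in spirit to array sum/min/max/average.
ARRAY_TRANSFORMS: Dict[str, Callable[[list], float]] = {
    "sum": sum,
    "min": min,
    "max": max,
    "average": mean,
}


def apply_array_transform(name: str, column: str, data: List[Dict[str, list]]) -> list:
    """Reduce the array in `column` to one value per row."""
    transform = ARRAY_TRANSFORMS[name]
    return [transform(row[column]) for row in data]


sums = apply_array_transform("sum", "latencies_ms", rows)
averages = apply_array_transform("average", "latencies_ms", rows)
```

Each call returns one value per row, which is what distinguishes a transform over an array column from an aggregation across rows.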

Special thanks

We would like to take a moment to thank the Pinot community for supporting the project. We have kept a steady pace of commits over the past year, and more and more excellent features are being added to the project. We would like to thank everyone who contributed to this release.

Number of commits to Pinot GitHub since 12/19 (Source)
