Utilize UDFs to Supercharge Queries in Apache Pinot

Kartik Khare
Apache Pinot Developer Blog
4 min readSep 29, 2020
Photo by Shahadat Rahman on Unsplash

Apache Pinot is a realtime distributed OLAP datastore that can answer hundreds of thousands of queries with millisecond latencies. You can head over to https://pinot.apache.org/ to get started with Apache Pinot.

While using any database, we can come across a scenario where a function required for the query is not supported out of the box. In such time, we have to resort to raising a pull request for a new function or finding a tedious workaround.

Pinot aims to solve this particular pain-point by giving users the power to add their functions with almost zero lines of code. The 0.5.0 release comes bundled with two such features —

  • Support for Scalar Functions that allow users to write and add their functions as a plugin.
  • Support for inline Apache Groovy scripts in SQL queries.

In this article, we’ll be focusing on Scalar functions.

What are Scalar Functions?

Scalar functions in Pinot are stateless functions which transform input A into output A. e.g. upper , lower , length etc. are scalar functions. However, sum , count are non-scalar functions.

You can take a look at the scalar functions supported out of the box in official repository.

Implement a scalar function

Let’s create a maven java project. We’ll call this project scalar-function-example.

Scalar functions require @ScalarFunction annotation. This annotation is present in pinot-spi package. You can add the package dependency in maven as follows —

<dependency>
<groupId>org.apache.pinot</groupId>
<artifactId>pinot-spi</artifactId>
<version>0.11.0</version>
<scope>provided</scope>
</dependency>

Let’s create a class CustomScalarFunctions that will contain all of our functions. Let’s write a java method that converts latitude and longitude to geohash encoding. Generally, Geohash encoders are not available out of the box in most of the database.

Now, you need to annotate this function with @ScalarFunction. The annotation supports the following arguments —

  • name — The name of the function to be used while querying. e.g., the Java method name can be calculateLength , but the query name can be calculate_length_str. The default is the same as the method name.
  • enable — Boolean value indicating whether this function should be registered or not for the usage. This can be used in particular scenarios where you may require to disable a function because of some error or duplicate methods.

Finally, the class should be present in the package name, which follows the following pattern — org.apache.pinot.*.function.*

This is because we find the custom methods using reflection, and currently, only the packages with the mentioned pattern are considered for the search. The final code should look as follows

Register the Scalar Function

Once you have written the code, just compile the code into a JAR file.

Next, if you haven’t downloaded Pinot, you can do that from our website.

Note that scalar functions are supported only in releases 0.5.0 and above.

Now, copy the JAR in theplugins or lib directory of the Pinot binary distribution and restart the Pinot cluster.

Use the Scalar Function

Now you can use the registered function as follows

All of the arguments can either be literal constants or column names. An exception will be thrown if the column’s data type doesn’t match the input data type.

custom function demo

RoadMap

We saw how easy it is to add your methods to Pinot for various use cases. The current implementation has few limitations —

  • Only the following data types are supported in Input and output — Integer, Long, Float, Double, String, and Bytes. Objects such as Lists, Map are not supported currently.
  • You can return only a single value.
  • No support for multi-valued columns.
  • Only Java methods are supported as of the moment.

We are working on removing most of these limitations in future releases.

Groovy Functions

Pinot also supports Apache Groovy script in SQL which you can use to write custom functions. It requires no additional code from the user. You can directly use the scripts in the query as follows —

Read our official documentation for more info on both Groovy and Scalar functions.

You can find the complete demo code in the pinot-examples repository.

If you liked this article, head over to https://pinot.apache.org and try it out today.

You can head over to our slack workspace for any queries or discussions.

--

--

Kartik Khare
Apache Pinot Developer Blog

Software Engineer @StarTree | Previously @WalmartLabs, @Olacabs | Committer @ApachePinot