Using Polars Plugins for a 14x Speed Boost with Rust

Achieving high speed outside the native Polars library

Nelson Griffiths
Towards Data Science
8 min read · Nov 9, 2023


Introduction

Polars is taking the world by storm thanks to its speed, memory efficiency, and beautiful API. If you want to know how powerful it is, look no further than the DuckDB benchmarks. And those don’t even use the most recent version of Polars.

For all the amazing things Polars can do, though, it has not traditionally been a better solution than Pandas for ALL the calculations you might want to do. There are a few exceptions where Polars has not outperformed Pandas. With the recent release of the Polars plugin system for Rust, though, that may no longer be the case.

Polars Plugins

What exactly is a Polars plugin? It is simply a way to create your own Polars expressions in native Rust and expose them to Python through a custom namespace. It lets you take the speed of Rust and apply it to your Polars DataFrame, performing calculations in a way that takes advantage of the speed and built-in tooling Polars provides.

Let’s take a look at some concrete examples.

Sequential Calculations

One area where Polars lacks some functionality is operations that require knowledge of previous values in a DataFrame. Calculations that are sequential in nature are not always easy or efficient to write with native Polars expressions. Let’s take a look at one specific example.

We have the following algorithm to calculate the cumulative value of an array of numbers for a given run, where a run is defined as a consecutive set of numbers that share the same sign. For example:

┌───────┬───────────┐
│ value ┆ run_value │
│ --- ┆ --- │
│ i64 ┆ i64 │
╞═══════╪═══════════╡
│ 1 ┆ 1 │ # First run starts here
│ 2 ┆ 3 │
│ 3 ┆ 6 │
│ -1 ┆ -1 │ # Run resets here
│ -2 ┆ -3 │
│ 1 ┆ 1 │ # Run resets here
└───────┴───────────┘

So we want to have a cumulative sum of a column which resets every time the sign of the value switches from either positive to negative or negative to positive.

Let’s start with a baseline version written in Pandas.

import pandas as pd


def calculate_runs_pd(s: pd.Series) -> pd.Series:
    out = []
    is_positive = True
    current_value = 0.0
    for value in s:
        if value > 0:
            if is_positive:
                current_value += value
            else:
                current_value = value
                is_positive = True
        else:
            if is_positive:
                current_value = value
                is_positive = False
            else:
                current_value += value
        out.append(current_value)
    return pd.Series(out)

We iterate over a series, calculating the current value of the run at each position, and returning a new Pandas Series.
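
The benchmark in the next section applies this per entity with Pandas’ groupby apply. The exact call is not shown in the article, so here is a minimal, hypothetical sketch (the column names `entity` and `feature` are my assumptions), with the function repeated in condensed form so the snippet runs on its own:

```python
import pandas as pd


# Condensed version of calculate_runs_pd from above: a cumulative sum
# that resets whenever the sign of the value flips.
def calculate_runs_pd(s: pd.Series) -> pd.Series:
    out, is_positive, current = [], True, 0.0
    for value in s:
        if value > 0:
            current = current + value if is_positive else value
            is_positive = True
        else:
            current = value if is_positive else current + value
            is_positive = False
        out.append(current)
    return pd.Series(out, index=s.index)


# Two hypothetical entities, each with its own runs.
df = pd.DataFrame({"entity": [1, 1, 1, 2, 2, 2], "feature": [1, 2, 3, -1, -2, 1]})
runs = df.groupby("entity")["feature"].apply(calculate_runs_pd)
# Row values: 1, 3, 6 for the first entity, then -1, -3, 1 for the second
```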

Benchmarking

Before moving on, we are going to set up a few benchmarks. We are going to measure both execution speed and memory consumption using pytest-benchmark and pytest-memray. We will set up the problem such that we have an entity column, a time column, and a feature column. The goal is to calculate the run values for each entity in the data across time. We will set the number of entities and time stamps each to 1,000, giving us a DataFrame with 1,000,000 rows.

When we run our Pandas implementation against our benchmark using Pandas’ groupby apply functionality we get the following results:

Pandas Apply Pytest-Benchmark (Image by Author)
Memray Output for Pandas Apply (Image by Author)

Polars Naive Implementation

Okay, so now we have our benchmark. Let’s look at implementing the same functionality in Polars. We will start with a very similar-looking version that will be applied by mapping the function across a Polars GroupBy object.

import polars as pl


def calculate_runs_pl_apply(s: pl.Series) -> pl.DataFrame:
    out = []
    is_positive = True
    current_value = 0.0
    for value in s:
        if value is None:
            pass
        elif value > 0:
            if is_positive:
                current_value += value
            else:
                current_value = value
                is_positive = True
        else:
            if is_positive:
                current_value = value
                is_positive = False
            else:
                current_value += value
        out.append(current_value)
    return pl.DataFrame(pl.Series("run", out))

Now let’s see how this compares to our original Pandas benchmark.

Pandas Apply vs Polars Apply Pytest-Benchmark (Image by Author)
Memray Output for Polars Apply (Image by Author)

Well, that didn’t work very well. That shouldn’t come as a surprise, though. The writers of Polars have made it very clear that the common groupby apply pattern from Pandas is not an efficient way to do computations in Polars, and here it shows. Both the speed and memory consumption are worse than our original Pandas implementation.

Polars Expression Implementation

Let’s write this same function as native Polars expressions now. This is the preferred and optimized way to work with Polars. The algorithm will look a little different. But here is what I came up with to calculate the same output.

import polars as pl
from polars import selectors as cs


def calculate_runs_pl_native(df: pl.LazyFrame, col: str, by: str) -> pl.LazyFrame:
    return (
        df.with_columns((pl.col(col) > 0).alias("__is_positive"))
        .with_columns(
            (pl.col("__is_positive") != pl.col("__is_positive").shift(1))
            .over(by)
            .fill_null(False)
            .alias("__change_sides")
        )
        .with_columns(pl.col("__change_sides").cumsum().over(by).alias("__run_groups"))
        .with_columns(pl.col(col).cumsum().over(by, "__run_groups").alias("runs"))
        .select(~cs.starts_with("__"))
    )

A quick explanation for what we are doing here:

  • Find all the rows where the feature is positive.
  • Find all the rows where the __is_positive column is different from the previous row.
  • Take a cumulative sum of __change_sides to mark each distinct run.
  • Take a cumulative sum of the value over each distinct run.

So now we have our native Polars function. Let’s do our benchmark again.

Pandas Apply vs Polars Apply vs Polars Native Pytest-Benchmark (Image by Author)
Memray Output for Polars Native (Image by Author)

We unfortunately did not see an improvement in the execution speed of our function. This is likely due to the number of over statements we have to run in order to calculate the run values. We did, however, see the expected memory reduction. There may be an even better way to implement this with Polars expressions, but I am not going to worry about that right now.

Polars Plugins

So now let’s take a look at the new Polars plugins. If you want a tutorial on setting these up, take a look at the documentation here. Here I am mostly going to show a specific implementation of a plugin. First we are going to write our algorithm in Rust.

use polars::prelude::*;
use pyo3_polars::derive::polars_expr;

#[polars_expr(output_type=Float64)]
fn calculate_runs(inputs: &[Series]) -> PolarsResult<Series> {
    let values = inputs[0].f64()?;
    let mut run_values: Vec<f64> = Vec::with_capacity(values.len());
    let mut current_run_value = 0.0;
    let mut run_is_positive = true;
    for value in values {
        match value {
            None => {
                run_values.push(current_run_value);
            }
            Some(value) => {
                if value > 0.0 {
                    if run_is_positive {
                        current_run_value += value;
                    } else {
                        current_run_value = value;
                        run_is_positive = true;
                    }
                } else if run_is_positive {
                    current_run_value = value;
                    run_is_positive = false;
                } else {
                    current_run_value += value;
                }
                run_values.push(current_run_value);
            }
        }
    }

    Ok(Series::from_vec("runs", run_values))
}

You will notice this looks pretty similar to the algorithm we wrote in Python. We aren’t doing any fancy Rust magic here! We denote the output type using a macro that Polars provides, and that is it. We can then register our new function as an expression.

import polars as pl
from polars import selectors as cs
from polars.utils.udfs import _get_shared_lib_location

lib = _get_shared_lib_location(__file__)


@pl.api.register_expr_namespace("runs")
class RunNamespace:
    def __init__(self, expr: pl.Expr):
        self._expr = expr

    def calculate_runs(self) -> pl.Expr:
        return self._expr.register_plugin(
            lib=lib,
            symbol="calculate_runs",
            is_elementwise=False,
            cast_to_supertypes=True,
        )

And then we can run it like this:

from polars_extentsion import RunNamespace  # importing registers the "runs" namespace

df.select(
    pl.col(feat_col).runs.calculate_runs().over(entity_col).alias("run_value")
).collect()

Okay, now let’s check out the results!

All Implementations Pytest-Benchmark (Image by Author)
Memory Output for Polars Plugin (Image by Author)

Now that is more like it! We got a 14x speed improvement and dropped from ~57MiB to ~8MiB of memory allocated.

When to Use Polars Plugins

Now that I have shown the power of using plugins, let’s talk about when you shouldn’t use them. A few reasons I might not use plugins (each with its own caveats):

  • If you can easily write a really fast version of your calculation using native Polars expressions. The Polars developers are really smart. I would not bet money on myself writing a function significantly faster than they can. The tools for Polars are there. Take advantage of what they are good at!
  • If there is no natural parallelization for your calculation. For example, if we were not running the above problem over multiple entities, our speedup would likely have been significantly less. We benefitted both from the speed of Rust, and the natural ability of Polars to apply our Rust function over multiple groups at once.
  • If you don’t need top-notch speed or memory performance. Many people will agree that writing Rust is more difficult and time-consuming than writing Python. So if you don’t care whether your function takes 2 seconds to run instead of 200 ms, you may not need plugins.

With those caveats in mind, here are a few situations that pull me towards using plugins:

  • Speed and memory matter a lot. I recently rewrote a lot of a data pipeline’s functionality in a Polars plugin because we were switching back and forth between Polars and other tools, and the memory allocations were getting too big. It was getting hard to run the pipeline on the infrastructure we wanted with the amount of data we had. The plugins made it easy to run the same pipeline in much less time and on a much smaller machine.
  • You have a unique use case. Polars provides many built-in functions, but it is a generic toolset that is broadly applicable to a lot of problems. Sometimes that toolset is not specifically suited to the problem you are trying to solve. In this case, a plugin might be exactly what you want. The two most common examples I have run into are more intense mathematical calculations, such as applying a cross-sectional linear regression, and sequential (row-based) calculations like the one shown here.

The new plugin system is the perfect complement to all of the columnar calculations that Polars already supports out of the box. With this addition, Polars allows for beautiful extensibility of its capabilities. On top of writing your own plugins, watch out for some cool Polars plugin packages being developed that you can use to extend your capabilities without having to write plugins yourself!

Polars is moving fast and making waves. Check out the project, start using it, watch out for what other awesome features they will be releasing, and maybe start learning a little Rust while you are at it!


Written by Nelson Griffiths

Head of Engineering and Machine Learning at Double River Investments
