Predict with sklearn 20x faster

Evan Harris
Building Ibotta
Feb 25, 2020

At Ibotta we deploy a lot of machine learning (ML) models. We use a number of technologies including Spark for batch and Amazon SageMaker for realtime. While we’ve written in the past about some of our use cases and open source packages for training models, this post is all about prediction.

As we’ve taken ML out of the data lake (batch prediction) and into the microservices ecosystem (on-demand prediction) we’ve primarily utilized Amazon SageMaker to deploy realtime machine learning prediction services. This has served us well, as we heavily rely on scikit-learn, TensorFlow, PyTorch, XGBoost and fasttext which are all supported by SageMaker.

In addition to migrating ML deployment to microservices, we’ve also been looking at serverless technology to deploy microservices more generally. In considering what it would be like to deploy ML model prediction as serverless applications, a few things become clear:

  1. ML packages and their dependencies are too big
  2. ML model artifacts are too big
  3. ML prediction isn’t too slow, but faster is better
  4. We often only want to predict one record per request

These considerations motivated the exploration of slimming down machine learning prediction to a dedicated library. By converting prediction methods to pure Python, we can make predictions faster, dependency packages smaller, and model objects lighter weight. This doesn’t solve everything for serverless inference as some models will just be too big — but it’s a start.

Introducing pure-predict

We are excited to announce the launch of our open source project pure-predict. The goal of the project is to speed up and slim down machine learning prediction applications. It is a foundational tool for serverless inference or small batch prediction with popular machine learning frameworks like scikit-learn and fasttext. It implements the predict methods of these frameworks in pure Python.

The primary benefit of pure-predict is illustrated by the relative size of the package compared to scikit-learn and its dependencies.

All in all, scikit-learn and its dependencies come to 255.9 MB. In total, pure-predict is 3.6 MB, keeping in mind that it has zero dependencies outside of Python built-ins.

As far as container services go, here are some notes on Docker image sizes from a post on best practices:

Don’t create large images – A large image will be harder to distribute. Make sure that you have only the required files and libraries to run your application/process.

Consider how many files and libraries are contained in scikit-learn and all of its dependencies that have nothing to do with the predict() method of any given scikit-learn estimator.

As far as serverless goes, here are some notes on AWS Lambda deployment package limitations from a post on the subject:

250 MB: Size of code/dependencies that you can zip into a deployment package (uncompressed .zip/.jar size)

There are indeed strategies to reduce the size of deployment packages but scikit-learn and its dependencies are pushing the limit here. Regardless of workarounds, large deployment packages can cause serverless functions to be 5–10 times slower at cold start. This could be problematic depending on the access patterns and SLAs of the service.

Therein lies the motivation for pure-predict. As shown below, package size is only one of the benefits. We can achieve large performance improvements across a variety of metrics with pure Python implementations of ML prediction methods.

Example

In an environment with scikit-learn and its dependencies installed, we can train a model, convert it to a pure-predict object, and save the object to disk:

import pickle

from sklearn.ensemble import RandomForestClassifier
from sklearn.datasets import load_iris
from pure_sklearn.map import convert_estimator

# fit sklearn estimator
X, y = load_iris(return_X_y=True)
clf = RandomForestClassifier()
clf.fit(X, y)

# convert to pure python estimator
clf_pure_predict = convert_estimator(clf)
with open("model.pkl", "wb") as f:
    pickle.dump(clf_pure_predict, f)

In an environment with only Python and pure-predict installed, we can load the model into memory and make predictions:

import pickle

# load pickled model
with open("model.pkl", "rb") as f:
    clf = pickle.load(f)

# make prediction with pure-predict object
y_pred = clf.predict([[0.25, 2.0, 8.3, 1.0]])
print(y_pred)
[2]

This is a trivial example — but consider this example in the GitHub repository. It is a model trained on two categories (rec.autos, sci.space) of the 20 newsgroups dataset. A text extraction pipeline is constructed that unions a TfidfVectorizer and a HashingVectorizer (unioning both is not recommended in practice, though it is useful for illustrating performance). This feature engineering pipeline is chained to a full-depth RandomForestClassifier with 100 estimators. Here are the comparison statistics between the scikit-learn version of the pipeline and the pure_sklearn version:

Pickle Size sklearn: 5.55 MB
Pickle Size pure-predict: 3.33 MB
Difference: 0.60
Unpickle time sklearn: 0.0482
Unpickle time pure-predict: 0.0376
Difference: 0.78
Predict 1 record sklearn: 0.0942
Predict 1 record pure-predict: 0.0043
Difference: 0.05

The pure_sklearn model is 60% of the size of its scikit-learn equivalent. This is valuable for serverless cold start or container size. It un-pickles (loads into memory from disk and deserializes) in 78% of the time of its scikit-learn equivalent. Again, valuable for serverless cold start. Finally, it predicts in 5% of the time of its scikit-learn equivalent for single record prediction. This is particularly important as 94ms request latency and 4ms request latency can be the difference between a service that is viable in certain microservices ecosystems and one that isn’t.
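The benchmark pipeline can be sketched roughly like this — a simplified reconstruction on a toy corpus, not the repository's actual code, and the example documents and labels here are made up for illustration:

```python
from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import HashingVectorizer, TfidfVectorizer
from sklearn.pipeline import FeatureUnion, Pipeline

# a tiny stand-in corpus for the two newsgroup categories
docs = [
    "the car engine stalled on the highway",
    "the rocket reached orbit after launch",
    "new tires and brakes for the car",
    "astronauts docked with the space station",
]
labels = [0, 1, 0, 1]  # 0 = rec.autos, 1 = sci.space

# union of two text vectorizers feeding a full-depth random forest
pipe = Pipeline([
    ("features", FeatureUnion([
        ("tfidf", TfidfVectorizer()),
        ("hashing", HashingVectorizer()),
    ])),
    ("clf", RandomForestClassifier(n_estimators=100, random_state=0)),
])
pipe.fit(docs, labels)

preds = pipe.predict(["the shuttle launched into space", "my car needs new tires"])
print(list(preds))
```

A fitted pipeline like this can then be passed through convert_estimator the same way as the iris classifier above.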

An important caveat here is that this performance comparison won’t hold for all models in all situations. Trees are particularly performant with pure_sklearn. Linear models with wide feature spaces and many classes are less so: they benefit from the economies of scale of numpy array operations, so the pure versions will underperform scikit-learn as n-classes and n-features grow. In all cases, as the n-samples being predicted grows, pure-predict models will underperform.

Features

At launch time, pure-predict contains two subpackages: pure_sklearn and pure_fasttext. The pure_sklearn package parallels scikit-learn for a number of estimators and transformers. A fitted scikit-learn estimator, pipeline or transformer can be converted into a pure_sklearn object. This object can then make predictions without scikit-learn and its dependencies. Note that pure_sklearn also supports XGBoost models fit with the scikit-learn API.

The pure_fasttext package functions similarly. For unsupervised mode, it handles vector lookups for words and sentences. For supervised mode, it handles predictions.

How It Works

The implementation approach for pure-predict is very much brute force. Predict methods for various estimators are literally re-written in pure Python.

Consider a decision tree classifier. The core of its predict method is a function that returns the leaf node for any input x, where x is one record to predict (either a sparse_list or a list of floats). This is the guts of prediction for a decision tree, ported to pure Python directly from the scikit-learn implementation, which uses Cython and numpy to achieve the same result.
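That leaf-finding traversal can be sketched as follows. This is a simplified illustration, not the actual pure-predict source; the array names mirror scikit-learn's fitted tree_ attributes (children_left, children_right, feature, threshold), and the tiny hand-built tree is made up for the example:

```python
def find_leaf(x, children_left, children_right, feature, threshold):
    """Walk a fitted decision tree to the leaf node index for one record x.

    For node i, children_left[i]/children_right[i] give child node indices
    (-1 marks a leaf), and the split test is x[feature[i]] <= threshold[i].
    """
    node = 0  # start at the root
    while children_left[node] != -1:  # descend until we hit a leaf
        if x[feature[node]] <= threshold[node]:
            node = children_left[node]
        else:
            node = children_right[node]
    return node

# tiny hand-built tree: the root (node 0) splits on feature 0 at 0.5,
# sending records left to leaf 1 or right to leaf 2
children_left = [1, -1, -1]
children_right = [2, -1, -1]
feature = [0, -2, -2]       # -2 is a placeholder for leaves
threshold = [0.5, 0.0, 0.0]

print(find_leaf([0.3], children_left, children_right, feature, threshold))  # 1
print(find_leaf([0.9], children_left, children_right, feature, threshold))  # 2
```

A plain while-loop over Python lists like this avoids numpy entirely, which is exactly the trade the library makes.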

By porting this to pure Python, we get the same functionality without the numpy dependency and Cython implementation. While these certainly speed up training and prediction over many samples, they are generally slower for single record prediction. Additionally, they require over 200 MB of numpy and scipy in order to run.

Other methods were considered for slimming down package sizes. One idea would be to take any given fitted scikit-learn estimator and strip away unused modules in scikit-learn, numpy or scipy. This is doable with scikit-learn, as you can typically identify modules of the scikit-learn package that aren’t being used by any given estimator (for example, a random forest doesn’t need code in the linear_models module). However, it gets more challenging once you drill down into numpy and scipy and their various cross-dependencies with each other and with scikit-learn.

Ultimately, the brute force approach of porting prediction methods to pure Python was the chosen direction.

Use Cases

The primary use case for pure-predict is spelled out in more detail in the README. It largely involves training a model in an offline environment, converting the fitted model to a pure-predict object and pickling it, and finally deploying the pure-predict model in a serverless function.
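A minimal AWS Lambda handler following that pattern might look like the sketch below. The stub model here stands in for a real pickled pure-predict estimator shipped inside the deployment package, and the event shape and file name are illustrative, not prescribed by the library:

```python
import json
import pickle


class _StubModel:
    """Stand-in for a pickled pure-predict estimator (illustration only)."""

    def predict(self, records):
        return [0 for _ in records]


# mimic the deployment artifact: a model pickled offline and bundled with the code
with open("model.pkl", "wb") as f:
    pickle.dump(_StubModel(), f)

# load once at import time, outside the handler, so warm invocations skip it
with open("model.pkl", "rb") as f:
    MODEL = pickle.load(f)


def lambda_handler(event, context):
    """Lambda entry point: predict one record per request."""
    record = json.loads(event["body"])["features"]
    prediction = MODEL.predict([record])[0]
    return {"statusCode": 200, "body": json.dumps({"prediction": prediction})}


# simulate one invocation locally
resp = lambda_handler({"body": json.dumps({"features": [0.25, 2.0, 8.3, 1.0]})}, None)
print(resp["statusCode"])  # 200
```

Because the function only needs Python built-ins plus pure-predict, the whole deployment package stays far under the 250 MB limit quoted above.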

Other use cases are for speeding up specific applications (serverless or otherwise) that roughly meet these criteria:

  1. Single record or small batch prediction
  2. Trees (decision tree, random forest, gradient boosted trees, etc.), linear or Bayesian models with small n-classes/n-features, and text extraction pipelines

Getting Started

To get started with pure-predict, check out the installation guide. The code repository also contains a library of examples to illustrate some of the use cases of pure-predict. All are welcome to submit issues and contribute to the project. Note the specific calls for contributors to get involved.

If these kinds of projects and challenges sound interesting to you, Ibotta is hiring! Check out our jobs page for more information.
