Speed Up Your SVM Some More with This Simple Trick

Published in arabesqueai · 4 min read · Sep 16, 2022

Tony Yang (Associate AI Engineer), Matthias Baetens @ Arabesque AI

Repository: https://github.com/arabesqueai/lisbon
PyPI: https://pypi.org/project/lisbon/

TLDR: lisbon is a Rust port of liblinear's L2-regularised hinge loss binary classification routine. By eliminating memory copies, it achieves a 20-40% reduction in training time and more than 50% memory savings. The package is pip-installable and can be used as a drop-in replacement in scikit-learn.

Introduction

At Arabesque AI, we use a range of classical machine learning and deep learning approaches to analyse stock market trends, which then feed into our portfolio optimisation process. One of the models we use is the Support Vector Machine (SVM). In particular, we use linear support vector classifiers (LinearSVC) to analyse market behaviour about a month ahead. We chose a linear kernel because our training data can be sizeable: around 750,000 data points with 1,000 dimensions each. A linear-kernel SVM implemented by liblinear using coordinate descent is orders of magnitude faster than libSVM with nonlinear mappings. Training nonlinear SVMs would take prohibitively long given our dataset size, especially since we train individual models for each investment universe and sector combination, and we have quite a few universe-sector combinations: close to 1,000 and still expanding! To overcome the limitations of a linear model, we preprocess our data with the Nystroem method to introduce nonlinearity. Even with all those efforts, the SVM remains the slowest component of our AI pipeline.
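
For context, a pipeline along these lines (a Nystroem feature map followed by LinearSVC) can be set up in scikit-learn roughly as sketched below; the data shapes and hyperparameters are illustrative placeholders, not our production settings.

    # Minimal sketch of a Nystroem + LinearSVC pipeline as described above.
    # All shapes and hyperparameters are illustrative placeholders.
    import numpy as np
    from sklearn.kernel_approximation import Nystroem
    from sklearn.pipeline import make_pipeline
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(0)
    X = rng.standard_normal((10_000, 100))   # toy stand-in for the real dataset
    y = (X[:, 0] > 0).astype(int)            # toy binary target

    model = make_pipeline(
        Nystroem(kernel="rbf", n_components=300, random_state=0),  # adds nonlinearity
        LinearSVC(loss="hinge", C=1.0, max_iter=10_000),           # L2-regularised hinge loss
    )
    model.fit(X, y)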

Project Lisbon

Towards the end of my internship at Arabesque AI, I was looking into speeding up our ML pipelines, and the SVM was the main target. Intel's Scikit-learn extension gave us hope of accelerating our SVM module, but to our disappointment it does not speed up the LinearSVC class of Scikit-learn, which we rely on. Keen to understand the subroutine behind LinearSVC, we decided to dig deeper into the source code of liblinear, which LinearSVC wraps. The underlying library is of good code quality and well optimised, but we did spot a few points for improvement, especially for our use case:

  1. A big assumption made by liblinear is that the training data is sparse, whereas our real-world dataset is effectively dense, with only ~0.008% of values being empty.
  2. The BLAS math routines in liblinear operate on sparsely formatted matrices. While this can improve performance when the input data really is sparse, that is not the case for our data, and the sparse format can prevent compiler optimisations and vectorisation, which degrades performance.
  3. Requiring sparse-matrix input also means that Scikit-learn has to convert the dense numpy input array into a sparsely formatted copy before passing it to the liblinear routine. This causes memory bloat as the data is duplicated (~1.5x overhead with 32-bit integer indices) and introduces a significant performance overhead (see the sketch after this list).
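
A quick way to see the overhead in point 3 is to compare the memory footprint of a dense float64 array with its CSR copy; this snippet is purely illustrative and not part of lisbon.

    # Illustration of point 3: converting a (nearly) dense float64 array to CSR
    # duplicates the data and adds index arrays on top of it.
    import numpy as np
    from scipy import sparse

    X = np.random.rand(100_000, 100)          # dense matrix with almost no zeros
    X_csr = sparse.csr_matrix(X)              # the copy handed to liblinear

    dense_mb = X.nbytes / 1e6
    csr_mb = (X_csr.data.nbytes + X_csr.indices.nbytes + X_csr.indptr.nbytes) / 1e6
    print(f"dense: {dense_mb:.0f} MB, CSR copy: {csr_mb:.0f} MB")
    # With float64 data and 32-bit indices the CSR copy alone is ~1.5x the dense
    # array, and both copies live in memory at the same time during conversion.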

With the above points in mind, we decided to reimplement the L2-regularised hinge loss subroutine in Rust and expose it as a Python module; hence lisbon was born. To our delight, we achieved a 20-40% performance improvement and a large reduction in memory footprint while faithfully reproducing Scikit-learn's results.

Lisbon is now published on PyPI (repository) and is ready for testing and production use. Simply importing the module monkey-patches the Scikit-learn class to use lisbon instead of liblinear where applicable (currently binary classification with L2-regularised hinge loss).
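
In practice, usage looks roughly like the sketch below: the import alone applies the patch, and LinearSVC is then used exactly as before. The data here is a toy placeholder.

    # Usage sketch: importing lisbon patches Scikit-learn so that supported cases
    # (binary classification with L2-regularised hinge loss) run on the Rust solver.
    import lisbon  # noqa: F401  (the import itself applies the monkey patch)
    import numpy as np
    from sklearn.svm import LinearSVC

    rng = np.random.default_rng(0)
    X = rng.standard_normal((10_000, 200))              # toy stand-in for real data
    y = (X @ rng.standard_normal(200) > 0).astype(int)  # toy binary target

    clf = LinearSVC(loss="hinge", C=1.0, max_iter=10_000)
    clf.fit(X, y)   # dispatched to lisbon instead of liblinear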

Note that the PyPI-packaged version is compiled with the AVX2 instruction set. Install from source if your platform does not support AVX2.

Talk is cheap, show me the benchmark

As with all benchmarks, your mileage may vary :)

With PMLB’s adult dataset

48,842 rows of data and 14 features.
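
A timing comparison along these lines can be reproduced with roughly the sketch below; run it once as-is (liblinear) and once with the lisbon import uncommented, in a separate process, since the patch is applied at import time. The exact benchmark script we used may differ.

    # Rough timing sketch for the PMLB adult benchmark (feature scaling omitted).
    import time

    # import lisbon                      # uncomment to benchmark lisbon's solver
    from pmlb import fetch_data
    from sklearn.svm import LinearSVC

    X, y = fetch_data("adult", return_X_y=True)

    clf = LinearSVC(loss="hinge", C=1.0, max_iter=10_000)
    start = time.perf_counter()
    clf.fit(X, y)
    print(f"fit took {time.perf_counter() - start:.2f}s")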

With Arabesque’s dataset

210,447 rows of data and 1,000 features.

With Arabesque’s bigger dataset

747,922 rows of data and 1,000 features.

Note that the RUSTFLAGS='-C target-cpu=native' environment variable ensures that rustc compiles against all the instruction sets your CPU supports, enabling more SIMD optimisations (e.g. AVX2, FMA).

These results show that our rewrite improved performance across different dataset sizes while significantly reducing the memory footprint in the large-dataset regime. Notably, the additional compilation flag target-cpu=native brought a substantial further performance improvement, which demonstrates the power of the auto-vectorisation in modern compiler infrastructure (rustc uses LLVM as its backend).

RiiR, how was the experience?

Rewrite it in Rust may have become a meme, but it does show that people are excited about the benefits the Rust language brings. With lisbon, I started with a verbatim translation of the original C++ code to Rust, which was fairly effortless given the similarity of the two languages and the simple structure of mathematical computation code. Once I had a runnable version that could reproduce liblinear's results faithfully, I started optimising out unnecessary code, targeting specifically binary classification with L2-regularised hinge loss. Then I moved on to exposing the Rust code to Python with PyO3, which comes with very handy guides and ergonomic APIs. Finally, the code was modified to take the underlying numpy array directly without converting it to a sparse matrix. In total, this project took a week to complete: 3 days on the verbatim translation, 1 day on optimisation, and 1 day on the Python binding.

Packaging the mixed Python/Rust project and publishing it to PyPI was a breeze with the help of maturin (from the PyO3 developers). The Rust ecosystem is gradually maturing, and it now feels better than ever to write high-performance Python libraries in Rust! Overall, converting the C++ code to Rust was a pleasant experience. Rust makes it a conscious choice when to copy memory and when to take a reference, which greatly helped with memory optimisation. That being said, there are still quite a few C++ patterns left over in the codebase that are unidiomatic in Rust. Please help me fix those with PRs or issues :)

Open Source

Please see our public repository for the source code: https://github.com/arabesqueai/lisbon
Dual-licensed under Apache 2.0 or MIT.
