Speed Up Your SVM Some More with This Simple Trick
Tony Yang (Associate AI Engineer), Matthias Baetens @ Arabesque AI
Repository: https://github.com/arabesqueai/lisbon
PyPI: https://pypi.org/project/lisbon/
TLDR: lisbon is a Rust port of liblinear's L2-regularised hinge loss binary classification routine. We have achieved a 20-40% speedup in computation time and a greater than 50% memory saving by eliminating memory copies. The package is pip-installable and can be used as a drop-in replacement in scikit-learn.
Introduction
At Arabesque AI, we use a range of classical machine learning and deep learning approaches to analyse stock market trends, which then feed into our portfolio optimisation process. One of the models we use is the Support Vector Machine (SVM). In particular, we use linear support vector classifiers (LinearSVC) to analyse where the market will be in about a month's time. We chose a linear kernel because our training data can be sizeable, with around 750,000 data points and 1,000 dimensions per data point. A linear-kernel SVM implemented by liblinear using coordinate descent is orders of magnitude faster than libSVM with nonlinear mappings. Training a nonlinear SVM would take prohibitively long given our dataset size, especially considering that we train an individual model for each investment universe and sector combination, and we have quite a few universe-sector combinations: close to 1,000 and still expanding! To overcome the linear-model limitation, we preprocess our data with the Nystroem method to introduce nonlinearity. Even with all these efforts, the SVM is still the slowest component of our AI pipeline.
Project Lisbon
Towards the end of my internship at Arabesque AI, I was looking into speeding up our ML pipelines, and SVM was the major target. Intel's Scikit-learn extension gave us hope of accelerating our SVM module. To our disappointment, however, it does not speed up the LinearSVC class of Scikit-learn, which we rely on. Interested in understanding more about the subroutine behind LinearSVC, we dug deeper into the source code of liblinear, which LinearSVC wraps. The underlying library is of good code quality and well optimised, but we did spot a few points for improvement, especially for our use case:
- A big assumption made by liblinear is that the training data is sparse, while only ~0.008% of the values in our real-world dataset are empty.
- The BLAS maths routine in liblinear operates on sparsely formatted matrices. While this can technically improve performance when the input data is sparse, that is not the case for our data, and it can prevent compiler optimisations and vectorisation, which degrades performance.
- Requiring sparse matrix input also means that Scikit-learn actually has to convert the dense numpy input array into a sparsely formatted array and then pass it to the liblinear routine. This causes memory bloat as the data is duplicated (~1.5x overhead with a 32-bit integer index) and introduces a significant performance overhead.
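The ~1.5x figure is easy to verify: for fully dense float64 data, the CSR format stores a 32-bit column index per entry on top of the values themselves. A quick illustration (assuming numpy and scipy are available):

```python
import numpy as np
from scipy import sparse

# A dense matrix standing in for near-fully-dense real-world data.
X = np.random.rand(10_000, 100)

# Converting to CSR copies every value and adds a 32-bit column
# index per entry, plus a small row-pointer array.
X_csr = sparse.csr_matrix(X)

dense_bytes = X.nbytes
csr_bytes = X_csr.data.nbytes + X_csr.indices.nbytes + X_csr.indptr.nbytes
print(csr_bytes / dense_bytes)  # ~1.5 for float64 values + int32 indices
```

On dense data the conversion buys nothing and costs both the copy and the extra index storage.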
With the above points in mind, we decided to reimplement the L2-regularised hinge loss subroutine in Rust and expose it as a module to Python; hence lisbon was born. To our delight, we achieved both a 20-40% performance improvement and a memory footprint reduction while faithfully reproducing Scikit-learn's results.
Lisbon is now published on PyPI (repository) and is ready for testing and production use. Simply importing the module monkey-patches the Scikit-learn class to use lisbon instead of liblinear where applicable (currently binary classification with L2-regularised hinge loss).
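Usage is a one-line change. In the sketch below, the try/except guard is our addition so the snippet also runs where lisbon is not installed; with lisbon present, nothing else in the code needs to change:

```python
# Importing lisbon monkey-patches Scikit-learn to use the Rust routine
# for supported cases (binary classification, L2-regularised hinge loss).
try:
    import lisbon  # noqa: F401
except ImportError:
    pass  # stock liblinear backend is used instead

from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

X, y = make_classification(n_samples=500, n_features=20, random_state=0)
clf = LinearSVC(loss="hinge", dual=True, max_iter=10_000).fit(X, y)
```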
Note that the PyPI-packaged version is compiled with the AVX2 instruction set. Install from source if your platform does not support AVX2.
Talk is cheap, show me the benchmark
As with all benchmarks, your mileage may vary :)
With PMLB’s adult dataset
48842 rows of data and 14 features.
With Arabesque’s dataset
210447 rows of data and 1000 features.
With Arabesque’s bigger dataset
747922 rows of data and 1000 features.
Note that the RUSTFLAGS='-C target-cpu=native' environment variable ensures that rustc compiles against all the instruction sets your CPU supports, enabling more SIMD optimisations (e.g. AVX2, FMA).
These results show that our rewrite improved performance across different dataset sizes while significantly reducing the memory footprint in the large-dataset regime. Notably, the additional compilation flag target-cpu=native brought a substantial performance improvement, demonstrating the power of modern compiler infrastructure's auto-vectorisation capabilities (rustc uses LLVM as its backend).
RiiR, how was the experience?
Rewrite it in Rust, despite becoming a meme, does show that people are excited about the benefits the Rust language brings. With lisbon, I started with a verbatim translation of the original C++ code to Rust, which was rather effortless given the similarity of the two languages and the structural simplicity of mathematical computation code. Once I had a runnable version that could reproduce liblinear's results faithfully, I optimised away some unnecessary code, targeting binary classification with L2-regularised hinge loss specifically. I then exposed the Rust binding for Python with PyO3, which comes with very handy guides and ergonomic APIs. Finally, I modified the code to take the underlying numpy array directly, without converting it to a sparse matrix. In total, this project took a week to complete: 3 days on the verbatim translation, 1 day on optimisation, and 1 day on the Python binding.
Packaging the mixed Python/Rust project and publishing it to PyPI was a breeze with the help of maturin (from the PyO3 developers). The Rust ecosystem is gradually maturing, and it now feels better than ever to write high-performance Python libraries in Rust! Overall, converting the C++ code to Rust was a pleasant experience. Rust makes it a conscious choice when to copy memory and when to take a reference, which greatly helped with memory optimisation. That being said, there are still quite a few leftover C++ patterns in the codebase that are considered unidiomatic in Rust. Please help me fix those with PRs or issues :)
Open Source
Please see our public repository for the source code: https://github.com/arabesqueai/lisbon
Dual-licensed under Apache 2.0 or MIT.