Benchmarking BLAS libraries

Assaad MOAWAD
Published in DataThings
3 min read · Jul 9, 2018


BLAS stands for Basic Linear Algebra Subprograms; together with its extension LAPACK (Linear Algebra PACKage), it forms the math library that underlies most Machine Learning (ML) algorithms. At the lowest level of any ML program, everything can be translated into a series of numbers, arrays, matrices, tensors, and mathematical operations over these structures. These mathematical operations constitute the basic atomic elements of ML.
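
To make this concrete, below is a minimal sketch of how a dense matrix multiplication, the workhorse behind most ML workloads, is expressed as a single call through the standard CBLAS interface. The sizes and values are illustrative, and the snippet is ours rather than code from the benchmark itself:

```cpp
#include <cblas.h>   // CBLAS header, shipped e.g. with OpenBLAS
#include <cstdio>
#include <vector>

int main() {
    const int n = 512;                  // illustrative matrix size
    std::vector<double> A(n * n, 1.0);  // n x n matrix filled with 1.0
    std::vector<double> B(n * n, 2.0);  // n x n matrix filled with 2.0
    std::vector<double> C(n * n, 0.0);  // result buffer

    // General matrix-matrix multiply ("gemm"): C = 1.0 * A * B + 0.0 * C
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                n, n, n,
                1.0, A.data(), n,
                     B.data(), n,
                0.0, C.data(), n);

    std::printf("C[0] = %f\n", C[0]);   // 1024.0, i.e. n * 1.0 * 2.0
    return 0;
}
```

The same source compiles against any implementation that provides the CBLAS symbols (for example, linked with -lopenblas), which is precisely why an optimization at this layer propagates to everything built on top of it.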

Since we are talking about the foundation layer of ML, any acceleration at this level boosts almost every algorithm built on top of it. This is why at DataThings we invested a lot of time and effort in understanding, experimenting with, and testing different implementations of this library, in order to beat the machine learning solutions on the market in terms of performance and efficiency.

In this blog post, we present the results of benchmarking 6 mathematical operations across 4 different BLAS implementations.

The BLAS implementations we benchmarked are the following (a short usage sketch follows the list):

  • OpenBLAS (one of the most optimized and widely used BLAS libraries)
  • Eigen 3.3.4 (a C++ template library for linear algebra)
  • cuBLAS + cuSOLVER (NVIDIA's GPU implementations of BLAS and LAPACK, which leverage GPU parallelism)
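
To give a feel for how differently these libraries are driven, here is the same matrix product written with Eigen. Unlike the CBLAS call above, Eigen is a header-only C++ template library that selects its kernels at compile time; as before, the sizes are illustrative and the snippet is ours:

```cpp
#include <Eigen/Dense>  // Eigen's dense-matrix module (header-only)
#include <cstdio>

int main() {
    const int n = 512;  // illustrative matrix size
    Eigen::MatrixXd A = Eigen::MatrixXd::Constant(n, n, 1.0);
    Eigen::MatrixXd B = Eigen::MatrixXd::Constant(n, n, 2.0);

    // The expression is dispatched to Eigen's internal gemm kernel;
    // .noalias() avoids the temporary a plain assignment would allocate.
    Eigen::MatrixXd C(n, n);
    C.noalias() = A * B;

    std::printf("C(0,0) = %f\n", C(0, 0));  // 1024.0, i.e. n * 1.0 * 2.0
    return 0;
}
```

cuBLAS follows yet another pattern: operands must first be copied to GPU memory before a routine such as cublasDgemm can run, so host-device transfer time is part of any fair comparison.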

