Introduction to ThunderSVM: A Fast SVM Library on GPUs and CPUs

Published in Analytics Vidhya, Aug 23, 2020

And how to run it on Google Colab …

Photo by ThisisEngineering RAEng on Unsplash

Support Vector Machine (SVM) is a machine learning algorithm used for classification and regression. It is one of the most fundamental ML algorithms: it can classify points in a high-dimensional space and predict real-valued targets from a set of given features. If you want to learn about SVM in more depth, feel free to read the following:

SVM in Classification

SVM in Regression

In this article, we will see how to install and run ThunderSVM in Google Colab. First, let’s see what ThunderSVM is.

ThunderSVM

ThunderSVM is an open-source library that uses GPUs and multi-core CPUs to train and run SVMs much faster. In classification, its speedup over scikit-learn’s SVM grows with the amount of data. By changing just one line of code (the import statement), you can speed up training by about 70 times in the experiments shown later in this article!

To use thundersvm, first install it by following the steps below.

Installation

To fully utilize both the CPU and the GPU, use the GPU runtime in Google Colab. First, open Colab and click Runtime >> Change runtime type in the menu bar. Next, select GPU in the Hardware accelerator dropdown menu. This activates a GPU backend for your Colab notebook.

The GPU version of thundersvm used here requires CUDA Toolkit 9.0. We’ll install this first.
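To confirm that the GPU runtime is active, one option is to ask nvidia-smi which devices it can see. The sketch below is only an illustration: the helper name list_gpus is hypothetical, and it assumes the nvidia-smi tool is on the PATH, as it normally is on a Colab GPU runtime.

```python
import shutil
import subprocess


def list_gpus():
    """Return the lines printed by `nvidia-smi -L`, or [] if no GPU tooling is found."""
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "-L"],  # -L lists the GPUs visible to the driver
            capture_output=True,
            text=True,
            check=True,
        )
    except (subprocess.CalledProcessError, OSError):
        return []
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


gpus = list_gpus()
if gpus:
    print("\n".join(gpus))
else:
    print("No GPU visible; check Runtime >> Change runtime type")
```

An empty list usually means the notebook is still on a CPU-only runtime.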

Now, execute the following lines of code one after another in the order shown below:

!wget https://developer.nvidia.com/compute/cuda/9.0/Prod/local_installers/cuda-repo-ubuntu1704-9-0-local_9.0.176-1_amd64-deb
!ls  # Check if required cuda 9.0 amd64-deb file is downloaded
!dpkg -i cuda-repo-ubuntu1704-9-0-local_9.0.176-1_amd64-deb
!ls /var/cuda-repo-9-0-local | grep .pub
!apt-key add /var/cuda-repo-9-0-local/7fa2af80.pub
!apt-get update
!sudo apt-get install cuda-9.0

Next, run the following command to check if CUDA 9.0 is successfully installed:

!nvcc --version

If the last line of the output shows Cuda compilation tools, release 9.0 followed by a specific version number, the installation succeeded. To install CUDA on your local computer, run the above commands sequentially in a terminal, removing the “!” from each of them. If you want to check the output, the code for installation is available here.
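The version check can also be scripted, which is handy if later cells should stop early when the wrong toolkit is installed. This is a minimal sketch, assuming nvcc is on the PATH; the function names parse_cuda_release and installed_cuda_release are made up for illustration.

```python
import re
import shutil
import subprocess


def parse_cuda_release(nvcc_output):
    """Extract the release number (e.g. '9.0') from the text printed by `nvcc --version`."""
    match = re.search(r"release (\d+\.\d+)", nvcc_output)
    return match.group(1) if match else None


def installed_cuda_release():
    """Return the CUDA release reported by nvcc, or None if nvcc is not available."""
    if shutil.which("nvcc") is None:
        return None
    try:
        result = subprocess.run(["nvcc", "--version"], capture_output=True, text=True)
    except OSError:
        return None
    return parse_cuda_release(result.stdout)


release = installed_cuda_release()
print(f"CUDA release found: {release}")
```

If the printed value is not 9.0, the GPU version of thundersvm described here may fail to load.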

Next, we’ll install thundersvm.

Install ThunderSVM

Run the following command to install it in Google Colab:

!pip install thundersvm

If you only want to run thundersvm on the CPU, without the GPU, execute the following command in Colab instead:

!pip install thundersvm-cpu

You can also run the above commands in a terminal on your own computer, removing the “!”, to install thundersvm locally.

For the CPU-only version, you don’t need to install CUDA 9.0. Thundersvm uses all CPU cores in both cases. The following steps are identical for the CPU and GPU versions.

Execution

Let us run support vector classification with thundersvm in the following way:
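A minimal sketch of such a run is shown here. The dataset (the handwritten digits bundled with scikit-learn) and the hyperparameter values are assumptions chosen for illustration, and the fallback import exists only so the snippet still runs where thundersvm is not installed.

```python
from sklearn.datasets import load_digits
from sklearn.model_selection import train_test_split

try:
    # The one-line change: import SVC from thundersvm instead of sklearn.svm.
    from thundersvm import SVC
except (ImportError, OSError):
    # Fallback (assumption for this sketch): use scikit-learn if thundersvm cannot be loaded.
    from sklearn.svm import SVC

# Illustrative dataset: 8x8 handwritten digits, 10 classes.
X, y = load_digits(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0
)

# Hyperparameters are set explicitly because the two libraries may use different defaults.
clf = SVC(kernel="rbf", C=10, gamma=0.001)
clf.fit(X_train, y_train)

y_pred = clf.predict(X_test)          # predicted class labels
accuracy = clf.score(X_test, y_test)  # mean accuracy on the held-out data
print(f"Test accuracy: {accuracy:.3f}")
```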

As we can see, apart from changing the import statement, everything else is the same as running scikit-learn’s SVM.

Speed-Up Evaluation

To compare the speed of scikit-learn’s SVM and thundersvm, let’s look at the code below.
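A comparison of this kind can be sketched as follows. This is not the exact benchmark behind the numbers quoted in this article: the synthetic dataset, the hyperparameters and the helper names (timed, benchmark, speedups, run_comparison) are all assumptions made for illustration.

```python
import time


def timed(fn, *args):
    """Call fn(*args) and return (result, elapsed seconds)."""
    start = time.perf_counter()
    result = fn(*args)
    return result, time.perf_counter() - start


def benchmark(model, X_train, y_train, X_test, y_test):
    """Time the fit, predict and score calls of one estimator."""
    _, fit_s = timed(model.fit, X_train, y_train)
    _, predict_s = timed(model.predict, X_test)
    _, score_s = timed(model.score, X_test, y_test)
    return {"fit": fit_s, "predict": predict_s, "score": score_s}


def speedups(baseline, candidate):
    """Return how many times faster the candidate is than the baseline, per stage."""
    return {stage: baseline[stage] / candidate[stage] for stage in baseline}


def run_comparison(n_samples=50_000, n_features=20):
    """Compare scikit-learn's SVC with thundersvm's SVC on synthetic data."""
    # Imported lazily so the timing helpers above work without these packages.
    from sklearn.datasets import make_classification
    from sklearn.model_selection import train_test_split
    from sklearn.svm import SVC as SklearnSVC
    from thundersvm import SVC as ThunderSVC

    X, y = make_classification(
        n_samples=n_samples, n_features=n_features, random_state=0
    )
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, random_state=0
    )
    data = (X_train, y_train, X_test, y_test)

    # Same explicit hyperparameters for both libraries, so only the backend differs.
    params = {"kernel": "rbf", "C": 1.0, "gamma": 0.05}
    baseline = benchmark(SklearnSVC(**params), *data)
    candidate = benchmark(ThunderSVC(**params), *data)
    return speedups(baseline, candidate)
```

In a Colab cell, run_comparison(50_000) returns a dictionary of speedup factors for fit, predict and score. Calling it for several sample sizes shows how the speedup changes with the amount of data; expect the scikit-learn baseline to take a while at these sizes.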

We can already see more than 50x speedup for fitting on the training data. Both the prediction and scoring functions also become more than 17x faster. If we continue to increase the amount of data, the speedup keeps growing for all three functions.

Doubling the amount of data from 50,000 to 100,000 samples increased the speedup for the Fit function from 20x to 70x. Beyond 200,000 samples, thundersvm delivered more than 120x speedup in training! Both the Score and Predict functions also ran more than 20x faster. For these runs in Google Colab, I used a Tesla P100 GPU. Your speedups may vary depending on the type of GPU.

The code for comparison of speedup and visualization can be found here.

Conclusion

SVM can be really slow to run on large amounts of data. By using thundersvm, you can easily speed up fitting, prediction and scoring. You can use it on Linux, Windows and macOS. Thundersvm also supports all the functionalities of LibSVM, including Support Vector Regression (SVR), one-class SVMs and probabilistic SVMs. You can check it out here to learn more.
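As a small illustration of the regression side, the sketch below fits an SVR to a noisy sine curve. The data and hyperparameters are assumptions chosen for the example, and the fallback to scikit-learn is there only so the snippet runs where thundersvm is not installed.

```python
import numpy as np

try:
    from thundersvm import SVR
except (ImportError, OSError):
    # Fallback (assumption for this sketch): scikit-learn's SVR exposes the same calls.
    from sklearn.svm import SVR

# Synthetic 1-D regression problem: y = sin(x) plus a little Gaussian noise.
rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0.0, 5.0, size=(200, 1)), axis=0)
y = np.sin(X).ravel() + rng.normal(scale=0.05, size=200)

reg = SVR(kernel="rbf", C=10.0, gamma=0.5, epsilon=0.05)
reg.fit(X, y)

y_pred = np.asarray(reg.predict(X)).ravel()
mae = float(np.mean(np.abs(y_pred - y)))
print(f"Mean absolute error on the training data: {mae:.3f}")
```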

I hope this helps you a bit in your coding journey. All the Best! 😀

Thanks for reading! If you liked this post or have any questions/comments, please leave a comment below!

The code for this post can be found on my GitHub page.