Snap ML — Speed Up Model Training

Photo by Marc-Olivier Jodoin on Unsplash

Training machine learning (ML) models can take a long time depending on your dataset and available hardware, and that keeps you from experimenting quickly. That can be a real problem for a data scientist on a deadline, and at the very least it’s a pain to sit and wait for results. Snap ML is a library that helps address that pain. As a drop-in replacement for scikit-learn, it’s particularly easy to use. Snap ML accelerates the training and inference of some of the most popular ML models (State of Data Science and Machine Learning 2020 | Kaggle) and blends in seamlessly with scikit-learn operators for data pre-processing and feature engineering, using the familiar scikit-learn API.

Python is the programming language of choice for many data scientists because of its wide range of libraries and strong support from its vast community of developers. It’s a particularly powerful language for open-source data stacks, but by design it is not built for fast code execution.

Python’s popularity and ease of use inspired IBM Research to help data scientists working in a Python stack by creating Snap ML, which they designed for speed. Snap ML is a free-to-use software library that you can install right now to shorten training and inference time for your ML models compared with typical performance from scikit-learn, the generally well-loved standard among ML APIs.

pip install snapml

The notebook below is an example of how to use Snap ML to shorten your training time on a local CPU. We’ll later show other examples of using Snap ML to improve performance at inference time, and compare the performance of Snap ML on CPU versus GPU.

Random Forest Credit Card Fraud Class

As you can see, the library intentionally mirrors the design of scikit-learn and fits into the same workflow. That should make it easy for data scientists who need to improve their training time (and inference time, in later posts) to shorten their model development lifecycle.
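To make the drop-in idea concrete, here is a minimal sketch that trains a random forest from each library on the same data and times the fit. It is not the notebook referenced above: the dataset is synthetic (generated with scikit-learn’s make_classification), the helper names time_fit, build_forests and run_demo are hypothetical, and it assumes snapml exposes a RandomForestClassifier that accepts the same n_estimators, n_jobs and random_state keyword arguments as scikit-learn. Whichever library is not installed is simply skipped.

```python
import time


def time_fit(model, X, y):
    """Fit `model` on (X, y) and return the elapsed wall-clock seconds."""
    start = time.perf_counter()
    model.fit(X, y)
    return time.perf_counter() - start


def build_forests(n_estimators=50, n_jobs=4, seed=42):
    """Return {library name: random forest} for each library that is installed."""
    forests = {}
    try:
        from sklearn.ensemble import RandomForestClassifier as SklearnForest
        forests["scikit-learn"] = SklearnForest(
            n_estimators=n_estimators, n_jobs=n_jobs, random_state=seed)
    except ImportError:
        pass
    try:
        # Assumption: snapml provides RandomForestClassifier with these
        # scikit-learn-style keyword arguments.
        from snapml import RandomForestClassifier as SnapForest
        forests["snapml"] = SnapForest(
            n_estimators=n_estimators, n_jobs=n_jobs, random_state=seed)
    except ImportError:
        pass
    return forests


def run_demo():
    """Time each available forest on one synthetic classification dataset."""
    try:
        from sklearn.datasets import make_classification
    except ImportError:
        print("scikit-learn is not installed; skipping the demo")
        return {}
    # Synthetic stand-in data, not the credit card fraud dataset.
    X, y = make_classification(n_samples=10000, n_features=30, random_state=0)
    timings = {}
    for name, forest in build_forests().items():
        timings[name] = time_fit(forest, X, y)
        print(f"{name}: trained in {timings[name]:.2f} s")
    return timings


if __name__ == "__main__":
    run_demo()
```

The only line that differs between the two models is the import, which is the point of a drop-in replacement. Actual timings depend on your hardware, data size and thread count, so treat any numbers you see as specific to your machine.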

The library is distributed for free but is currently not open source, because IBM uses it in our products. It delivers large productivity gains and shorter workloads in products like IBM Watson Studio, IBM Watson Machine Learning, IBM Cloud Pak for Data, and our IBM Watson Machine Learning Accelerator. For example, when you use AutoAI with Watson Studio to automatically generate ML pipelines and Jupyter Notebooks, part of the reason they execute so quickly is that Snap ML is embedded in our products. In an enterprise scenario for building AI solutions, a group of data scientists could schedule and accelerate their ML workloads with Snap ML on a GPU grid with Watson Machine Learning Accelerator.

Snap ML is ready for you to use right now at no cost. You can install it from PyPI and see the difference in the time it takes to train your first model. If you’re curious about how to use it most efficiently, reach out on the IBM Data Science Community, or sign up for a Watson Studio with AutoAI trial today.

Thanks to Haris Pozidis and Kelvin Lui for their example notebooks and edits. Thanks to Andreea Anghel for her contributions. Credit to Jana Thompson for edits.

IBM Data Science in Practice is written by data scientists for data scientists to gain hands-on and in-depth learning and to read about inspirational applications and conceptual understanding for challenging topics in the field. Discuss and network: community.ibm.com/datascience

Will Roberts
