I remember the first time I ran a deep learning model on a powerful GPU (an NVIDIA GTX 1080). The model zipped through each training epoch so fast, I felt like I had just switched from driving a sedan to riding in a sports car. 🚙
The training speed was exhilarating; experimenting with different models went a lot faster than usual. But since that project, accelerated deep learning has been a rare luxury. Compute time on a good GPU is expensive, and my datasets and models are usually small enough that, while training can be slow, it isn’t slow enough to justify the cost of cloud compute or building a custom machine.
So I’ve been grinding along as is, paying as I go for Paperspace compute time when GPU acceleration is really needed, while hoarding cloud credits for some point in the distant future when I can splurge them all on a P100.
Essentially, PlaidML makes it faster to run deep learning on laptops, embedded devices, and other hardware that has traditionally not been well suited to deep learning workloads.
In plain English, if you have a Mac / Windows / Linux laptop, or even a Raspberry Pi, you can install PlaidML and train a deep learning model using your device’s GPU.
When I came across this tweet, it sounded amazing, so I decided to research and write about PlaidML — what it is, how it works, and how to get started, using my 2017 Macbook Pro as an example.
What is PlaidML?
PlaidML is an open-source tensor compiler that can accelerate the process of training deep learning models and getting predictions from those models.
What is a tensor compiler? While we’re on that subject, what is a compiler anyway?
Compilers are programs that translate higher-level instructions into lower-level machine code that a computer can execute.
Within this context, tensor compilers bridge the gap between the tensor operations used in deep learning (convolutions, matrix multiplications, etc.) and the platform- and chip-specific code needed to perform those operations efficiently.
How does PlaidML work?
To perform this translation from high-level tensor operations to low-level machine code, PlaidML uses its Tile language to “generate precisely tailored OpenCL, OpenGL, LLVM or CUDA code on the fly,” so the generated code can run on any OpenCL-, OpenGL-, LLVM-, or CUDA-compatible device. Intel AI, the team behind PlaidML, wrote a blog post that explains how all of this works in more detail. See here.
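For a flavor of what Tile looks like, here is roughly how a matrix multiply is expressed, adapted from the examples in PlaidML’s documentation (treat the exact syntax as illustrative):

function (A[M, L], B[L, N]) -> (C) {
    C[i, j: M, N] = +(A[i, k] * B[k, j]);
}

The index expressions and the +() summation describe the math; the compiler works out how to map that computation efficiently onto whatever device you have.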
Getting started with PlaidML
(Most of what follows is taken from the Quick Start section of PlaidML’s GitHub page, adapted for a 2017 MacBook Pro.)
Step 1: Check which graphics card your computer has
My 2017 MacBook Pro has an Intel HD Graphics 630 and a Radeon Pro 560. Both support OpenCL, so they’re compatible with PlaidML. (Apple publishes a full list of OpenCL-compatible Mac computers; PlaidML’s docs also cover getting started on other operating systems like Windows and Linux.)
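If you’re not sure what your machine has, on a Mac you can check “About This Mac” or run this in a terminal, which lists each graphics chipset:

system_profiler SPDisplaysDataType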
Step 2: Install PlaidML (with judicious use of virtual environments)
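For the virtual environment part, one minimal option (assuming Python 3; the plaidml-venv name here is just an example) is the built-in venv module:

python3 -m venv plaidml-venv
source plaidml-venv/bin/activate

Then, inside the environment, install PlaidML’s Keras backend and the plaidbench benchmarking tool: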
pip install plaidml-keras plaidbench
Step 3: Set up PlaidML
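PlaidML ships with an interactive setup tool that detects compatible devices and walks you through configuration:

plaidml-setup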
Step 4: Choose your accelerator
plaidml-setup lists the compatible devices it finds and asks you to pick one as the default. Here I’m going to go with the Intel graphics card.
Step 5: Whew, that worked! Now to save the settings
When plaidml-setup asks whether to save your settings, answer y; it writes them to ~/.plaidml so future runs use your chosen device.
Step 6: Run benchmarks
PlaidML comes with a command-line tool, plaidbench, for benchmarking the performance of different cards across different frameworks.
Here we can run a MobileNet inference benchmark with one line on both the Radeon and the Intel graphics cards (rerunning plaidml-setup in between to switch the default device) and compare their performance.
plaidbench keras mobilenet
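plaidbench can benchmark other networks and frameworks too; run the following to see what’s available:

plaidbench --help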
Looks like both cards deliver the same inference performance. Good to know!
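And that’s really all the setup there is. To actually train with PlaidML, you tell Keras to use it as the backend before importing keras; the install_backend() call is documented in PlaidML’s README, and the toy model and random data below are just placeholders to confirm that training runs on your GPU:

# Use PlaidML as the Keras backend (must run before importing keras).
import plaidml.keras
plaidml.keras.install_backend()

import numpy as np
from keras.layers import Dense
from keras.models import Sequential

# Toy data and model, just to check that training works end to end.
x = np.random.rand(1000, 32)
y = np.random.rand(1000, 1)

model = Sequential()
model.add(Dense(64, activation='relu', input_shape=(32,)))
model.add(Dense(1))
model.compile(optimizer='adam', loss='mse')
model.fit(x, y, epochs=5, batch_size=32)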