Building your own Deep Learning dream machine
I’ve been geeking out on Deep Learning lately, taking Andrew Ng’s awesome Deep Learning specialization on Coursera and my friend Lukas Biewald’s excellent ML class. I wanted to build my own Deep Learning desktop so I could train models much faster than on my Mac laptop (or even on an AWS Deep Learning AMI). With Lukas’ help and tutelage, we made it happen.
In case you’re interested in doing the same, here’s the box we built. Most of the time went into configuring the software properly, so to save you some of the pain, I’ve tried to document the steps I took in excruciating detail. Most of what I did was Google-based debugging, and there are great resources on the Internet for getting past these obstacles. Wherever possible, I tried to link to the original source material I learned from.
Getting the parts
The whole setup revolves around an NVIDIA GPU, the workhorse that drives the machine learning magic. I opted for a high-end GTX 1080 Ti, which accounts for roughly half of the total cost of the build below. The speed at which it crunches through models has been totally worth it.
- GPU: EVGA GeForce GTX 1080 Ti 11GB. I only got one, though the motherboard supports 2
- Motherboard: ROG STRIX Z370 GAMING (Wi-Fi AC)
- CPU: 8th Gen Intel Core i7-8700 3.2GHz 6C/12T LGA-1151 12MB Cache. Make sure it’s 8th Gen to match the motherboard’s Z370 chipset
- RAM: Corsair Vengeance LPX 16GB (2x8GB) DDR4 DRAM 3200MHz
- Hard Drive: WD Blue 3D NAND SATA SSD M.2 2280 500GB. Make sure it’s M.2 so it fits the motherboard’s M.2 slot
- Case: Fractal Design Mini-C. Mainly make sure it’s Micro-ATX so it fits the motherboard
- Power Supply: Seasonic Focus 750 Gold (SSR-750FM). Make sure it has enough wattage to drive the machine, including the power-hungry GPUs
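To sanity-check the wattage, you can add up rated component draws. The minimal Python sketch below does that arithmetic. The 250 W and 65 W figures are NVIDIA’s reference rating for the GTX 1080 Ti and Intel’s TDP for the i7-8700. The 75 W for everything else and the 80% load ceiling are rough assumptions, not vendor specs, and some EVGA cards are rated above the reference figure.

```python
"""Back-of-the-envelope PSU sizing for this build (a sketch, not vendor guidance)."""

GPU_WATTS = 250   # NVIDIA's rated board power for a reference GTX 1080 Ti
CPU_WATTS = 65    # Intel's rated TDP for the Core i7-8700
OTHER_WATTS = 75  # ASSUMPTION: motherboard, RAM, SSD, and fans combined


def estimated_draw(num_gpus: int) -> int:
    """Sum of the rated component draws, in watts."""
    return num_gpus * GPU_WATTS + CPU_WATTS + OTHER_WATTS


def recommended_psu(num_gpus: int, max_load: float = 0.8) -> float:
    """PSU rating that keeps the estimated draw at or below max_load of capacity.

    max_load=0.8 is an ASSUMED rule of thumb, not a manufacturer spec.
    """
    return estimated_draw(num_gpus) / max_load


if __name__ == "__main__":
    for gpus in (1, 2):
        print(f"{gpus} GPU(s): ~{estimated_draw(gpus)} W estimated draw, "
              f"PSU of at least ~{recommended_psu(gpus):.0f} W suggested")
```

With these numbers, one card comes to about 390 W and two cards to about 640 W. The 750 W unit has plenty of margin for a single 1080 Ti. If you later add the second card the motherboard allows, the assumed 80% ceiling would call for roughly 800 W, so it’s worth re-checking against your actual cards’ ratings.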
Other things you’ll need (which I assume you already have):
- A monitor, keyboard, and mouse. If you don’t have them, I recommend an HDMI-compatible monitor plus a USB keyboard and USB mouse; any decent ones will do.
- An Ethernet cable
- A decent-sized (64GB) USB stick you can use to make a bootable Ubuntu installer
The whole thing cost a little over $2200 at the Central Computers physical store in San Francisco. On the plus side, they installed the CPU, RAM, and HD onto the motherboard, which significantly simplified the installation. It would probably cost a bit less if ordered online (mainly on sales tax), but the peace of mind was worth it.
Setting up the Hardware
Given that the nice folks at Central Computers had already taken care of the CPU, RAM, and HD, all we had to do was:
- Install the motherboard into the case. This was relatively straightforward: just a few screws
- Install the power supply into the case. Similarly straightforward, once we realized we could remove the case sides
- Install the GPU into the motherboard’s PCIe x16 slot
- Connect the power supply to the fans, motherboard, GPU, etc. This is where it’s really important to read the motherboard manual and follow the instructions precisely. This will save you a lot of heartache.
Overall the hardware bit was the easy part.
Setting up the Software
This turned out to be quite a journey, with four re-installs of various versions of Ubuntu, wacky tethering to get a network connection, and more. I’ll spare you the story and tell you what worked.
Prepping the OS
Getting internet on the box
Out of the box, Ubuntu 16.04 doesn’t recognize the Realtek WiFi chip on the motherboard (we fix this later), so you’ll need to connect the desktop to wired internet using the aforementioned Ethernet cable. If you’re not close to a wired Ethernet port (I wasn’t), check whether a laptop can share its connection: fortunately, my Mac let me share its WiFi over wired Ethernet. This was a great save.
Installing the OS
Boot from the USB stick and follow the instructions. If all has gone well, you can connect to the network and fetch updates during the install. When it’s done, reboot.
Messing with the video drivers, part 1
Once the OS was installed, the NVIDIA drivers on Ubuntu 16.04 started causing trouble. In particular, I saw an error that said something like:
/dev/sda1: clean, 552599/6111232 files, 7119295/24414464 blocks
Fortunately, there’s a fix.
Press CTRL-ALT-F1 to switch to a text console (TTY mode) and log in. Then uninstall any old NVIDIA drivers (the quotes stop the shell from expanding the wildcard itself):
sudo apt-get remove 'nvidia-*'
sudo apt-get autoremove
Switching to text & setting up SSH
At this stage, a caveat: I never really got lightdm (Ubuntu’s graphical login manager) working. I didn’t really care, because the goal was to run a headless machine that I could SSH into from my Mac. So I:
Before installing the NVIDIA drivers, you have to remove the open-source Nouveau driver that Ubuntu installs by default. Fortunately, this post details how to do it. Follow Steps 1–3, but skip the step that runs NVIDIA’s installer, since we’ll install the NVIDIA drivers another way (via CUDA, below).
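After blacklisting Nouveau and rebooting, it’s worth confirming the module is really gone before installing CUDA. The minimal Python sketch below reads /proc/modules, which lists one loaded kernel module per line with its name first. The blacklist contents in the docstring match what NVIDIA’s CUDA installation guide uses. The file path is simply the conventional location, so treat it as an assumption if your guide differs.

```python
"""Check which GPU kernel modules are loaded by reading /proc/modules (a sketch).

Assumes Nouveau was blacklisted with a file such as
/etc/modprobe.d/blacklist-nouveau.conf containing:
    blacklist nouveau
    options nouveau modeset=0
followed by `sudo update-initramfs -u` and a reboot.
"""
from pathlib import Path


def loaded_modules(modules_text: str) -> set:
    """Return the module names (first field of each line) in /proc/modules-style text."""
    return {line.split()[0] for line in modules_text.splitlines() if line.strip()}


def gpu_driver_status(modules_text: str) -> dict:
    """Report whether the nouveau and nvidia modules are listed (exact name match)."""
    mods = loaded_modules(modules_text)
    return {name: name in mods for name in ("nouveau", "nvidia")}


if __name__ == "__main__":
    modules = Path("/proc/modules")
    if not modules.exists():
        print("/proc/modules not found; is this a Linux machine?")
    else:
        status = gpu_driver_status(modules.read_text())
        for name, loaded in status.items():
            print(f"{name}: {'loaded' if loaded else 'not loaded'}")
        if status["nouveau"]:
            print("Nouveau is still active; re-check the blacklist and reboot.")
```

Before CUDA is installed, you’d hope to see both modules reported as not loaded.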
Machine Learning Software Stack
I got this machine working with
- CUDA 9.0
- cuDNN 7.0.5
- TensorFlow 1.7
Note that the latest releases as of this writing were CUDA 9.1 and cuDNN 7.1.1, but I couldn’t get TensorFlow to work with them yet.
Installing CUDA 9.0
- Follow NVIDIA’s Pre-Installation Actions to the letter
- Download the Base Installer and follow the installation instructions, with one important change: in step 4, run this instead of the command given there
sudo apt-get install cuda-9-0
This makes sure you install version 9.0 and not the latest version (9.1 as of this writing)
- Follow NVIDIA’s Post-Installation Actions to the letter. I performed the Recommended Actions (7.2) and the Optional Actions (7.3) as well.
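The mandatory post-installation step in NVIDIA’s guide puts /usr/local/cuda-9.0/bin on your PATH. The guide also adds /usr/local/cuda-9.0/lib64 to LD_LIBRARY_PATH for runfile installs. The minimal Python sketch below checks both variables in the current shell and reads the version.txt file that CUDA 9.x toolkits ship at their install root. The install prefix is an assumption based on a default Ubuntu deb install.

```python
"""Sanity-check a CUDA 9.0 install after NVIDIA's post-installation steps (a sketch)."""
import os
import re
from pathlib import Path

CUDA_HOME = "/usr/local/cuda-9.0"  # ASSUMPTION: default prefix for the cuda-9-0 packages


def path_has(value: str, directory: str) -> bool:
    """True if directory is one of the colon-separated entries in value."""
    return directory in value.split(":")


def parse_cuda_version(text: str):
    """Extract e.g. '9.0.176' from 'CUDA Version 9.0.176'; None if absent."""
    match = re.search(r"CUDA Version (\d+(?:\.\d+)*)", text)
    return match.group(1) if match else None


if __name__ == "__main__":
    print("bin on PATH:", path_has(os.environ.get("PATH", ""), CUDA_HOME + "/bin"))
    print("lib64 on LD_LIBRARY_PATH:",
          path_has(os.environ.get("LD_LIBRARY_PATH", ""), CUDA_HOME + "/lib64"))
    version_file = Path(CUDA_HOME) / "version.txt"
    if version_file.exists():
        print("Installed CUDA:", parse_cuda_version(version_file.read_text()))
    else:
        print("No version.txt under", CUDA_HOME)
```

Run it from the same shell you’ll train in, since it only sees the environment variables that shell exported.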
Installing cuDNN 7.0.5
- Get yourself an NVIDIA Developer account if you don’t already have one. It’s free
- Download cuDNN. Make sure you get the entry labeled “Download cuDNN v7.0.5 (Dec 5, 2017), for CUDA 9.0”. You’ll need 3 files: (1) cuDNN v7.0.5 Runtime Library for Ubuntu16.04 (Deb), (2) cuDNN v7.0.5 Developer Library for Ubuntu16.04 (Deb), and (3) cuDNN v7.0.5 Code Samples and User Guide for Ubuntu16.04 (Deb)
- Get those files onto your Deep Learning box. As a hack, Dropbox links work with wget, so I downloaded the three files on my Mac, moved them to Dropbox, then used wget over SSH to grab them onto the Deep Learning box
- Install cuDNN by following Steps 9 and 10 of this doc, using the filenames you downloaded.
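Once the Deb packages are installed, you can confirm which cuDNN version the system actually sees. cuDNN 7’s header declares its version through the CUDNN_MAJOR, CUDNN_MINOR, and CUDNN_PATCHLEVEL defines. The minimal Python sketch below reads them. The header path is an assumption about where the Developer Library package exposes cudnn.h, so adjust it if yours lives elsewhere.

```python
"""Report the cuDNN version the system header declares (a sketch)."""
import re
from pathlib import Path

# ASSUMPTION: the cuDNN 7 Developer Library Deb makes the header available here.
HEADER = Path("/usr/include/cudnn.h")


def parse_cudnn_version(header_text: str):
    """Return 'MAJOR.MINOR.PATCH' built from the CUDNN_* #defines, or None."""
    parts = []
    for name in ("CUDNN_MAJOR", "CUDNN_MINOR", "CUDNN_PATCHLEVEL"):
        match = re.search(rf"#define\s+{name}\s+(\d+)", header_text)
        if match is None:
            return None
        parts.append(match.group(1))
    return ".".join(parts)


if __name__ == "__main__":
    if HEADER.exists():
        print("cuDNN version:", parse_cudnn_version(HEADER.read_text()))
    else:
        print(HEADER, "not found; check where the Developer Library put cudnn.h")
```

For the packages above, you’d hope to see 7.0.5.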
Install TensorFlow & Keras
You can now install one of the pre-built TensorFlow libraries. Follow the “Installing with native pip” instructions from TensorFlow. I installed it for Python 3. Make sure to install tensorflow-gpu to take advantage of your fancy GPU, then test that it works.
You can also install Keras by running
sudo pip3 install keras
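For that test, the minimal Python sketch below lists the GPUs that TensorFlow 1.x can see and prints the backend Keras is configured to use. It relies on device_lib.list_local_devices() from TensorFlow’s internal tensorflow.python.client module. That module was a common 1.x-era convenience rather than a stable public API, so treat it as an assumption if your version differs.

```python
"""Check that TensorFlow 1.x sees the GPU and that Keras imports (a sketch)."""


def gpu_names(devices):
    """Given (name, device_type) pairs, return the names of the GPU devices."""
    return [name for name, device_type in devices if device_type == "GPU"]


if __name__ == "__main__":
    try:
        # device_lib is an internal TensorFlow module, not a stable public API.
        from tensorflow.python.client import device_lib
    except ImportError:
        print("TensorFlow is not importable from this Python.")
    else:
        local = device_lib.list_local_devices()
        gpus = gpu_names((d.name, d.device_type) for d in local)
        print("GPUs visible to TensorFlow:", gpus if gpus else "none")

    try:
        import keras
    except ImportError:
        print("Keras is not importable from this Python.")
    else:
        print("Keras backend:", keras.backend.backend())
```

Run it with python3. On a working box you’d expect one GPU entry for the 1080 Ti. An empty list usually points back to the CUDA or cuDNN steps.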
At this point congratulate yourself. You have a (mostly) working Deep Learning box.
Enabling the WiFi chip
If you’re satisfied with a wired Ethernet connection and don’t need to get the WiFi chip working, you can safely skip this section. As I was tethering off my Mac, I was hungry to get the onboard Realtek WiFi chip working.
The following worked to make the OS recognize the chipset:
sudo apt update
sudo apt install git
git clone https://github.com/rtlwifi-linux/rtlwifi-next
cd rtlwifi-next
sudo make install
sudo modprobe rtl8822be
Connecting to Wifi
Once the chipset was recognized, I needed to connect to my home WiFi network. This set of instructions was great for that. My WiFi interface was wlp4s0; yours may vary.
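If you’re not sure what your interface is called, Linux exposes network interfaces under /sys/class/net. The minimal Python sketch below lists the ones that look wireless. The detection heuristic is an assumption. It treats an interface as wireless if it has a phy80211 link, which mac80211-based drivers like the rtlwifi family create, or a wireless directory, which appears when wireless-extensions compatibility is enabled.

```python
"""List wireless network interfaces using Linux sysfs (a sketch)."""
from pathlib import Path


def wireless_interfaces(net_dir: Path = Path("/sys/class/net")) -> list:
    """Return sorted names of interfaces that look wireless.

    Heuristic (an assumption): cfg80211-based WiFi drivers expose a
    'phy80211' link, and wireless-extensions compatibility adds 'wireless'.
    """
    if not net_dir.is_dir():
        return []
    return sorted(
        iface.name
        for iface in net_dir.iterdir()
        if (iface / "phy80211").exists() or (iface / "wireless").exists()
    )


if __name__ == "__main__":
    names = wireless_interfaces()
    print("Wireless interfaces:", ", ".join(names) if names else "none found")
```

Whatever name it prints (wlp4s0 on my box) is the one to plug into the connection instructions.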
Fixing WiFi sleep reconnection errors
I noticed that my WiFi would randomly disconnect. That was no fun. I followed these instructions, noting that my WiFi chipset (and driver module) was rtl8822be, so that’s the name to use for the rest of the instructions.
Advanced: Install TensorFlow from Source
If it bothers you that TensorFlow’s pre-built libraries don’t take advantage of CPU instruction-set extensions like FMA and AVX2, you can build TensorFlow from source to get all the optimization goodies. To do this:
- Grab the TensorFlow sources. I used the r1.7 branch:
git clone https://github.com/tensorflow/tensorflow
cd tensorflow
git checkout r1.7
- Follow Step 11 from this doc, but make sure to include the CPU optimizations you need in the build command. They’re of the form
--copt=-m<opt>
so for me, to build with AVX2 and FMA, my build command looked like:
bazel build --config=opt --copt=-mavx2 --copt=-mfma --config=cuda --incompatible_load_argument_is_label=false //tensorflow/tools/pip_package:build_pip_package
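To see which of these extensions your own CPU supports, look at the flags line in /proc/cpuinfo. The minimal Python sketch below maps a handful of those flags to the matching GCC -m options and prints them as --copt arguments. The mapping table is my own selection of common x86 extensions, not an exhaustive or official list.

```python
"""Suggest Bazel --copt flags from the CPU features in /proc/cpuinfo (a sketch)."""
from pathlib import Path

# /proc/cpuinfo flag name -> GCC option enabling that instruction set.
# ASSUMPTION: a hand-picked subset, not everything TensorFlow can use.
FLAG_TO_COPT = {
    "sse4_1": "-msse4.1",
    "sse4_2": "-msse4.2",
    "avx": "-mavx",
    "avx2": "-mavx2",
    "fma": "-mfma",
}


def cpu_flags(cpuinfo_text: str) -> set:
    """Collect the feature flags from the first 'flags' line of cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        if line.startswith("flags"):
            return set(line.split(":", 1)[1].split())
    return set()


def copt_args(flags: set) -> list:
    """Return --copt arguments for the supported extensions, in table order."""
    return [f"--copt={opt}" for name, opt in FLAG_TO_COPT.items() if name in flags]


if __name__ == "__main__":
    info = Path("/proc/cpuinfo")
    flags = cpu_flags(info.read_text()) if info.exists() else set()
    print(" ".join(copt_args(flags)) or "no matching flags found")
```

On an 8th Gen Core i7, the output should include the AVX2 and FMA options used in the command above.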
Make sure to first remove the TensorFlow you installed from the pre-built libraries:
sudo pip3 uninstall tensorflow-gpu
Then test that TensorFlow works after you’ve built it.
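The Bazel build produces a packaging script rather than an installed library. TensorFlow’s source-build instructions then run bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg, which writes a .whl file into that directory for pip to install. The minimal Python sketch below locates the newest wheel there and prints the pip3 command. The directory and the tensorflow*.whl pattern are assumptions to adjust if you used different ones.

```python
"""Find the newest TensorFlow wheel written by build_pip_package (a sketch)."""
from pathlib import Path


def newest_wheel(pkg_dir: Path):
    """Return the most recently modified tensorflow*.whl in pkg_dir, or None."""
    wheels = list(pkg_dir.glob("tensorflow*.whl")) if pkg_dir.is_dir() else []
    return max(wheels, key=lambda p: p.stat().st_mtime, default=None)


if __name__ == "__main__":
    # ASSUMPTION: /tmp/tensorflow_pkg is the directory passed to build_pip_package.
    wheel = newest_wheel(Path("/tmp/tensorflow_pkg"))
    if wheel is None:
        print("No wheel found; run build_pip_package first.")
    else:
        print(f"sudo pip3 install {wheel}")
```

Picking the newest file by modification time avoids accidentally installing a stale wheel from an earlier build.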
Building your own Deep Learning box can be frustrating along the way, but seeing models blast through training once it’s running makes it well worth it. If you run into bugs, the Internet likely has an answer for every nasty OS, CUDA, cuDNN, etc. question. Keep plowing toward Deep Learning goodness.
There are many great guides to building Deep Learning boxes, including Lukas’ O’Reilly write-up from last year. This is my own. Feel free to follow your own sherpa or make your own path.
Thank you to Lukas, who was a great guide.
If you like this, or have questions when building your own Deep Learning box, reach out on Twitter.