A $1200 Deep Learning Rig

Inspired by several other system builds ($1000, $1700, and forum posts), I decided to have a go and build one myself. I was a Sr. Director of Data Science for a large travel company at the time and was a bit envious of the work being done by the individual scientists. I was also contemplating a change in employment (with some downtime), and I wanted to ensure I had access to resources to continue my deep learning leveling-up.

Finally, I wanted to make sure I could demonstrate to my kids the Internet’s most important task: distinguishing between “cat” and “not cat”.

Hardware

I used pcpartpicker.com to create the parts list. As of this writing, the list totals a hair over $1000, but it appears that the RAM I purchased is no longer available. 16GB of 2133MHz RAM runs a bit shy of $200; my total cost was just under $1200.

PCPartPicker part list: https://pcpartpicker.com/list/tLjFQV
CPU: Intel - Core i5-7600 3.5GHz Quad-Core Processor
Motherboard: MSI - Z270I GAMING PRO CARBON AC Mini ITX LGA1151 Motherboard
Memory: PNY - Anarchy 16GB (2 x 8GB) DDR4-2133 Memory
Storage: Crucial - MX300 525GB M.2-2280 Solid State Drive
Video Card: EVGA - GeForce GTX 1070 Ti 8GB SC GAMING ACX 3.0 Black Edition Video Card
Case: Thermaltake - Core V1 Mini ITX Desktop Case
Power Supply: SeaSonic - 520W 80+ Bronze Certified Fully-Modular ATX Power Supply
Case Fan: ARCTIC - F8 PWM 31.0 CFM 80mm Fan
Case Fan: ARCTIC - F8 PWM 31.0 CFM 80mm Fan

For this rig, the key part is the GPU. Budget pushed me into the 1070 Ti, which I’m quite happy with. As this is a hobby rig, I wasn’t too concerned with pushing the ultimate in performance. The Core i5 might be a little light for non-deep-learning tasks (exploratory data analysis, Word2Vec or fastText vector building, etc.) but it should be “ok”. Having the M.2-based storage is terrific, and I have a NAS for large datasets should that become necessary.

PCPartPicker flags a warning on the GPU and case — the GPU might not fit. I can confirm that it does, but it is tight.

I won’t go through the build steps. It was my first build, but I found it rather simple. At the end, I had only one leftover cable; process of elimination told me it belonged to the case fan.

The built system

This system was envisioned to be headless (ssh, Jupyter notebooks and RStudio Server), so I did not need to price in a keyboard or a monitor.

Software

I went with Ubuntu 16.04 LTS, which was new to me. The install was trivial. I created a boot USB stick, plugged it in and told the BIOS to boot from that thing. From there the OS install went swimmingly. I hit some minor glitches getting the NVIDIA drivers and CUDA running (version 8 is key for TensorFlow as of this writing), but got everything working fine. Some of the key commands were:

sudo apt-get update
sudo apt-get --assume-yes upgrade
sudo apt-get --assume-yes install tmux build-essential gcc g++ make binutils
sudo apt-get --assume-yes install software-properties-common
sudo apt-get --assume-yes install git

Especially that last one…

The USB Stick in question

Installing miniconda came next. Using the bash installer made this a breeze:

bash Miniconda3-latest-Linux-x86_64.sh -b

and then

conda upgrade -y --all
source activate root
sudo apt install python3-pip
pip install tensorflow-gpu
mkdir projects
cd projects/
mkdir github.com
cd github.com/
git clone https://github.com/tensorflow/tensorflow.git
python tensorflow/tensorflow/examples/tutorials/mnist/fully_connected_feed.py

Great, TensorFlow works. On to Keras:

pip install keras
git clone https://github.com/fchollet/keras.git
cd keras
cd examples/
python imdb_fasttext.py

That was easy. I took a little detour to install ssh, which was also pretty easy (especially since I know what ssh is). At this point, I can open an ssh tunnel into my “tunnel server” from anywhere that allows me to access port 22 on the Internet and then bounce around my internal network just fine.

“Benchmarking”

I hadn’t left my prior gig, so I had access to some beefy corporate resources as well as my little rig. I wanted to see how they compared. The beefy resources included whatever AWS had at the time (R3 and R4 instances) and some “terrestrial” cloud servers with Tesla P100s. The window to access these was closing fast, so I chose the P100s, figuring I could get to AWS from wherever (or, alternately, anyone could get to AWS). The benchmark compared the P100s against my 1070 Ti. And really, benchmark should be “benchmark” because I didn’t try to control for all variables.

Cutting to the chase:

Lower bars are better — they are the means of multiple epochs for each Keras sample.

This was shocking to me. My little box, the first build and the first Ubuntu install I’ve ever done, with a consumer-grade GPU, significantly outperforms a professional-grade server (from a respected system integrator) with a rather impressive GPU. Closer examination of the benchmark scripts seems to indicate that the P100s did worst on the most I/O-intensive training runs, which suggests that the servers were not handling I/O all that well. I suspect that my old company has fixed this by now.

To run the code, I essentially did

python mnist_cnn.py 
python mnist_hierarchical_rnn.py
python imdb_bidirectional_lstm.py
python imdb_fasttext.py
python lstm_text_generation.py

and read the epoch timings from the output file. I also noticed that TensorFlow reported

2017-12-11 08:06:01.550986: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: GeForce GTX 1070 Ti, pci bus id: 0000:01:00.0, compute capability: 6.1)

and

2017-12-11 08:06:18.522490: I tensorflow/core/common_runtime/gpu/gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> (device: 0, name: Tesla P100-PCIE-12GB, pci bus id: 0000:04:00.0, compute capability: 6.0)

I’m not really sure what “compute capability” means (I haven’t bothered to Google it) but I suspect that the 1070 Ti’s “6.1” is 1.7% better than the P100’s “6.0”.
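When timings come from more than one machine, it helps to record which GPU each run actually used. Here is a minimal sketch that pulls the device name, PCI bus id and compute capability out of the “Creating TensorFlow device” line quoted above. The regular expression and the function name `parse_device_line` are my own assumptions, based only on the two log lines shown here; other TensorFlow versions may word the line differently. The compute capability is kept as a string because it is a version label for the GPU’s feature set rather than a speed score.

```python
import re

# Assumed pattern, derived only from the two log lines quoted in this post.
DEVICE_RE = re.compile(
    r"name: (?P<name>[^,]+), "
    r"pci bus id: (?P<pci>[0-9a-fA-F:.]+), "
    r"compute capability: (?P<cc>\d+\.\d+)"
)


def parse_device_line(line):
    """Return (name, pci_bus_id, compute_capability) or None if no match."""
    match = DEVICE_RE.search(line)
    if match is None:
        return None
    # Compute capability stays a string: it is a version label, not a number
    # to do arithmetic on.
    return match.group("name"), match.group("pci"), match.group("cc")


SAMPLE_1070 = (
    "2017-12-11 08:06:01.550986: I tensorflow/core/common_runtime/gpu/"
    "gpu_device.cc:1120] Creating TensorFlow device (/device:GPU:0) -> "
    "(device: 0, name: GeForce GTX 1070 Ti, pci bus id: 0000:01:00.0, "
    "compute capability: 6.1)"
)

if __name__ == "__main__":
    print(parse_device_line(SAMPLE_1070))
```

Run against a saved log, this gives one tuple per matching line, which is enough to label each set of timings with the card that produced it.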

Should anyone want to add to these informal benchmarks, I’d happily accept a pull request to the timings.csv file out on GitHub.
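For anyone who wants to contribute timings, here is a rough sketch of how the per-epoch seconds could be read out of a saved Keras console log and averaged, which is all the “benchmark” really does. It assumes the end-of-epoch progress line looks like `] - 10s 170us/step - loss: ...`; that format varies between Keras versions, so treat the regular expression as an assumption to adjust. The sample log is made up for illustration and is not one of my measurements, and the function names are hypothetical.

```python
import re
from statistics import mean

# Assumed format of a finished-epoch line: "...=====] - 10s 170us/step - loss: ..."
# In-progress lines read "] - ETA: 4s ..." and are deliberately not matched.
EPOCH_DONE_RE = re.compile(r"\]\s+-\s+(\d+)s\b")


def epoch_seconds(log_text):
    """Return the whole-second duration of every finished epoch in the log."""
    return [int(seconds) for seconds in EPOCH_DONE_RE.findall(log_text)]


def mean_epoch_seconds(log_text):
    """Return the mean epoch duration in seconds, or None if none were found."""
    times = epoch_seconds(log_text)
    return mean(times) if times else None


# Made-up sample output, for illustration only.
SAMPLE_LOG = """\
Epoch 1/3
60000/60000 [==============================] - 10s 170us/step - loss: 0.30 - acc: 0.90
Epoch 2/3
30000/60000 [==============>...............] - ETA: 4s - loss: 0.12
60000/60000 [==============================] - 8s 140us/step - loss: 0.10 - acc: 0.97
Epoch 3/3
60000/60000 [==============================] - 9s 150us/step - loss: 0.08 - acc: 0.98
"""

if __name__ == "__main__":
    print(epoch_seconds(SAMPLE_LOG))
    print(mean_epoch_seconds(SAMPLE_LOG))
```

Keras only prints whole seconds per epoch here, so very short epochs will be coarse; that is good enough for a chart where the differences are large.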

The Internet’s Most Important Task

My kids and I were able to reproduce the Coursera deep learning course’s classification code. We’ll do fast.ai’s shortly.