Build a Machine Learning Dev Box with Nvidia Titan X Pascal

It’s been a long while since I built a desktop for a home entertainment system. I thought I wouldn’t be using another PC since I had moved to Macs. Well, things keep changing, and they repeat themselves in different flavors and at different levels. The PC is back again for me.

I have been helping a machine learning startup as a way to grow my career. For a software engineer, machine learning is the next field to pursue. To save time developing algorithms and training models, people are starting to use GPUs. The company has the new Titan X, so the CEO and I headed to Fry’s and started a little journey to build the box.

I was a bit nervous. Hardware specs have improved a lot, so I needed to catch up. LGA2011, SATA III, DDR4, PCI Express Gen3 x16, M.2, X99, SSD, … Oh my. I spent some time researching and refreshing my memory on the internet and found some helpful links.

Since we wanted to maximise the efficiency of the PC, we planned the spec around two GPUs, each of which can occupy 16 PCIe lanes. Two GPUs at x16 need 32 lanes, so a CPU with 40 PCIe lanes is essential, which narrows it down to the X99 platform. The platform costs a lot more but has room to grow in computing power and speed. You might wonder how much time the additional performance can really save. Trust me, it will save more and more time down the road as the data grows and the networks get deeper and deeper.
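The lane arithmetic behind that choice can be sketched in a few lines of Python. This is only an illustration: the CPU lane counts in `cpu_options` are assumed examples, and the function names are made up for this sketch.

```python
# Rough PCIe lane budget for a two-GPU box.
# The CPU lane counts below are illustrative assumptions, not a parts list.
LANES_PER_GPU = 16
NUM_GPUS = 2


def gpu_lanes_needed(num_gpus, lanes_per_gpu=LANES_PER_GPU):
    """Total lanes required to run every GPU at full x16 width."""
    return num_gpus * lanes_per_gpu


def fits(cpu_lanes, num_gpus=NUM_GPUS):
    """True if the CPU exposes enough lanes for all GPUs at x16."""
    return cpu_lanes >= gpu_lanes_needed(num_gpus)


cpu_options = {"16-lane CPU": 16, "28-lane CPU": 28, "40-lane CPU": 40}

for name, lanes in cpu_options.items():
    needed = gpu_lanes_needed(NUM_GPUS)
    verdict = "ok" if fits(lanes) else "not enough"
    print(f"{name}: need {needed}, have {lanes} -> {verdict}")
```

Whatever lanes are left over after the GPUs are what remain for other devices, such as an M.2 SSD.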

After carefully reading the manuals several times, turning the figures upside down again and again, and making sure we were protected against static and kept our fingers off the chips, we finally put the box together. The moment of truth had come: we were ready to turn on the power. Wait, did you check this, that, and that? Yes.

“Did you turn on the power…?” “Yeah. Let me turn it off and on again.” Then why does the fan spin up and die down right away? There was no POST on the screen. Is the power connected and secured? Did we put the CPU in the right orientation? Let’s try swapping the memory. Let’s pull out one component at a time and see…

“What time does Fry’s close?” It’s still open, let’s go. Thanks to great service from the manager, Michael. He took the machine in and started to play with it. It took him a while, too. I was kind of relieved and said, “Goodness, it’s not our fault. We didn’t break anything. We are as good as he is.” Finally, it turned out that the motherboard needed a BIOS update to support the newer CPU. The manager fixed that for free. That’s great service.

I took the machine home to install the OS and software. Most people use Ubuntu 14 or 16 for machine learning projects. I installed Ubuntu 16 as the host OS and used Docker for the Ubuntu 14 applications. The tough part was the GPU drivers. Since the card is on the bleeding edge, special attention and instructions were required. Thanks to the internet again, a lot of useful information is out there, once you filter it by hand. It still took me two full days to install them correctly. To take advantage of the latest Pascal architecture, I installed CUDA 8.0. Gradually I installed the GPU versions of tools like TensorFlow, OpenCV, etc.
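One way to check that the driver install actually worked is to ask nvidia-smi which GPUs it sees. The sketch below is a minimal example, not the exact steps I followed: it assumes nvidia-smi is on the PATH once the driver is installed, and the helper names `parse_gpu_csv` and `list_gpus` are made up for illustration.

```python
import shutil
import subprocess

# Ask nvidia-smi for one CSV line per GPU: name, driver version, total memory.
QUERY = [
    "nvidia-smi",
    "--query-gpu=name,driver_version,memory.total",
    "--format=csv,noheader",
]


def parse_gpu_csv(text):
    """Turn nvidia-smi CSV output into a list of dicts, one per GPU."""
    gpus = []
    for line in text.strip().splitlines():
        fields = [field.strip() for field in line.split(",")]
        if len(fields) == 3:
            gpus.append(
                {"name": fields[0], "driver": fields[1], "memory": fields[2]}
            )
    return gpus


def list_gpus():
    """Return the GPUs the driver can see, or [] if nvidia-smi is unavailable."""
    if shutil.which("nvidia-smi") is None:
        return []
    result = subprocess.run(
        QUERY,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,
    )
    if result.returncode != 0:
        return []
    return parse_gpu_csv(result.stdout)


if __name__ == "__main__":
    gpus = list_gpus()
    if not gpus:
        print("No GPU visible; the driver may not be installed correctly.")
    for gpu in gpus:
        print(f"{gpu['name']}: driver {gpu['driver']}, {gpu['memory']}")
```

An empty result here usually means the driver is the thing to fix before touching CUDA or TensorFlow.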

It’s really a great advantage to have a powerful GPU for intensive computation, especially for training deep neural networks such as CNNs. I have been using the Titan X for my projects in the Self-Driving Car Nanodegree from Udacity, which I am currently attending. I can train a two-layer CNN on a traffic sign dataset in just 20 seconds to a few minutes, instead of 25 minutes to a few hours on my MacBook Pro. I usually run a lot of training to fine-tune the parameters, debug the code, and try new ideas. It easily saves me days or weeks on a single project. I am glad I have built a great tool for my tasks.
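To put rough numbers on that, here is a back-of-the-envelope calculation using the short end of the timings above (20 seconds on the GPU versus 25 minutes on the laptop). The count of 100 training runs is a hypothetical figure chosen only for illustration.

```python
def speedup(slow_seconds, fast_seconds):
    """How many times faster the fast run is than the slow one."""
    return slow_seconds / fast_seconds


laptop_run = 25 * 60  # 25 minutes on the laptop, in seconds
gpu_run = 20          # 20 seconds on the GPU

factor = speedup(laptop_run, gpu_run)

# Hypothetical number of training runs over the life of one project.
runs = 100
saved_hours = runs * (laptop_run - gpu_run) / 3600

print(f"Speedup: {factor:.0f}x")
print(f"Time saved over {runs} runs: {saved_hours:.1f} hours")
```

With longer runs, where minutes on the GPU replace hours on the laptop, the same arithmetic adds up to days.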