TensorFlow (GPU) Setup for Developers

HackerNoon.com
Oct 31, 2017

tensorflow, machine learning, gpu, setup-guide

Intro

For devs wanting to run some cool models or experiments with TensorFlow (on GPU for more intense training). This probably isn’t for professional data scientists or anyone creating production models; I imagine their setups are a bit more involved.

This blog post covers my manual process for setting up and building TensorFlow with GPU support. I’ve spent a decent amount of time reading posts and going through walkthroughs, and learned a ton from them, so I pieced together this installation guide, which I’ve been using routinely since (I should have a CloudFormation script soon). There are obviously simple installs via pip/conda, but we want to manually build TF and get it running on GPU. Still, this installation guide is for simple/default configurations and settings.

The Core (we will cover these as we go along):

  • AWS EC2 p2.xlarge
  • Ubuntu 16.04
  • CUDA 8.0 (along with cuDNN)
  • TensorFlow 1.0
  • Python 2.x and libs (libs not directly needed here, but I base my installations around Python usage)

Launch your AWS EC2 Instance

  1. Choose OS type: Ubuntu Server 16.04
  2. Choose Instance Type: GPU Compute > p2.xlarge

Why p2? They were made specifically for what we want to do, which is to run intense computations on the GPU. p2.xlarge offers us 12 GB of GPU memory. We will be able to do (batched) computationally intensive learning just fine on this machine.

WARNING: p2 instances, while running, are $0.90/hour — So I suggest you only keep the instance state running while you are performing tasks such as training.

  3. Configuration: Basic configurations are fine; you will need to use a VPC (default VPC and subnet settings are fine as well).

  4. Storage: when adding storage, account for the amount of data you’ll most likely be training on. For example, the COCO 2014 dataset (images) is around 15GB, and the trained neural net most commonly used for COCO is 0.5GB. Also, we will be downloading hundreds of MBs of individual software and libraries. I’ve opted for 50GB of storage in my work, and have been using around 30–40GB of that.

  5. Continue on, and set up with your preferences or the defaults. There are no other special settings needed to proceed.

Launch the instance, wait for it to be running, and we should be good to move on.
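To put the $0.90/hour figure from the warning above in perspective, here is a small sketch of the arithmetic. The 8-hour session length is a made-up number for illustration only.

```shell
# Estimate the cost of keeping a p2.xlarge running for one training session.
hourly_rate=0.90   # on-demand rate quoted in the warning above
hours=8            # hypothetical session length
cost=$(awk -v r="$hourly_rate" -v h="$hours" 'BEGIN { printf "%.2f", r * h }')
echo "Running for ${hours} hours costs about \$${cost}"
```

Change hours to match your own training runs; the point is just that a forgotten instance adds up quickly.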

Installation Guide

SSH into your newly launched instance.

Ubuntu updates
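A minimal sketch of the update step, assuming the stock apt tooling on Ubuntu 16.04. The function name update_ubuntu is my own invention; the commands are wrapped in a function so that nothing runs until you call it on the instance.

```shell
# Refresh the package index, then apply all pending upgrades.
update_ubuntu() {
  sudo apt-get update
  sudo apt-get -y upgrade
}
```

Call update_ubuntu once after logging in.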

Note: If you get the response:

“new version of boot/grub/menu.lst …”

— I keep the local version in my testing environments. For production use, due diligence is required.

Dependencies

useful tools we will need:

this is also where I install the python libraries I will use (not required for tensorflow install):

other system libraries I’ve found to be required when building and using tensorflow:
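The three groups above (tools, Python libraries, system libraries) might be installed along these lines. The package selection is an assumption based on what a TensorFlow 1.0 source build on Ubuntu 16.04 typically needs, and the Python libraries are just a plausible pick, so trim or extend the lists to taste. The function name is hypothetical.

```shell
# Package groups; all names are Ubuntu 16.04 apt packages.
TOOLS="build-essential git wget curl zip unzip screen"
PYTHON_LIBS="python-scipy python-matplotlib python-pandas"   # optional extras
SYSTEM_LIBS="python-dev python-pip python-numpy python-wheel zlib1g-dev swig libcupti-dev"

# Nothing is installed until this function is called.
install_dependencies() {
  # Word splitting is intentional: each variable holds several package names.
  sudo apt-get install -y $TOOLS $PYTHON_LIBS $SYSTEM_LIBS
}
```

Run install_dependencies on the instance once the system update has finished.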

now create a directory for us to work out of:
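A sketch of this step, assuming the working directory is ~/install (the build step later on refers to install/tensorflow, so that name is a reasonable guess).

```shell
# Create the working directory (no error if it already exists) and enter it.
mkdir -p "$HOME/install"
cd "$HOME/install"
```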

Building TensorFlow

We are going to be building TensorFlow from source. There is a simple pip installation, but we will get better performance, in some use cases, by building it ourselves:

https://www.tensorflow.org/performance/performance_guide#build_and_install_from_source

Bazel is TensorFlow’s build tool. Bazel needs to use our IP, and because we are in a VPC, we need to change the hosts file.

To get your private IP address, run ifconfig in the terminal. In the output, look for inet addr and note that IP (it is also available in the AWS Console).

  1. modify /etc/hosts:

2. install Java (needed for Bazel):

3. install Bazel (0.5.1):
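The three steps above can be sketched as follows. Several details are assumptions: the hosts entry maps the private IP to the machine's hostname, OpenJDK 8 stands in for whichever Java 8 distribution you prefer, and the installer URL follows the naming pattern of Bazel's GitHub release assets. The function name is hypothetical, and the system-changing commands sit inside it so that nothing is modified until you call it.

```shell
# Step 1: build the /etc/hosts entry "<private-ip> <hostname>".
private_ip=$(hostname -I | awk '{print $1}')
hosts_line="${private_ip} $(hostname)"
echo "Entry for /etc/hosts: ${hosts_line}"

# Steps 2 and 3: Java, then the Bazel 0.5.1 installer.
BAZEL_VERSION="0.5.1"
BAZEL_INSTALLER="bazel-${BAZEL_VERSION}-installer-linux-x86_64.sh"
# Assumed URL, following the pattern of Bazel's GitHub release assets.
BAZEL_URL="https://github.com/bazelbuild/bazel/releases/download/${BAZEL_VERSION}/${BAZEL_INSTALLER}"

setup_hosts_java_bazel() {
  echo "$hosts_line" | sudo tee -a /etc/hosts   # append the hosts entry
  sudo apt-get install -y openjdk-8-jdk         # Java for Bazel
  wget "$BAZEL_URL"
  chmod +x "$BAZEL_INSTALLER"
  "./$BAZEL_INSTALLER" --user                   # puts the bazel binary in $HOME/bin
}
```

Check the echoed hosts entry before calling setup_hosts_java_bazel; if the IP looks wrong, take it from ifconfig or the AWS Console instead.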

Bazel will now be installed. Let’s add it to .bashrc.

modify ~/.bashrc (add the 2 lines to the end of the file):
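A sketch of what those two lines might look like, assuming Bazel was installed with the installer's --user option, which places the bazel binary in $HOME/bin. The completion script path is an assumption about where that installer puts it, so the snippet only sources the file if it exists.

```shell
# Make the bazel binary in $HOME/bin visible on the PATH.
export PATH="$PATH:$HOME/bin"
# Optional: enable tab completion if the (assumed) script is present.
if [ -f "$HOME/.bazel/bin/bazel-complete.bash" ]; then
  . "$HOME/.bazel/bin/bazel-complete.bash"
fi
```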

let’s load .bashrc into our shell now by running source ~/.bashrc.

We now have Bazel.

Ok, before we get TensorFlow and build, we need to install our dependencies for GPU support. These include CUDA and cuDNN.
For more info on these: https://en.wikipedia.org/wiki/CUDA

install CUDA:
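A minimal sketch of the CUDA 8.0 install, assuming you have already downloaded NVIDIA's local .deb repository package for Ubuntu 16.04 into the current directory. The file name below is a placeholder for whatever NVIDIA's download page gives you, and the function name is my own.

```shell
# Placeholder: set this to the CUDA 8.0 .deb you downloaded from NVIDIA.
CUDA_DEB="cuda-repo-ubuntu1604-8-0-local.deb"

install_cuda() {
  sudo dpkg -i "$CUDA_DEB"       # register the local CUDA repository
  sudo apt-get update            # pick up the newly added repository
  sudo apt-get install -y cuda   # install the toolkit and driver
}
```

Call install_cuda once CUDA_DEB points at the real file.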

To test that we have this installed correctly, run a simple command:
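One common check is nvidia-smi, which ships with the NVIDIA driver and lists the GPUs it can see. Treat the choice of command as an assumption; the guard below reports when the tool is missing instead of failing outright.

```shell
if command -v nvidia-smi >/dev/null 2>&1; then
  # Prints the driver version, GPU model, memory and utilization.
  if nvidia-smi; then gpu_check="ok"; else gpu_check="failed"; fi
else
  echo "nvidia-smi not found; the driver is probably not installed"
  gpu_check="missing"
fi
echo "GPU check: ${gpu_check}"
```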

Note: For cuDNN, you will need an NVIDIA developer account. You can get this here: https://developer.nvidia.com/rdp/cudnn-download

After you have set up your account, go ahead and download the 8.0-linux-x64-v5.1 tarball to your own machine.

We will transfer this file (and all others coming from our machine) to our EC2 instance using scp.

Note: Before you scp the cuDNN tarball, edit the command with your EC2 address and the correct paths.

run scp command from your own machine (replace ip):
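A sketch of the transfer, to be run on your own machine. The key path, IP address, and tarball file name are placeholders to replace with your own values; ubuntu is the default login user on Ubuntu AMIs, and ~/install is assumed to be the working directory on the instance.

```shell
KEY_FILE="$HOME/.ssh/my-key.pem"               # placeholder: your EC2 key pair
EC2_IP="203.0.113.10"                          # placeholder: your instance's public IP
CUDNN_TARBALL="cudnn-8.0-linux-x64-v5.1.tgz"   # assumed file name of the download

copy_cudnn() {
  # Copy the tarball into the working directory on the instance.
  scp -i "$KEY_FILE" "$CUDNN_TARBALL" "ubuntu@${EC2_IP}:~/install/"
}
```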

After the tarball transfers, unzip and place:
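A sketch of this step, following the usual cuDNN procedure of copying the header and libraries into the CUDA directory. It assumes CUDA lives in /usr/local/cuda and that the tarball has the file name shown; both are assumptions to adjust, and the function name is hypothetical.

```shell
CUDNN_TARBALL="cudnn-8.0-linux-x64-v5.1.tgz"   # assumed file name
CUDA_DIR="/usr/local/cuda"                     # assumed CUDA install location

install_cudnn() {
  tar -xzf "$CUDNN_TARBALL"                          # unpacks into ./cuda
  sudo cp cuda/include/cudnn.h "$CUDA_DIR/include/"  # header
  sudo cp cuda/lib64/libcudnn* "$CUDA_DIR/lib64/"    # shared and static libs
  sudo chmod a+r "$CUDA_DIR/include/cudnn.h" "$CUDA_DIR"/lib64/libcudnn*
}
```

Run install_cudnn from the directory the tarball was copied into.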

TENSOR TIME

Yea.. cool. Let’s go and get TensorFlow. Using v1 here.
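A sketch of fetching the source, assuming "v1" means the r1.0 release branch of the official GitHub repository. The function name is my own; calling it leaves you inside the tensorflow directory and ends by starting ./configure.

```shell
fetch_tensorflow() {
  git clone https://github.com/tensorflow/tensorflow.git
  cd tensorflow
  git checkout r1.0   # release branch for the 1.0 line
  ./configure         # interactive configuration script
}
```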

This will launch the configuration script.

Note: You will need to configure one setting as non-default: CUDA usage. So [press enter] to get all default settings, until you see the question asking if you want to support CUDA. YES we do. The following CUDA settings can be default as well.

Ok, TensorFlow is configured and ready to build. The build process is lengthy, usually lasting around one hour on p2.xlarge. So that the process is not bound to the open shell, we will use screen. Screen is a tool that will allow us to launch a new window in the shell, start a process, and then detach from that window, and eventually come back and re-attach to that window/process. See more here: https://www.rackaid.com/blog/linux-screen-tutorial-and-how-to/

(screen also comes in handy when training your models and running those respective processes)

launch a new screen window (running screen with no arguments is enough).

We should still be in the install/tensorflow directory

build TensorFlow with Bazel:
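A sketch of the build, using the Bazel target from TensorFlow's build-from-source instructions. The output directory /tmp/tensorflow_pkg and the function name are assumptions; run the function from inside the tensorflow source directory.

```shell
WHEEL_DIR="/tmp/tensorflow_pkg"   # assumed output directory for the wheel

build_tensorflow() {
  # Compile the pip package builder with optimizations and CUDA support.
  bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
  # Produce the .whl file in WHEEL_DIR.
  bazel-bin/tensorflow/tools/pip_package/build_pip_package "$WHEEL_DIR"
}
```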

Note: This is a long process, expect this to take an hour. Thanks to screen, we can detach from this window (close the ssh connection if you’d like) and take a break

To detach from the screen window where Bazel is building, press Ctrl+a, then d.

… time period of tensorflow building …

After this break, reattach to the screen window that is building/built TensorFlow by running screen -r.

install the new build using pip:
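A sketch of the install, assuming the wheel was written to /tmp/tensorflow_pkg (a hypothetical output directory; use whatever path you passed to the package builder). The function name is my own.

```shell
WHEEL_DIR="/tmp/tensorflow_pkg"   # assumed location of the built wheel

install_wheel() {
  # The wheel's exact file name depends on the build, hence the glob.
  sudo pip install "$WHEEL_DIR"/tensorflow-*.whl
}
```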

We want to now make sure TensorFlow can find cuda before running. We need to add it into our environment.

modify ~/.bashrc
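A sketch of the lines to append, assuming CUDA was installed to the default /usr/local/cuda location. The CUPTI path is included on the assumption that you want TensorFlow's profiling support to find it too.

```shell
# Tell TensorFlow (and the dynamic linker) where CUDA lives.
export CUDA_HOME=/usr/local/cuda
# Append the CUDA library directories, keeping any existing value.
export LD_LIBRARY_PATH="${LD_LIBRARY_PATH:+${LD_LIBRARY_PATH}:}${CUDA_HOME}/lib64:${CUDA_HOME}/extras/CUPTI/lib64"
```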

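cd out to the home directory with cd ~ (make sure you are out of the tensorflow directory, since Python cannot import tensorflow from inside its own source tree) and reload ~/.bashrc with source ~/.bashrc so the new variables take effect.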

TENSORFLOW IS INSTALLED!

let’s test this in interactive Python:
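A sketch of a quick check, fed to Python from the shell so it can be pasted in one go. It assumes the TensorFlow 1.x API (tf.Session and tf.ConfigProto); the message string is arbitrary.

```shell
if python - <<'EOF'
import tensorflow as tf
# log_device_placement makes TensorFlow report which device runs each op.
sess = tf.Session(config=tf.ConfigProto(log_device_placement=True))
print(sess.run(tf.constant("TensorFlow is up")))
EOF
then
  tf_check="ok"
else
  tf_check="failed"
fi
echo "TensorFlow check: ${tf_check}"
```

The same three Python lines can be typed directly into an interactive python session instead.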

and to confirm we are using GPU, look in the startup log for lines that name the GPU device (a Tesla K80 on p2.xlarge).

Now go do some cool s***.
