
How to install Tensorflow + CUDA 9.1 into Ubuntu 18.04

André Mello
May 6, 2018 · 10 min read

If you're like me, you're always itching to try out the latest versions of whatever software you use. Sadly, that often means figuring out how to iron out a few kinks to get things to mix and match. Here's a guide to help you prepare your shiny new Ubuntu for deep learning.

The audience

This guide is for those who need to get the most out of their Nvidia-based hardware. I assume you have a fresh install of Ubuntu 18.04. If you're just playing around and won't need to use your GPU, you're probably better off installing the pip package instead. If you're an AMD person, you'll need SYCL (OpenCL) instead of CUDA, but I can't help much with that.

Installing CUDA: good news, bad news

So, the first thing you should try is to install the tensorflow-gpu pip package, as described in the official install guide. That has never worked for me, though, so I'll only show how to install from source. This has the added advantage that the compiled binaries will take advantage of all optimizations your machine supports.

If you follow the official guide to installing Tensorflow from source, you'll notice they recommend using CUDA 9.0. Now, if your Ubuntu was 17.04 — the latest officially supported version — then you should indeed stick with the 9.0 version from the archives. In that case the prebuilt pip package will likely work. But since your OS is not supported anyway, there's no point in not using the latest CUDA, which right now is 9.1.

The good news is that Ubuntu 18.04 has added CUDA to its multiverse repository. That means you don't need to mess around with adding third-party repositories and all the inevitable version clashes that come with it. You can simply install everything using apt:
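
```bash
sudo apt update
# Driver, CUDA toolkit, CUPTI, GCC 6 and Python 3 essentials. The package
# names below are my best guess for the 18.04 repositories; double-check
# with `apt search` if apt can't find one of them.
sudo apt install nvidia-390 nvidia-cuda-toolkit libcupti-dev \
    gcc-6 g++-6 python3-dev python3-pip python3-numpy python3-wheel
```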

This will install the (currently) latest graphics driver; CUDA itself; CUPTI, which for some reason doesn't ship with the CUDA package as it should; GCC 6, the latest version able to compile CUDA code; and a bunch of Python 3 development essentials. If, for some weird reason, you need to use Python 2, just omit the 3 in python3.

The bad news is that, because it has become a native package, CUDA is installed in a rather non-standard way.

I can't blame either Nvidia or Canonical. The Nvidia way was the best, most self-contained way to keep consistency across different distros. It made it easy for third-party dependencies to find CUDA, and for folks with non-standard distros to retain compatibility. Everything was put under the /usr/local/cuda-*.* path, so it was easy to maintain versions without relying on a package manager.

But that's not how native packages work; because they can rely on the package manager to keep track of them, they are installed into the root system paths: /usr/bin, /usr/include, /usr/lib. Imagine if every new package appended to the PATH and LD_LIBRARY_PATH environment variables: looking for binaries, headers and libraries would soon become slow and prone to obscure namespace issues. The standard layout makes it easy for all tools to know where to look for their dependencies, but it is only viable because the package manager tracks which packages installed which files. It's a human-friendliness/scalability tradeoff.

Tensorflow will probably eventually update its configuration tool to work with this new installation format, but meanwhile we need to emulate the old way for it to work. The following commands should do the trick:
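
```bash
# Rebuild the traditional /usr/local/cuda layout as symlinks into the root
# paths used by the multiverse package (64-bit paths; if something is missing,
# `dpkg -L nvidia-cuda-toolkit` shows where apt actually put each file).
sudo mkdir -p /usr/local/cuda/nvvm /usr/local/cuda/extras/CUPTI
sudo ln -s /usr/bin /usr/local/cuda/bin
sudo ln -s /usr/include /usr/local/cuda/include
sudo ln -s /usr/lib/x86_64-linux-gnu /usr/local/cuda/lib64
sudo ln -s /usr/include /usr/local/cuda/extras/CUPTI/include
sudo ln -s /usr/lib/x86_64-linux-gnu /usr/local/cuda/extras/CUPTI/lib64
sudo ln -s /usr/lib/nvidia-cuda-toolkit/libdevice /usr/local/cuda/nvvm/libdevice
```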

Note: this guide assumes you’ve got a 64 bit system (very likely).

Why not do it the traditional way?

You might be wondering: why stick with the multiverse package when the Nvidia-provided one is so much easier to deal with? Well, the main reason is that the Nvidia package simply does not work with the new Ubuntu, because of version clashes with some packages, in particular the graphics driver. When I tried to do it that way I ended up with a broken Ubuntu and had to reinstall it from scratch. Despite the additional work it requires, though, the multiverse package is actually quite up-to-date and should cause fewer headaches long-term, since it doesn't rely as much on Nvidia's care.

Installing additional Nvidia libraries

Tensorflow also depends on cuDNN and NCCL, both of which you can download from the Nvidia website. I’ve had success using the (currently) latest versions, 7.1 and 2.1.15.

To install cuDNN, simply copy the files over to the /usr/local/cuda directory you created. Assuming you’ve extracted the .tgz into your Downloads folder:
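
```bash
# The cuDNN .tgz unpacks into a cuda/ folder; adjust the path and library
# version to match the archive you downloaded.
cd ~/Downloads/cuda
sudo cp include/cudnn.h /usr/include/
# -P keeps libcudnn.so and libcudnn.so.7 as symlinks to the real library
# instead of three full copies, which is what saves the disk space.
sudo cp -P lib64/libcudnn* /usr/lib/x86_64-linux-gnu/
sudo chmod a+r /usr/include/cudnn.h /usr/lib/x86_64-linux-gnu/libcudnn*
```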

EDIT: As per Ian Jason Min's comment, I've updated this segment, which wasn't actually correct because symbolic links don't mix well with actual directories. Sorry about that. On the plus side, I took the opportunity to turn some files into symbolic links, which saves about 600MB of space (and also avoids a warning from apt).

This will actually copy them over to the root system paths, which is not ideal because they won’t be tracked by any package manager, but they’re just a few self-contained files, so we can live with that. If you have a more robust procedure in mind, feel free to comment.

To install NCCL, you need a little more work:
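
```bash
# Assuming the CUDA 9.1 build of NCCL 2.1.15; adjust the archive and file
# names for whichever release you actually downloaded.
cd ~/Downloads
tar xf nccl_2.1.15-1+cuda9.1_x86_64.txz
cd nccl_2.1.15-1+cuda9.1_x86_64
# Header and libraries go into the root paths...
sudo cp include/nccl.h /usr/include/
sudo cp -P lib/libnccl* /usr/lib/x86_64-linux-gnu/
sudo ldconfig
# ...plus a small /usr/local/cuda/nccl tree (license file and symlinks),
# which is where Tensorflow's configure tool will look for NCCL later.
sudo mkdir -p /usr/local/cuda/nccl/lib /usr/local/cuda/nccl/include
sudo cp NCCL-SLA.txt /usr/local/cuda/nccl/
sudo ln -s /usr/include/nccl.h /usr/local/cuda/nccl/include/nccl.h
sudo ln -s /usr/lib/x86_64-linux-gnu/libnccl.so.2 /usr/local/cuda/nccl/lib/libnccl.so.2
```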

EDIT: Once again Ian Jason Min saved the day and pointed out a couple of missing details in the above segment.

This will also install into the root paths, but, again, shouldn’t be a big deal. I’ll show a few commands to undo all of this later on.

One last thing worth mentioning is that Tensorflow can also use TensorRT to speed up inference, but I couldn't make it work with this setup. The configuration tool complains about a version incompatibility I couldn't resolve. If someone figures it out, I'll update this section.

Installing Bazel

The official guide recommends installing Bazel with the binary installer, but I actually think the custom repository is easier and better — it will keep things updated. These instructions are easy enough to follow, but I’ll just copy them here for your convenience:
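
```bash
# From Bazel's installation docs (check them if the repo or key has moved):
sudo apt install curl openjdk-8-jdk
echo "deb [arch=amd64] http://storage.googleapis.com/bazel-apt stable jdk1.8" | \
    sudo tee /etc/apt/sources.list.d/bazel.list
curl https://bazel.build/bazel-release.pub.gpg | sudo apt-key add -
sudo apt update && sudo apt install bazel
```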

One more thing

The configuration script assumes there’s a python binary in your environment. By default, Ubuntu 18.04 does not come with Python 2 anymore, but the Python 3 binary is called python3. To resolve this issue, I like to use update-alternatives:
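
```bash
# Register python3 as an alternative for the `python` command.
sudo update-alternatives --install /usr/bin/python python /usr/bin/python3 10
```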

This way whenever you call python you get Python 3. 😊

Note that if you ever install Python 2, python will continue to point to Python 3. Python 2 will be accessible via python2.

Installing Tensorflow

Now, we can finally move on to the good stuff.

First, clone the Tensorflow repository:
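
```bash
git clone https://github.com/tensorflow/tensorflow
cd tensorflow   # stay on the master branch (see below)
```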

However, unlike what's recommended in the official guide, you should stick with the master branch. The latest release (right now) is 1.8, and it has a bug that prevents some code from compiling with GCC 6. Apparently the official build is compiled with GCC 4.8, which is how the broken release slipped through.

Anyway, the fix has been merged into master, so it should compile fine. In case you run into issues, I built it at commit #d0f5bc1 (there have been a lot of newer commits already, some of which may break something).

The next step is to run the configuration tool with ./configure. Here are my inputs, roughly (the exact prompts vary a bit between commits, and anything I don't list can stay at its default):
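
```
Please specify the location of python: /usr/bin/python3
Do you wish to build TensorFlow with CUDA support? y
Please specify the CUDA SDK version you want to use: 9.1
Please specify the cuDNN version you want to use: 7.1
Please specify the NCCL version you want to use: 2.1.15
Please specify the location where NCCL is installed: /usr/local/cuda/nccl
Please specify a list of comma-separated Cuda compute capabilities you want to build with: <your GPU's, e.g. 6.1>
Please specify which gcc should be used by nvcc as the host compiler: /usr/bin/gcc-6
```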

Noteworthy points:

  • Python path: /usr/bin/python3
  • GCC path: /usr/bin/gcc-6
  • NCCL path: /usr/local/cuda/nccl
  • Check your GPU's CUDA compute capability.

NOTE: It has been reported that the newer commits require Keras to compile. Although this looks like a screwup by some dev, for now it’s best to avoid the issue and install Keras first:
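
```bash
# Installing Keras beforehand sidesteps the build-time dependency.
pip3 install --user keras
```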

Now, to compile, just run
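
```bash
# Standard from-source GPU build command from the Tensorflow docs;
# --config=opt enables the CPU optimizations your machine supports.
bazel build --config=opt --config=cuda //tensorflow/tools/pip_package:build_pip_package
```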

Note: if you get an error like "C++ compilation of rule '@double_conversion//:double-conversion' failed", it might help to pass the additional argument --cxxopt="-D_GLIBCXX_USE_CXX11_ABI=0".

This step will probably take a long time. After it finishes, if all goes well, you can build the tensorflow package with
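
```bash
# Packages the build output into a wheel under /tmp/tensorflow_pkg.
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
```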

and to install
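
```bash
# The exact wheel name depends on the version and Python tags, so use a glob.
pip3 install --user /tmp/tensorflow_pkg/tensorflow-*.whl
```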

Check if your build is working by changing into another directory (cd) and running python:
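
```bash
cd ~   # anywhere outside the source tree, so Python doesn't import the repo itself
python -c "import tensorflow as tf; sess = tf.Session(); print(sess.run(tf.constant('Hello, Tensorflow!')))"
```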

You should get Hello, Tensorflow! as the output (under Python 3 it shows up as the bytes literal b'Hello, Tensorflow!').

How to undo this

As much as you may have felt spooked by some commands, there’s not much damage being done to your system if you follow this guide. If you run into issues and wish to undo everything CUDA-related so you can restart from scratch or try something else, just run the following lines:
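
```bash
# Undo the manual cuDNN/NCCL copies and the emulated /usr/local/cuda tree,
# then purge the apt packages (names mirror the install steps above).
sudo rm -rf /usr/local/cuda
sudo rm -f /usr/include/cudnn.h /usr/lib/x86_64-linux-gnu/libcudnn*
sudo rm -f /usr/include/nccl.h /usr/lib/x86_64-linux-gnu/libnccl*
sudo apt purge nvidia-cuda-toolkit libcupti-dev nvidia-390
sudo apt autoremove
```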

You may wish to omit the nvidia-390 bit, since it’s usually a good idea to have the proprietary driver whether you are using CUDA or not.

To uninstall the Tensorflow package, use pip:
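
```bash
pip3 uninstall tensorflow
```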

Closing remarks

If all went well, you now have an extremely optimized, cutting-edge build of Tensorflow installed on your Ubuntu 18.04. The only thing that could make it faster is adding TensorRT, but I couldn't figure out how to make it work with this setup. Feel free to make any suggestions or ask for help in the comments. I hope this guide saves a few people some time (it took me the better part of a day to figure all of this out).
