TensorFlow Installation with RTX GPU Support
I recently picked up an Nvidia RTX 2070 with the intent to do some Tensorflow development on it. Unfortunately, the guides on the Tensorflow website led me astray: they are currently stuck on CUDA 9.0, and the RTX / Turing cards do not support that version. Tensorflow won’t warn you about this; it simply crashes when trying to use CUDA-accelerated features.
It turns out that what we need is a CUDA 10.0-enabled build of Tensorflow. This is on the roadmap for the near future, but for now (as in, 11/2018–1/2019) you’ll need some trickery to get it working. I’m going to write out the steps I followed on Linux Mint 19 (Tara); these instructions should work equally well on Ubuntu 16.04 or 18.04 with some mild modifications.
Step 1: Install Nvidia Software
The first part of the process involves installing the Nvidia drivers, CUDA 10.0, and the associated Nvidia deep learning libraries that Tensorflow uses.
When it comes to getting Nvidia software, I recommend you get it from Nvidia’s site. I will not write out a long guide on installing it all; instead, I suggest you take a look at this guide:
Note: I just wrote a post on installing CUDA 9.2 and cuDNN 7.1 (medium.com).
The trick is to follow Zhanwen’s guide but substitute CUDA 10.0 and the latest (410+) Nvidia drivers for the versions he suggests. Otherwise, that guide still works great.
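Once the drivers and toolkit are in place, a quick sanity check confirms the versions before moving on. This sketch assumes CUDA landed in the installer’s default `/usr/local/cuda-10.0` location:

```shell
# Check that the driver loaded and reports version 410+
nvidia-smi

# Check the CUDA toolkit version (default installer path assumed)
/usr/local/cuda-10.0/bin/nvcc --version

# Optionally put the toolkit on your paths for the build step later
export PATH=/usr/local/cuda-10.0/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda-10.0/lib64:$LD_LIBRARY_PATH
```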
In addition to Zhanwen’s guide, you’ll also want to install cuDNN and NCCL for CUDA 10:
The NVIDIA CUDA® Deep Neural Network library (cuDNN) is a GPU-accelerated library of primitives for deep neural networks (developer.nvidia.com).
I had an issue installing these libraries via the .deb files Nvidia distributes: I could not find the actual *.so files that Tensorflow needs. The solution (for me) was to install the libraries manually from the tarballs Nvidia hosts on its site. The installation instructions Nvidia provides at the links above worked for me.
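For reference, here is a sketch of the manual tarball install. The archive filenames and the NCCL target directory are examples from my setup; substitute whatever versions you actually downloaded:

```shell
# Unpack the cuDNN tarball (example filename; yours may differ)
tar xf cudnn-10.0-linux-x64-v7.4.1.5.tgz
# Copy the headers and libraries into the CUDA 10.0 tree
sudo cp cuda/include/cudnn.h /usr/local/cuda-10.0/include/
sudo cp cuda/lib64/libcudnn* /usr/local/cuda-10.0/lib64/
sudo chmod a+r /usr/local/cuda-10.0/include/cudnn.h \
               /usr/local/cuda-10.0/lib64/libcudnn*

# NCCL: unpack to a known path; you point Bazel at this during configure
tar xf nccl_2.3.7-1+cuda10.0_x86_64.txz
sudo mkdir -p /usr/local/nccl
sudo cp -r nccl_2.3.7-1+cuda10.0_x86_64/* /usr/local/nccl/
sudo ldconfig
```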
Step 2: Build Tensorflow
The released Tensorflow packages do not support CUDA 10, but you can build your own version of Tensorflow that does. This process is fairly well documented here:
There are a few caveats, though. First of all, Bazel 0.18 will not work with Tensorflow; you need to install Bazel 0.15. You’ll also need to point Bazel to the proper paths for the cuDNN and NCCL installs from above. One flag can also be left out of the bazel build command, since it gets specified automatically during the configure process. This build takes a while; I recommend you add a “-j4” to your bazel command to use 4 parallel jobs (or more, if you have a better CPU).
Once you’ve built Tensorflow using the commands in tfboyd’s GitHub link above, install the resulting wheel with pip.
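Put together, the build looks roughly like this. The paths are illustrative, and the configure step is where you supply your CUDA 10.0, cuDNN, and NCCL locations when prompted:

```shell
git clone https://github.com/tensorflow/tensorflow.git
cd tensorflow
./configure   # answer yes to CUDA support; supply the cuDNN/NCCL paths from Step 1

# Build the pip package builder; -j4 caps Bazel at 4 concurrent jobs
bazel build -j4 --config=opt //tensorflow/tools/pip_package:build_pip_package

# Produce a wheel in /tmp/tensorflow_pkg and install it
bazel-bin/tensorflow/tools/pip_package/build_pip_package /tmp/tensorflow_pkg
pip install /tmp/tensorflow_pkg/tensorflow-*.whl
```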
Step 3: Test It
With your custom CUDA 10-enabled Tensorflow installed, it’s time to see what it can do. I did this by running the CNN tutorial/sample packaged with the Tensorflow models repo:
“This setup requires that all GPUs share the model parameters. A well-known fact is that transferring data to and from…” (www.tensorflow.org)
Here’s the repo:
Models and examples built with TensorFlow (github.com/tensorflow/models).
TLDR for those who don’t want to read the first page:
git clone https://github.com/tensorflow/models.git
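From there, the commands I used looked roughly like this. The tutorial path inside the models repo is the one that existed at the time of writing and may have moved since:

```shell
cd models/tutorials/image/cifar10
# Downloads CIFAR-10 on first run, then starts training on the GPU
python cifar10_train.py
# Multi-GPU variant, if you have more than one card
python cifar10_multi_gpu_train.py --num_gpus=2
```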
The script will automatically download the CIFAR10 dataset, which may take a couple of minutes. After that, it’ll load up a model and train it — hopefully using your GPU!
You can watch your GPU’s usage figures (load/memory usage/temperatures) using:
watch -n 1 nvidia-smi
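You can also ask Tensorflow directly whether it registered the card. In a 1.x build, something like this works (the second command prints every device Tensorflow sees, including the GPU name and memory):

```shell
python -c "import tensorflow as tf; print(tf.test.is_gpu_available())"
python -c "from tensorflow.python.client import device_lib; print(device_lib.list_local_devices())"
```

If the first command prints False, Tensorflow was built or installed without working CUDA support, and it’s worth revisiting the paths you gave during configure.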