How to use half precision float16 when training on RTX cards with Tensorflow / Keras

If you are doing machine learning on NVidia's new RTX cards, you will want to try out half precision floats (float16). The new NVidia RTX 2070 cards have less physical memory than the old GTX 1080 Ti cards, but the RTX's newer architecture supports float16. This effectively doubles the size of the models you can train on RTX cards, because half precision floats take up half the memory of the float32 values that machine learning frameworks use by default.

Now, as a sweet summer child, you will be thinking that it’s really simple to configure your setup so you can do FP16 training with your shiny new RTX cards using Tensorflow and Keras, right?

You may think that you just physically install your cards, find some drivers from somewhere and tell TF / Keras to use float16.

If you have come here, you probably know that this isn’t true…

The main problem

RTX cards require CUDA 10, and as of January 2019 the prebuilt Tensorflow packages only support CUDA 9.

The solution

You need to install CUDA 10, then build your own version of Tensorflow against CUDA 10.

It turns out even this isn't enough, and you need two lines of code to get FP16 working after this:

Firstly, the instruction to use float16. Secondly, an adjustment of the 'epsilon' fuzz factor to a larger value, because the default is too small for FP16 calculations; if you don't change the epsilon, you will often get NaNs during training.
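As a rough sketch, this is what those two lines can look like using the Keras backend API (the exact epsilon is a judgment call; 1e-4 is an assumption that has worked for me, not a value from any official guide):

    import keras.backend as K

    # Tell Keras to create all layers and variables in half precision.
    K.set_floatx('float16')

    # Raise the fuzz factor above the Keras default (1e-7), which is too
    # small for float16 and tends to produce NaNs during training.
    K.set_epsilon(1e-4)

Note that these calls have to run before you build your model, since set_floatx only affects layers and variables created after it is called.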

How to build Tensorflow from source

  1. Install the graphics card drivers etc. Make sure they are working with nvidia-smi
  2. Install CUDA 10
  3. git clone the Tensorflow source code.
  4. I tried two versions of Tensorflow, 1.11 and 1.12, and both worked for me. Just git checkout the branch containing the version of Tensorflow you are targeting.
  5. You can try to follow the instructions for building Tensorflow from source, but be aware that you need to install Bazel version 0.18.0 (I couldn't get it working with later versions). You will also need to install some Python module prerequisites via pip for the build to succeed. I used all the default options under ./configure, except that you must choose CUDA version 10.0 and point the build at your CUDA 10 binaries. Once the build is done and installed, run the sanity check sketched below.
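After installing the wheel that the build produces, a quick sanity check along these lines (a minimal sketch, assuming a TF 1.11 or 1.12 install) confirms that the CUDA 10 build can actually see the RTX card:

    import tensorflow as tf

    # Should print the version you checked out, e.g. 1.12.x.
    print(tf.VERSION)

    # True only if this build was compiled with GPU support and can
    # initialise the card via your CUDA 10 / driver install.
    print(tf.test.is_gpu_available())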