Automatic Mixed Precision in TensorFlow for Faster AI Training on NVIDIA GPUs
A guest post by NVIDIA
Mixed precision training uses half-precision arithmetic to speed up training while, in some cases, achieving the same accuracy as single-precision training with the same hyperparameters. Memory requirements are also reduced, allowing larger models and minibatches.
Enabling mixed precision involves two steps: porting the model to use the half-precision data type where appropriate, and using loss scaling to preserve small gradient values. We introduce the Automatic Mixed Precision feature for TensorFlow (available now in 1.x, and coming soon for 2.x), which makes these modifications automatically to improve training performance with Tensor Cores, available in NVIDIA’s Volta and Turing GPUs. Automatic Mixed Precision applies both steps internally in TensorFlow. It is enabled with a single environment variable in NVIDIA’s NGC container, and offers more fine-grained control when necessary.
Enabling this feature for existing TensorFlow model scripts requires setting an environment variable or changing only a few lines of code. Speedups of up to 3X have been observed for the more math-intensive models; the speedup achieved depends on the model architecture. Today, the Automatic Mixed Precision feature is available in the TensorFlow container on the NVIDIA NGC container registry.
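To make the loss scaling step concrete, here is a minimal sketch in plain Python of why small gradient values need it. It is not how TensorFlow implements the feature: the `to_half` helper, the gradient value, and the scale factor are all assumptions chosen for illustration, and half-precision rounding is emulated with the standard library's `struct` module.

```python
import struct


def to_half(x):
    """Round a Python float to the nearest IEEE 754 half-precision value."""
    return struct.unpack("e", struct.pack("e", x))[0]


# Hypothetical scale factor. A power of two keeps the multiply and the
# divide exact, so the only rounding happens in the half-precision store.
LOSS_SCALE = 2.0 ** 15

# Hypothetical small gradient, below the smallest positive half-precision value.
true_grad = 1e-8

# Without loss scaling, the gradient underflows to zero when stored in half precision.
unscaled = to_half(true_grad)

# With loss scaling, the loss (and so, by the chain rule, every gradient) is
# multiplied by LOSS_SCALE before the half-precision store, then divided out
# again in higher precision before the weight update.
scaled_half = to_half(true_grad * LOSS_SCALE)
recovered = scaled_half / LOSS_SCALE

print("stored without scaling:", unscaled)
print("recovered with scaling:", recovered)
```

The unscaled value is lost entirely, while the scaled one survives the half-precision round trip with only a small rounding error.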
To enable this feature inside the container, simply set one environment variable:
export TF_ENABLE_AUTO_MIXED_PRECISION=1
As an alternative, the environment variable can be set inside the TensorFlow Python script:
import os
os.environ['TF_ENABLE_AUTO_MIXED_PRECISION'] = '1'
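If a script should turn the feature on by default but still respect a value exported in the shell, the assignment can be wrapped in a small helper. The sketch below is only one way to do this: the helper name `enable_auto_mixed_precision` is hypothetical, and calling it before TensorFlow is imported is a conservative assumption rather than a documented requirement.

```python
import os

AMP_ENV_VAR = "TF_ENABLE_AUTO_MIXED_PRECISION"


def enable_auto_mixed_precision(environ=os.environ):
    """Turn the feature on unless the variable is already set.

    Returns the value in effect after the call.
    """
    # setdefault leaves an explicit shell setting (for example "0") untouched.
    return environ.setdefault(AMP_ENV_VAR, "1")


# Assumption: run this at the top of the script, before TensorFlow is imported.
enable_auto_mixed_precision()
```

With this in place, running the script with `TF_ENABLE_AUTO_MIXED_PRECISION=0` exported in the shell leaves the variable at `0`, which makes it easy to compare runs with and without the feature.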
Once mixed precision is enabled, further speedups can be achieved by:
- Enabling the TensorFlow XLA compiler. Note that Google still lists XLA as an experimental tool.
- Increasing the minibatch size. Larger minibatches often lead to better GPU utilization, and mixed precision enables up to 2x larger minibatches.
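The "up to 2x" figure for minibatch size follows from storage size: a half-precision value takes 2 bytes where a single-precision value takes 4. The back-of-the-envelope sketch below assumes activation memory is the limiting factor and that every activation is stored in half precision. The memory budget and per-sample activation count are hypothetical numbers, not measurements.

```python
BYTES_FP32 = 4  # bytes per single-precision value
BYTES_FP16 = 2  # bytes per half-precision value


def max_batch_size(budget_bytes, elements_per_sample, bytes_per_element):
    """Largest minibatch whose activations fit in the given memory budget."""
    return budget_bytes // (elements_per_sample * bytes_per_element)


budget = 8 * 1024 ** 3    # hypothetical: 8 GiB available for activations
elements = 50_000_000     # hypothetical: activation values stored per sample

fp32_batch = max_batch_size(budget, elements, BYTES_FP32)
fp16_batch = max_batch_size(budget, elements, BYTES_FP16)

print("fp32 minibatch:", fp32_batch)
print("fp16 minibatch:", fp16_batch)
```

In practice some tensors stay in single precision, so the achievable increase is usually somewhat below this idealized factor of two.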
Availability
The Automatic Mixed Precision feature is available in the NVIDIA-optimized TensorFlow 19.03 NGC container. We are also working closely with the TensorFlow team at Google to merge this feature directly into the TensorFlow framework core.
You can also find the example training scripts that we used to generate the above performance charts in the NVIDIA NGC model script registry, or on GitHub.
Try the NVIDIA optimized TensorFlow container to get started with automatic mixed precision. Feel free to leave feedback or questions for our team in our TensorFlow forum.
Additional Resources