How to compute gradients in TensorFlow and PyTorch

Mai Ngoc Kien · Published in CodeX · Apr 9, 2021

Computing gradients is one of the core parts of many machine learning algorithms. Fortunately, deep learning frameworks handle it for us. This post explains how TensorFlow and PyTorch can help us compute gradients, with an example.

Many of us are familiar with training neural networks in TensorFlow and PyTorch. We already know how to compute gradients and use optimizers to update the weight parameters with a few lines of code. This post separates out the gradient-computation step in these libraries to see what happens behind the code.

1. Derivatives and Gradients

In one dimension, the derivative of a function is defined as follows:

f'(x) = lim (h → 0) [ f(x + h) − f(x) ] / h

Equation 1. Limit definition of the derivative

Generally, in multiple dimensions, the gradient is the vector of partial derivatives along each dimension. Hence, the gradient has the same shape as x, and each element of the gradient tells us the slope of the function f if we move along that coordinate direction.
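
For example (a simple illustration, not from the original post): if f(x₁, x₂) = x₁² + 3x₂, then the gradient is ∇f = (2x₁, 3). The first entry is the slope of f along the x₁ direction and the second is the slope along the x₂ direction.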

The gradient turns out to have nice properties. It points in the direction of greatest increase of the function; correspondingly, the negative gradient gives us the direction of greatest decrease.

2. How to evaluate gradients

One naive way to evaluate gradients on a computer is the method of finite differences, which uses the limit definition of the derivative (Equation 1). Concretely, we iteratively evaluate Equation 1 for each dimension of x, perturbing that dimension by a small value h. This can be very slow when the size of x is large.
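
As a minimal sketch of this idea (a NumPy illustration, not code from the original post), we can loop over the dimensions of x and apply Equation 1 to each one:

```python
import numpy as np

def numerical_gradient(f, x, h=1e-5):
    """Approximate the gradient of f at x by applying Equation 1 to each dimension."""
    grad = np.zeros_like(x)
    for i in range(x.size):
        x_h = x.copy()
        x_h.flat[i] += h                     # perturb one coordinate by a small h
        grad.flat[i] = (f(x_h) - f(x)) / h   # Equation 1 along that coordinate
    return grad

# Quick check on f(x) = sum(x**2), whose true gradient is 2*x.
x = np.array([1.0, 2.0, 3.0])
print(numerical_gradient(lambda v: np.sum(v ** 2), x))  # roughly [2. 4. 6.]
```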

But thankfully, we do not have to do that. We can use calculus to compute an analytic gradient, i.e. to write down an expression for what the gradient should be.

In summary, there are 2 ways to compute gradients.

  • Numerical gradients: approximate, slow, easy to write.
  • Analytic gradients: exact, fast, error-prone.

In practice, we should always use the analytic gradient, but check our implementation against the numerical gradient. This is called a gradient check.
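
A minimal gradient-check sketch (again an illustration, not from the original post): for f(x) = sum(x²) the analytic gradient is 2x, and we compare it to the finite-difference approximation using a relative error.

```python
import numpy as np

def f(x):
    return np.sum(x ** 2)   # analytic gradient: 2 * x

x = np.random.randn(5)
analytic = 2 * x

# Numerical gradient via Equation 1, one coordinate at a time.
h = 1e-5
numerical = np.zeros_like(x)
for i in range(x.size):
    x_h = x.copy()
    x_h[i] += h
    numerical[i] = (f(x_h) - f(x)) / h

# A small relative error (e.g. below 1e-4) suggests the analytic gradient is correct.
rel_error = np.abs(analytic - numerical) / np.maximum(1e-8, np.abs(analytic) + np.abs(numerical))
print(rel_error.max())
```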

3. An example for illustration

Now, let's jump to an example (from the Coursera course in the References). In this example, we have a few computations and use the chain rule to compute the gradient ourselves. We then see how PyTorch and TensorFlow can compute the gradient for us.
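
The original worked example is not reproduced in the text here, so the rest of this post uses a small stand-in for illustration: let x be a 2×2 matrix of ones, y = the sum of all entries of x, and z = y². By the chain rule, ∂z/∂xᵢⱼ = (∂z/∂y)(∂y/∂xᵢⱼ) = 2y · 1 = 2 · 4 = 8, so the gradient of z with respect to x is a 2×2 matrix filled with 8s. The PyTorch and TensorFlow snippets below compute exactly this.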

4. PyTorch code

Implementing the code in PyTorch will give us exactly what we expect for the example above.

torch.autograd is PyTorch’s automatic differentiation engine that helps us to compute gradients.

We first create a tensor x with requires_grad=True. This signals to autograd that every operation on it should be tracked. When we call .backward() on z, autograd calculates the gradients and stores them in the tensor’s .grad attribute. Hence, we can see the gradients in x.grad.
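
A minimal sketch along these lines, using the stand-in example above (not the original post’s snippet):

```python
import torch

# Stand-in example: y = sum(x), z = y**2, so dz/dx = 2*y = 8 for every entry of x.
x = torch.ones(2, 2, requires_grad=True)  # requires_grad=True: autograd tracks operations on x
y = x.sum()
z = y ** 2

z.backward()   # autograd computes dz/dx ...
print(x.grad)  # ... and stores it in x.grad: tensor([[8., 8.], [8., 8.]])
```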

5. TensorFlow code

In TensorFlow, optimizers are implemented using TensorFlow’s automatic differentiation API, called GradientTape. This API lets us compute and track the gradient of every differentiable TensorFlow operation.

Operations within a gradient tape scope are recorded if at least one of their inputs is being watched. If we watch the variable x, the tape will record the rest of the operations you can see below. When we call tape.gradient to compute the gradients of z with respect to x, we get the same result as before.
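
A minimal sketch with the same stand-in example (again, not the original post’s snippet):

```python
import tensorflow as tf

# Stand-in example: y = sum(x), z = y**2, so dz/dx = 2*y = 8 for every entry of x.
x = tf.ones((2, 2))

with tf.GradientTape() as tape:
    tape.watch(x)            # x is a plain tensor, so we ask the tape to watch it
    y = tf.reduce_sum(x)     # recorded because it uses the watched x
    z = tf.square(y)         # recorded because it depends on y

dz_dx = tape.gradient(z, x)  # gradients of z with respect to x
print(dz_dx)                 # [[8. 8.] [8. 8.]]
```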

Conclusions

This post provided a simple example of how to compute gradients using PyTorch’s autograd and TensorFlow’s GradientTape. In practice, we use them for more complicated functions and for training deep neural networks.

The secret behind these differentiation mechanisms (as far as I know) is the computation graph. However, computation graphs, backpropagation, and the explicit implementations in these frameworks are beyond the scope of this post. You can find more explanations in the References or in their source code.

References

  1. Stanford CS231n Lecture 3 Loss Functions and Optimization: https://youtu.be/h7iBpEHGVNc
  2. Stanford CS231n Lecture 4 Introduction to Neural Networks: https://youtu.be/d14TUNcbn1k
  3. PyTorch Autograd tutorial: https://pytorch.org/tutorials/beginner/blitz/autograd_tutorial.html
  4. Coursera Custom and Distributed Training with TensorFlow: https://www.coursera.org/learn/custom-distributed-training-with-tensorflow?specialization=tensorflow-advanced-techniques
