Automatic differentiation in TensorFlow — a practical example

Slawomir Telega, PhD
6 min read · Jan 9, 2024

--

It may safely be assumed that practically every Artificial Neural Network (ANN) uses gradient operations in the training process: call it BackPropagation, BackPropagationThroughTime, or anything else, it is still a form of derivative. Of course, one may argue that derivative-free optimization is a thing, but its application is, to the best of my knowledge, limited to small-size problems. Because of that, all mainstream libraries, such as TensorFlow and PyTorch, include some form of automatic differentiation engine: GradientTape() and Autograd, respectively. The main idea is to track the operations performed on given variables and retain the ability to calculate partial derivatives with respect to them. Such an engine reduces BackProp to a few lines of simple code, which makes it quite easy to experiment with custom network architectures without drowning in the chaos created by chain-rule differentiation somewhere in the midst of a deep network.

In TensorFlow, as mentioned above, that engine is GradientTape(). The usage is simple and straightforward once one grasps the idea behind it. Roughly speaking ;), I have seen two approaches to explaining it: either showing how it calculates the first derivative of x² for a given value of x, or going all the way through the process of creating a neural network from scratch. While the second way is both informative and interesting, I would like to place myself somewhere between those two. That said, I strongly encourage everybody interested to check the TensorFlow API description, which contains all the knowledge on the subject you may ever need. The tutorial below is just an attempt to put the idea into a context some people may find easier to understand: a physical ball traversing space over an equally physical field.

I have decided to use simple projectile motion, just high-school math. For the sake of the present discussion, let's say it is a baseball traversing space. Simplifying the physics, let's assume there is no friction and all of the ball's mass is concentrated in a single point. With that in mind we can, following Galileo's idea, separate the motion into two dimensions, horizontal (x) and vertical (y), independent of each other and together forming a ballistic curve. If one launches the projectile with initial speed v_0 at angle alpha with respect to the ground, the equations describing the displacements are (g stands for the acceleration due to gravity):

x(t) = v_0 * t * cos(alpha)
y(t) = v_0 * t * sin(alpha) - (g * t**2)/2

As may be seen in the figure above, both displacements start at zero; then, due to the steep angle (60 deg), y grows faster than x in the initial phase. Below an angle of 45 degrees, the horizontal displacement grows faster over the whole considered time range.

Having the above, one can easily calculate by hand both velocity and acceleration of the horizontal motion (booooooring: constant speed, zero acceleration) and the vertical one. We can also calculate the time of flight and the range in a straightforward way, obtaining:

range = v_0**2 * sin(2*alpha)/g
TOF = 2 * v_0 * sin(alpha) / g

But, as the purpose of this short tutorial is to show the usage of GradientTape, we may use it to calculate both velocity and acceleration of the vertical motion numerically (all the data is measured in SI units, that is displacement in meters, time in seconds, velocity in meters per second and acceleration in meters per second squared).

# imports
import numpy as np
import tensorflow as tf

# constants
v_0 = 10
alpha = 60
g = 9.81

# convert angle from degrees to radians
angle = 2*np.pi * alpha / 360.

# calculate range and time of flight
r = v_0**2 *np.sin(2*angle) / g
tof = 2 * v_0 * np.sin(angle) / g

# create the list holding time steps from 0 to tof
tt = [tof * t/10. for t in range(0, 11)]

Now, for the magic to happen, one has to convert the time list into a TensorFlow Variable (Variables are automatically watched by GradientTape; for other types of tensors, watching can be invoked manually with tape.watch()).

time = tf.Variable(tt)
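As a side note, manual watching can be sketched on a toy example (the values below are arbitrary, chosen only for illustration):

```python
import numpy as np
import tensorflow as tf

# a constant tensor is NOT watched automatically
t = tf.constant([0.0, 1.0, 2.0])

with tf.GradientTape() as tape:
    tape.watch(t)   # request tracking explicitly
    s = t ** 2      # s = t^2, so ds/dt = 2t

ds_dt = tape.gradient(s, t)
print(ds_dt.numpy())   # [0. 2. 4.]
```

Without the tape.watch(t) call, tape.gradient(s, t) would simply return None.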

And now all the operations performed within the invoked GradientTape's scope are being watched, and gradients with respect to the watched variable may be computed, as shown below:

# invoke GradientTape()
with tf.GradientTape(persistent=True) as tape:
    # calculate x and y as functions of (watched) time
    x = v_0 * time * np.cos(angle)
    y = v_0 * time * np.sin(angle) - g * time**2 / 2.

# calculate the gradients (velocity)
vx = tape.gradient(x, time)
vy = tape.gradient(y, time)
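The tape's output can be cross-checked against the closed-form derivatives. A minimal sketch, recreating the setup above in a self-contained form:

```python
import numpy as np
import tensorflow as tf

v_0, g = 10.0, 9.81
angle = np.deg2rad(60.0)
tof = 2 * v_0 * np.sin(angle) / g
time = tf.Variable([tof * t / 10. for t in range(0, 11)])

with tf.GradientTape(persistent=True) as tape:
    x = v_0 * time * np.cos(angle)
    y = v_0 * time * np.sin(angle) - g * time**2 / 2.

vx = tape.gradient(x, time).numpy()
vy = tape.gradient(y, time).numpy()

t = time.numpy()
# analytical derivatives: vx = v_0*cos(alpha), vy = v_0*sin(alpha) - g*t
print(np.allclose(vx, v_0 * np.cos(angle), atol=1e-3))        # True
print(np.allclose(vy, v_0 * np.sin(angle) - g * t, atol=1e-3))  # True
```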

It may be easily observed that, in accordance with the laws of motion, the horizontal velocity is constant, while the vertical one changes linearly with time. Once again, due to the steep angle of projection, the vertical component (v_0 * sin(alpha)) is bigger than the horizontal one (v_0 * cos(alpha)).

Simple and elegant: to get dy/dt one writes tape.gradient(y, time), it could not get easier. It is important to remember to change the default persistent=False to True, as it allows calculating gradients of multiple functions on the same tape. One also has to be cautious not to use the notation:

v = tape.gradient([x, y], time)

While completely correct from a formal point of view, it returns the sum of dx/dt and dy/dt, not a tuple of tensors; in our case the output is a single 11-element tensor, not a 2x11 one.
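That summing behaviour is easy to demonstrate on a toy example (the coefficients below are arbitrary, chosen only for illustration):

```python
import numpy as np
import tensorflow as tf

time = tf.Variable([0.0, 1.0, 2.0])

with tf.GradientTape(persistent=True) as tape:
    x = 2.0 * time   # dx/dt = 2
    y = 3.0 * time   # dy/dt = 3

# a list of targets yields the SUM of the gradients...
v_sum = tape.gradient([x, y], time)
print(v_sum.numpy())             # [5. 5. 5.]

# ...while separate calls yield the individual gradients
vx = tape.gradient(x, time)
vy = tape.gradient(y, time)
print(vx.numpy(), vy.numpy())    # [2. 2. 2.] [3. 3. 3.]
```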

The only visible problem is that we have so far calculated only the first derivative, which is fine for BackProp, but in physics, for example, we are every now and then interested in the second or higher derivatives. It turns out higher-order derivatives may also be calculated in a simple manner: to get a gradient of a gradient we have to use two nested GradientTapes. So the above code should be replaced by:

with tf.GradientTape(persistent=True) as tape2:
    with tf.GradientTape(persistent=True) as tape:
        x = v_0 * time * np.cos(angle)
        y = v_0 * time * np.sin(angle) - g * time**2 / 2.
    vx = tape.gradient(x, time)
    vy = tape.gradient(y, time)

# first derivative of v, being the second derivative of displacement
ax = tape2.gradient(vx, time)
ay = tape2.gradient(vy, time)

This concludes the trivial example of GradientTape() usage. One can easily verify the correctness of the calculations by looking at the ay tensor: the vertical acceleration is constant and equal to -g, exactly as it should be (the horizontal acceleration equals zero, as the horizontal velocity is constant).
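That check can be scripted directly. A minimal self-contained sketch, with a few sample time points chosen only for illustration:

```python
import numpy as np
import tensorflow as tf

v_0, g = 10.0, 9.81
angle = np.deg2rad(60.0)
time = tf.Variable([0.0, 0.5, 1.0, 1.5])

with tf.GradientTape() as tape2:
    with tf.GradientTape() as tape:
        y = v_0 * time * np.sin(angle) - g * time**2 / 2.
    # vy is computed inside tape2's scope, so tape2 records it
    vy = tape.gradient(y, time)
ay = tape2.gradient(vy, time)

# every element of ay should equal -g
print(np.allclose(ay.numpy(), -g, atol=1e-3))   # True
```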

If you have any comments or just want to share an opinion, please do not hesitate to write.

Cheers, S.

For the sake of completeness, I attach the complete example code below.

# imports
import numpy as np
import tensorflow as tf

# constants
v_0 = 10
alpha = 60
g = 9.81

# convert angle from degrees to radians
angle = 2*np.pi * alpha / 360.

# calculate range and time of flight
r = v_0**2 *np.sin(2*angle) / g
tof = 2 * v_0 * np.sin(angle) / g

# create the list holding time steps from 0 to tof
tt = [tof * t/10. for t in range(0, 11)]

# create a tf.Variable out of tt
time = tf.Variable(tt)

# calculate derivatives
with tf.GradientTape(persistent=True) as tape2:
    with tf.GradientTape(persistent=True) as tape:
        x = v_0 * time * np.cos(angle)
        y = v_0 * time * np.sin(angle) - g * time**2 / 2.
    # first derivative of displacement (velocity)
    vx = tape.gradient(x, time)
    vy = tape.gradient(y, time)

# first derivative of v, being the second derivative of displacement
ax = tape2.gradient(vx, time)
ay = tape2.gradient(vy, time)


Slawomir Telega, PhD

Code developer since I can remember - started in the '80s with a ZX Spectrum ;). Also happens to hold a Ph.D. in Physics :P https://www.linkedin.com/in/s%C5%82awom