Visualizing optimizers using Plotly

Implementing your favorite optimizers and visualizing them using Plotly.

Miguel Otero Pedrido
LatinXinAI
5 min read · Oct 18, 2022


Image Source: A. Amini et al. “Spatial Uncertainty Sampling for End-to-End Control”. NeurIPS Bayesian Deep Learning 2018

In this post my goal is to talk about some of the most widely used optimization algorithms in Deep Learning. There are many resources if you want to delve deeper into their properties and mathematical motivations, but here I will focus on a more “practical” point of view.

Since I don’t want the post to become excessively long, there are many steps I will omit and a lot of code I will not show. If you want to explore this project in depth, I invite you to take a look at the GitHub repository!

Python Implementation

This section focuses on the Python implementation of some of the most popular optimization algorithms. Therefore, it is going to be “code intensive”.

If you are only interested in the visualization part feel free to skip to the next section.

If you’re still here, let’s start programming!

Stochastic Gradient Descent

The simplest of all the algorithms, and at the same time the basis for all of them. We all know the gradient descent algorithm, but, to refresh your memory, here is its pseudocode.
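In essence, each update is a single step against the gradient of the loss J, scaled by the learning rate η:

θ ← θ − η · ∇θ J(θ)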

The code for this algorithm is also very simple, as can be seen below.
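Here is a rough sketch of what it looks like (the real classes live in the repository; the Optimizer below is a simplified stand-in that only keeps what we need):

import autograd.numpy as np
from autograd import grad

class Optimizer:
    # Simplified stand-in for the base class in the repository: it stores
    # the parameters, the learning rate and the gradient of the scalar
    # function, computed with autograd. The real class also records the
    # path followed by the parameters for the later visualizations.
    def __init__(self, function, params, learning_rate=0.01):
        self.function = function
        self.params = np.array(params, dtype=float)
        self.learning_rate = learning_rate
        self.gradient = grad(function)  # the gradient is itself a function

class SGD(Optimizer):
    def update_params(self):
        # theta <- theta - eta * grad_J(theta)
        self.params = self.params - self.learning_rate * self.gradient(self.params)
        return self.params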

Do not worry about the Optimizer class, as it is used for subsequent visualizations. The only important thing at this point is the update_params method, which allows you to update the parameters using the gradient of the scalar function.

As can be seen, the attribute self.gradient gives us access to the gradient of the function, computed by autograd (more details about this in the repository). Since the gradient is itself a function, we obtain its value by evaluating it at self.params.

Momentum optimization

This algorithm introduces a momentum vector (represented as m in the algorithm definition), controlled by a new hyperparameter β that prevents the vector from growing too large.

In gradient descent, the gradient can be interpreted as a measure of “velocity”. With momentum, it should instead be interpreted as an “acceleration”.

Source (Stanford CS231n class)

The pseudocode accounts for all the intuitions previously discussed.
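Written out in the m/β notation from above (one common formulation), the two update steps are:

m ← β · m − η · ∇θ J(θ)
θ ← θ + m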

The Python implementation is still very straightforward, as we can see in the following code snippet:
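A sketch along the same lines as the SGD class above (reusing the simplified Optimizer base; the exact snippet is in the repository):

class Momentum(Optimizer):
    def __init__(self, function, params, learning_rate=0.01, beta=0.9):
        super().__init__(function, params, learning_rate)
        self.beta = beta
        self.m = np.zeros_like(self.params)  # momentum vector

    def update_params(self):
        # m <- beta * m - eta * grad_J(theta);  theta <- theta + m
        self.m = self.beta * self.m - self.learning_rate * self.gradient(self.params)
        self.params = self.params + self.m
        return self.params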

Nesterov

Nesterov Accelerated Gradient offers a small variation on Momentum optimization. In this case, the gradient is measured not at the local position θ, but slightly ahead in the direction of the momentum.

Source (Stanford CS231n class)

As we can see in the pseudocode, the only difference from the momentum optimization appears at the first expression.
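That is, the gradient is evaluated at the look-ahead point θ + β·m rather than at θ:

m ← β · m − η · ∇θ J(θ + β · m)
θ ← θ + m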

As expected, the Python implementation only requires a minor change from the previous code.
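A sketch of that minor change, building on the Momentum class above:

class Nesterov(Momentum):
    def update_params(self):
        # gradient evaluated at the look-ahead point theta + beta * m
        lookahead = self.params + self.beta * self.m
        self.m = self.beta * self.m - self.learning_rate * self.gradient(lookahead)
        self.params = self.params + self.m
        return self.params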

RMSProp

This algorithm has proven to be very effective in practice, which is why we have included it in this list. Its pseudocode is as follows.
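In the usual formulation, s accumulates an exponentially decaying average of the squared gradients (element-wise), which is then used to scale the step:

s ← β · s + (1 − β) · ∇θ J(θ)²
θ ← θ − η · ∇θ J(θ) / √(s + ε)

where ε is a small smoothing term that avoids division by zero.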

As always, we provide the Python implementation.
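A sketch in the same style as the previous classes (again, a simplified version of what is in the repository):

class RMSProp(Optimizer):
    def __init__(self, function, params, learning_rate=0.01, beta=0.9, epsilon=1e-8):
        super().__init__(function, params, learning_rate)
        self.beta = beta
        self.epsilon = epsilon
        self.s = np.zeros_like(self.params)  # decaying average of squared gradients

    def update_params(self):
        g = self.gradient(self.params)
        # s <- beta * s + (1 - beta) * g^2   (element-wise)
        self.s = self.beta * self.s + (1 - self.beta) * g ** 2
        # theta <- theta - eta * g / sqrt(s + eps)
        self.params = self.params - self.learning_rate * g / np.sqrt(self.s + self.epsilon)
        return self.params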

Adam

This is the last and most complex of the optimizers presented so far. The algorithm can be divided into the following 5 steps.
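Written in the standard form (with m and s the running estimates of the first and second moments of the gradient):

1. m ← β₁ · m + (1 − β₁) · ∇θ J(θ)
2. s ← β₂ · s + (1 − β₂) · ∇θ J(θ)²
3. m̂ ← m / (1 − β₁^t)
4. ŝ ← s / (1 − β₂^t)
5. θ ← θ − η · m̂ / (√ŝ + ε)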

Here, t represents the iteration number, used in the bias-correction steps. In this case, the update_params method will be a bit trickier.
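A sketch of what it might look like, following the same pattern as before:

class Adam(Optimizer):
    def __init__(self, function, params, learning_rate=0.01,
                 beta_1=0.9, beta_2=0.999, epsilon=1e-8):
        super().__init__(function, params, learning_rate)
        self.beta_1 = beta_1
        self.beta_2 = beta_2
        self.epsilon = epsilon
        self.m = np.zeros_like(self.params)  # first-moment estimate
        self.s = np.zeros_like(self.params)  # second-moment estimate
        self.t = 0                           # iteration counter

    def update_params(self):
        self.t += 1
        g = self.gradient(self.params)
        self.m = self.beta_1 * self.m + (1 - self.beta_1) * g
        self.s = self.beta_2 * self.s + (1 - self.beta_2) * g ** 2
        # bias-corrected moment estimates
        m_hat = self.m / (1 - self.beta_1 ** self.t)
        s_hat = self.s / (1 - self.beta_2 ** self.t)
        self.params = self.params - self.learning_rate * m_hat / (np.sqrt(s_hat) + self.epsilon)
        return self.params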

Now that we have all the optimizers implemented in Python, let’s move on to the fun part… visualizing them in Plotly!

Plotly Visualization

As we said at the end of the previous section, the only thing left to do now is to visualize the behavior of these optimization algorithms on a “test function”. In this case I have opted for a hyperbolic paraboloid.

In order not to make the post too long, I will spare you the details about the implementation of the figure in Plotly. All the details are available in the repository, in case you want to replicate the steps.
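Just to give an idea of the ingredients, here is a minimal sketch (not the repository code) that draws a saddle surface z = x² − y² and the path traced by one of the optimizers from the previous section:

import plotly.graph_objects as go

# Hypothetical test function: a hyperbolic paraboloid (saddle surface).
def f(params):
    x, y = params
    return x ** 2 - y ** 2

# Run one of the optimizers sketched above and record its trajectory.
optimizer = Momentum(f, params=[-1.5, 0.001], learning_rate=0.05)
path = [optimizer.params.copy()]
for _ in range(50):
    path.append(optimizer.update_params().copy())
path = np.array(path)

# Surface of the test function plus the trajectory on top of it.
x = np.linspace(-2, 2, 100)
y = np.linspace(-2, 2, 100)
X, Y = np.meshgrid(x, y)
Z = X ** 2 - Y ** 2

fig = go.Figure()
fig.add_trace(go.Surface(x=X, y=Y, z=Z, opacity=0.7, showscale=False))
fig.add_trace(go.Scatter3d(x=path[:, 0], y=path[:, 1],
                           z=[f(p) for p in path],
                           mode="lines+markers", name="Momentum"))
fig.show()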

LatinX in AI (LXAI) logo

Do you identify as Latinx and work in artificial intelligence, or know someone who is Latinx and works in artificial intelligence?

Don’t forget to hit the 👏 below to help support our community — it means a lot!

Thank you :)
