Optimization Algorithms and Interactive Visualization (Part 1)

Published in

deepkapha notes

10 min readMar 7, 2023

Screen shot of the interactive streamlit application

Algorithms for optimization

Artificial Intelligence (AI) has become a ubiquitous term that encompasses a wide range of technologies and applications, from image and speech recognition to autonomous vehicles and predictive analytics. At its core, AI involves the development of computer algorithms and systems that can learn and make decisions based on data, rather than being explicitly programmed by humans. One of the key subfields of AI is machine learning, which involves the use of statistical techniques to enable computers to improve their performance on a specific task over time, based on experience. In this article, we will explore the basics of machine learning, including its different types, key algorithms, and practical applications.

A Public url for the app is provided here: http://ec2-3-145-64-26.us-east-2.compute.amazonaws.com:8502/

Plotting Gradient Descent with Momentum for One-Dimensional Functions

import numpy as np
import random
from matplotlib import pyplot as plt
# example of plotting gradient descent with momentum for a one-dimensional function


# objective function
def objective(x, y):
   return x**2.0 + y**2.0
# derivative of objective function
def derivative(x, y):
   return np.asarray([x * 2.0, y * 2.0])
# define range for input
r_min, r_max = -1.0, 1.0
# sample input range uniformly at 0.1 increments
xaxis = np.arange(r_min, r_max, 0.1)
yaxis = np.arange(r_min, r_max, 0.1)
# create a mesh from the axis
x, y = np.meshgrid(xaxis, yaxis)
print(np.size(x))
# compute targets
results = objective(x, y)
# create a surface plot with the jet color scheme
figure = plt.figure(figsize=(6,6))
axis = figure.gca(projection='3d')
axis.plot_surface(x, y, results, cmap='jet')
# show the plot
plt.show()

The given code is an example of plotting gradient descent with momentum for a one-dimensional function. The code imports the NumPy library as ‘np’, the random library, and the pyplot sublibrary from the matplotlib library as ‘plt’. The NumPy library is used to perform mathematical operations on arrays and matrices, and the pyplot library is used to create visualizations like plots and charts. The objective of the code is to create a surface plot of a one-dimensional function using the mesh grid and plot_surface functions provided by the NumPy and pyplot libraries, respectively. The code also defines a derivative function that computes the gradient of the objective function, which is used in gradient descent.

The code first defines the objective function that takes two input parameters, x, and y, and returns the sum of their squares. The derivative function is also defined that taking the same input parameters and returning their derivatives, computed as an array with two elements. The first element is the derivative of the objective function with respect to x, and the second element is the derivative with respect to y.Next, the code defines the range of input values for x and y. The minimum and maximum values for both x and y are set to -1.0 and 1.0, respectively. The x and y ranges are then sampled uniformly at 0.1 increments using the arrange function provided by the NumPy library. The mesh grid function is used to create a mesh of x and y values, which is used as input to the objective function to compute the target values. The results of the objective function are plotted as a surface plot using the plot_surface function provided by the pyplot library. The jet color scheme is used to color the surface plot. The plot is then displayed using the show function provided by the pyplot library. Finally, the code defines a function named plot_solutions that takes two input parameters, solutions, and titlestr. The solutions parameter is a list of points that represent the solutions found during gradient descent. The titlestr parameter is a string that is used as the title for the plot.

Plotting Solutions of Objective Function with Contour and Scatter Plot.

def plot_solutions(solutions,titlestr):
# sample input range uniformly at 0.1 increments
xaxis = np.arange(bounds[0,0], bounds[0,1], 0.1)
yaxis = np.arange(bounds[1,0], bounds[1,1], 0.1)
# create a mesh from the axis
x, y = np.meshgrid(xaxis, yaxis)
# compute targets
results = objective(x, y)
# create a filled contour plot with 50 levels and jet color scheme
plt.contourf(x, y, results, levels=50, cmap='jet')
# plot the sample as black circles
solutions = np.asarray(solutions)
plt.plot(solutions[:, 0], solutions[:, 1], '.-', color='w')
# show the plot
plt.title(titlestr)
plt.show()

The plot_solutions function first samples the input range uniformly at 0.1 increments using the bounds parameter, which is not defined in the code snippet provided. It then creates a mesh of x and y values using the meshgrid function, computes the targets using the objective function, and plots the results as a filled contour plot using the contourf function provided by the pyplot library. The contour plot is colored using the jet color scheme and is plotted with 50 levels.

The solutions are then plotted as black circles on the contour plot using the plot function provided by the pyplot library. The plot function takes two arrays as input, one for the x-coordinates and one for the y-coordinates of the points to be plotted. The arrays are created using NumPy’s array function and are sliced using the colon operator to extract the x-coordinates and y-coordinates from the solutions array. Finally, the titlestr parameter is used as the title for the plot using the title function provided by the pyplot library. The plot is then displayed using the show function provided by the pyplot library.

Implementation of Gradient Descent Algorithm to Minimize an Objective Function. (Equation Explanation)

Stochastic Gradient Descent (SGD) with momentum is an optimization technique used in machine learning and deep learning to speed up convergence during training. It uses the concept of momentum to accelerate the gradient descent process.

The equation for SGD with momentum can be expressed as:

v_t = βv_{t-1} + (1-β)g_t
θ_t = θ_{t-1} - αv_t
where:
θ_t is the updated value of the parameters at time step t.
θ_{t-1} is the previous value of the parameters.
α is the learning rate, which determines the step size at each iteration.
v_t is the velocity or momentum of the parameters at time step t.
β is the momentum coefficient, which controls the contribution of the previous momentum to the current momentum.
g_t is the gradient of the loss function at time step t.

The momentum term can be seen as a moving average of the gradients. It accumulates the gradients from previous time steps with a decaying weight (determined by β) and adds the current gradient to it. The larger the momentum coefficient, the more the previous gradients contribute to the current momentum. This allows the algorithm to continue moving in the same direction as before, even if the current gradient is pointing in a slightly different direction.

The updated parameters are then calculated by subtracting the momentum term (v_t) from the previous parameters (θ_{t-1}). This means that the algorithm takes a step in the direction of the negative gradient, but also takes into account the previous momentum, allowing it to move more smoothly along the optimization path.

Overall, SGD with momentum helps to prevent the algorithm from getting stuck in local minima and speeds up convergence towards the global minimum of the loss function.

Implementation of Gradient Descent Algorithm to Minimize an Objective Function. (Code Explanation)

# gradient descent algorithm
def gradient_descent(objective, derivative, bounds, n_iter, step_size, momentum):
# track all solutions
solutions = list()
# generate an initial point
x = bounds[:, 0] + np.random.rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
score = objective(x[0], x[1])
# keep track of the change
change = 0.0
# run the gradient descent
for t in range(n_iter):
# calculate gradient
gradient = derivative(x[0], x[1])
for i in range(x.shape[0]):
# calculate update
new_change = step_size * gradient[i] + momentum * change
# take a step
x[i] = x[i] - new_change
# save the change
change = new_change
# evaluate candidate point
solution_eval = objective(x[0], x[1])
# store solution
solutions.append(x.copy())
#scores.append(solution_eval)
# report progress
print('>%d f(%s) = %.5f' % (t, x, solution_eval))
return solutions

The code provided is an implementation of the gradient descent algorithm. The purpose of this algorithm is to find the minimum of a given objective function. The algorithm works by starting at an initial point and then iteratively moving towards the minimum by taking steps in the direction of the negative gradient of the objective function. The step size and momentum are hyperparameters that can be adjusted to control the rate of convergence toward the minimum.

The gradient_descent function takes as input the objective function, its derivative, the bounds of the search space, the number of iterations to perform, the step size, and the momentum. The function generates an initial point within the bounds of the search space and computes the score of that point using the objective function. Then, it runs the gradient descent algorithm for the specified number of iterations, updating the point by subtracting the step size times the gradient of the objective function, plus the momentum times the previous change. The solutions are saved after each iteration in a list, and the progress is reported by printing the iteration number, the current point, and the score of the objective function evaluated at that point. The solutions list can be used to visualize the path taken by the algorithm toward the minimum. The function returns the solutions list.

In summary, the gradient descent algorithm implemented in the provided code aims to find the minimum of a given objective function by iteratively moving toward the minimum using the negative gradient of the function. The algorithm takes as input the objective function, its derivative, the search space bounds, and hyperparameters such as the step size and momentum. The solutions are saved after each iteration in a list, and the progress is reported by printing the iteration number, the current point, and the score of the objective function evaluated at that point.

Optimization Algorithms and Visualization using Adam and Plotting in Python. (Equation Explanation)

The Adam optimization algorithm is a popular stochastic gradient descent optimization technique used in deep learning. The equation for Adam optimization combines the concepts of momentum and adaptive learning rates to achieve fast convergence and stable training.

The equation for Adam optimization can be expressed as:
θ_t = θ_{t-1} - α * (m_t / (sqrt(v_t) + ϵ))
where:
θ_t is the updated value of the parameters at time step t
θ_{t-1} is the previous value of the parameters
α is the learning rate
m_t is the exponentially weighted average of the gradient with bias correction for momentum
v_t is the exponentially weighted average of the squared gradient with bias correction for adaptive learning rate
ϵ is a small constant (e.g., 10^-8) to prevent division by zero.
The exponentially weighted averages are computed as follows:
m_t = β_1 * m_{t-1} + (1 - β_1) * g_t
v_t = β_2 * v_{t-1} + (1 - β_2) * g_t²
where:
g_t is the gradient of the loss function at time step t
β_1 and β_2 are the decay rates for the first and second moments of the gradient, respectively.
The bias correction terms are added to adjust the estimates of the first and second moments of the gradient:
m_t = m_t / (1 - β_1^t)
v_t = v_t / (1 - β_2^t)

The Adam optimization algorithm adapts the learning rate for each parameter based on the magnitude of the gradients and the history of the gradients. It combines the benefits of momentum optimization and adaptive learning rate methods to achieve fast convergence and good generalization performance.

Optimization Algorithms and Visualization using Adam and Plotting in Python. (Code Explanation)

# gradient descent algorithm with adam
def adam(objective, derivative, bounds, n_iter, alpha, beta1, beta2, eps=1e-8):
# generate an initial point
solutions = list()
x = bounds[:, 0] + np.random.rand(len(bounds)) * (bounds[:, 1] - bounds[:, 0])
score = objective(x[0], x[1])
# initialize first and second moments
m = [0.0 for _ in range(bounds.shape[0])]
v = [0.0 for _ in range(bounds.shape[0])]
# run the gradient descent updates
for t in range(n_iter):
# calculate gradient g(t)
g = derivative(x[0], x[1])
# build a solution one variable at a time
for i in range(x.shape[0]):
# m(t) = beta1 * m(t-1) + (1 - beta1) * g(t)
m[i] = beta1 * m[i] + (1.0 - beta1) * g[i]
# v(t) = beta2 * v(t-1) + (1 - beta2) * g(t)²
v[i] = beta2 * v[i] + (1.0 - beta2) * g[i]**2
# mhat(t) = m(t) / (1 - beta1(t))
mhat = m[i] / (1.0 - beta1**(t+1))
# vhat(t) = v(t) / (1 - beta2(t))
vhat = v[i] / (1.0 - beta2**(t+1))
# x(t) = x(t-1) - alpha * mhat(t) / (sqrt(vhat(t)) + eps)
x[i] = x[i] - alpha * mhat / (np.sqrt(vhat) + eps)
# evaluate candidate point
score = objective(x[0], x[1])
solutions.append(x.copy())
# report progress
print('>%d f(%s) = %.5f' % (t, x, score))
return solutions

The above code includes two optimization algorithms, Adam and RMSProp, and also includes a plotting function to visualize the optimization process. The first part of the code initializes the random seed and sets the input range, number of iterations, and step size for the optimization. Then, it sets the beta1 and beta2 values for the Adam optimizer and calls the adam function, which performs the gradient descent search with Adam and returns a list of solutions. Finally, it calls the plot_solutions function to plot the solutions obtained by Adam.

The second part of the code defines the rmsprop function, which performs the gradient descent search with RMSProp. It initializes the solutions list and generates an initial point within the given input range. The sq_grad_avg list is also initialized to store the average square gradients for each variable. The function then runs the gradient descent for the specified number of iterations. It calculates the gradient and updates the average of the squared partial derivatives for each variable. Then, it calculates the learning rate for each variable and updates the position accordingly. The new solution is stored and added to the solutions list. Finally, the function evaluates the candidate point and reports the progress by printing the iteration, solution, and evaluation.

The third part of the code contains the plot_solutions function, which takes a list of solutions and a title as input and plots the optimization process. It initializes the plot, sets the title, and adds the input range as the background. Then, it plots each solution as a red dot and connects consecutive solutions with a blue line. Finally, it shows the plot.

Here is a video of the working url in action.

Please find the link to code here: https://github.com/deepkapha/EarthScanWebinar/tree/main/notebook