A Brief Introduction to Optimization Algorithms

Dishaa Agarwal
5 min read · Jan 10, 2021


Table of Contents:

· Optimization Introduction

· Key Terms used in Optimization

· Optimization Solution Methods

· Optimization Algorithms

· Difference between OLS and Gradient Descent

· Python Code Implementation for OLS and Gradient Descent

In this article, we are going to focus on a basic understanding of the OLS and Gradient Descent algorithms.

Optimization Introduction

Optimization is essential in all machine learning algorithms. It involves reducing the loss function (cost function) until we reach the optimal parameter values for our model.

It is important to choose the right optimization algorithm so that we get good accuracy with a minimal loss.

The optimization problem is generally written as:

minimize_x f(x)

where f = the objective function that we want to minimize,

x = the decision variable that the optimizer adjusts.
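As a concrete (illustrative) example of this formulation, the sketch below minimizes f(x) = (x − 3)², whose minimum is at x = 3, using SciPy's general-purpose optimizer; the objective and starting point are assumptions made up purely for demonstration.

```python
from scipy.optimize import minimize

def f(x):
    # Objective: a simple parabola whose minimum is at x = 3
    return (x[0] - 3) ** 2

result = minimize(f, x0=0.0)  # x0 is the initial guess for the decision variable
print(result.x)               # approximately [3.]
```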

Key Terms used in Optimization

1. Loss Function: Machine learning models generally learn by means of a loss function. It is a method that evaluates how close our predicted values (y-hat) are to the actual values (y).

The loss function will be high when our predicted values are too far from the actual values.

The main role of the optimization algorithm is to minimize the loss function. In this article, for the purpose of understanding, we are going to use the MSE (Mean Squared Error) loss function (a short sketch of computing MSE follows this list).

2. Objective function: It is the function which we want to minimize or maximize.

3. Minima: It is a point where the function value is smaller than at nearby points but possibly greater than at a distant point.
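For reference, here is a small sketch of computing MSE with NumPy; the arrays are made up purely for illustration.

```python
import numpy as np

# Illustrative actual values (y) and predicted values (y-hat)
y = np.array([3.0, 5.0, 7.0, 9.0])
y_hat = np.array([2.5, 5.5, 6.0, 9.5])

# Mean Squared Error: the average of the squared differences
mse = np.mean((y - y_hat) ** 2)
print(mse)  # 0.4375
```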

Optimization Solution Methods:

Optimization solutions can be of two types:

1. Analytical Solution: This gives us the exact solution. It generally works only when the problem has a closed-form solution.

2. Iterative Solution: This generally gives us an approximate solution that is very close to the actual solution. It can solve both constrained and unconstrained problems.

Optimization Algorithms

There are many optimization algorithms, but in this article we are going to focus on OLS (Ordinary Least Squares) and Gradient Descent.

1. OLS (Ordinary Least Squares): It is a strategy to obtain the best-fit line for our model such that the loss function is at its minimum.

The idea behind OLS is to find the optimized model coefficients (w) for which the loss function is minimum.

OLS is the standard way to fit a Linear Regression model: it finds the best-fit line and gives an analytical (closed-form) solution. A minimal sketch of this closed-form solution appears below.

Figure: Linear Regression model.
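Here is a minimal sketch of the OLS closed-form (normal-equation) solution w = (XᵀX)⁻¹Xᵀy on made-up data; the variable names and data are illustrative assumptions.

```python
import numpy as np

# Illustrative data: y ≈ 2 + 3x with a little noise
rng = np.random.default_rng(0)
x = rng.uniform(0, 10, size=50)
y = 2 + 3 * x + rng.normal(0, 0.5, size=50)

# Design matrix with a column of ones for the intercept term
X = np.column_stack([np.ones_like(x), x])

# Normal equation: w = (X^T X)^(-1) X^T y  (the analytical solution)
w = np.linalg.inv(X.T @ X) @ X.T @ y
print(w)  # roughly [2., 3.] -> intercept and slope
```

In practice, np.linalg.lstsq (or np.linalg.solve) is preferred over an explicit matrix inverse for numerical stability, but the inverse keeps the formula easy to read.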

2. Gradient Descent / Batch Gradient Descent: It is an iterative solver that gives us an approximate solution. The main goal is to reach the minima.

If we choose the hyperparameter (learning rate) correctly, then we can get very close to the actual minima.

Steps involved in performing Gradient Descent (a minimal sketch follows these steps):

· Initialize the learning rate and the model coefficients (with random values).

· Calculate the predicted-values(y-hat)

· Calculate the loss function and error-term

· Update the model-coefficients

· Repeat the steps until convergence.
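Below is a minimal sketch of these steps for simple linear regression with the MSE loss. The data, learning rate, and iteration count are illustrative assumptions, not part of the original article.

```python
import numpy as np

def gradient_descent(x, y, lr=0.05, n_iters=1000):
    """Batch Gradient Descent for y = w0 + w1 * x with MSE loss."""
    w0, w1 = 0.0, 0.0                        # step 1: initialize coefficients
    n = len(x)
    for _ in range(n_iters):
        y_hat = w0 + w1 * x                  # step 2: predicted values
        error = y_hat - y                    # step 3: error term
        grad_w0 = (2 / n) * np.sum(error)    # gradient of MSE w.r.t. w0
        grad_w1 = (2 / n) * np.sum(error * x)
        w0 -= lr * grad_w0                   # step 4: update the coefficients
        w1 -= lr * grad_w1
    return w0, w1                            # step 5: repeat until convergence

# Illustrative data generated from y = 2 + 3x
x = np.linspace(0, 5, 50)
y = 2 + 3 * x
print(gradient_descent(x, y))  # approximately (2.0, 3.0)
```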

Learning Rate

Effect of the learning rate: it controls how fast or slowly we converge to a minimum. If the learning rate is very small, gradient descent will take a long time to converge; if the learning rate is very large, gradient descent may overshoot the minimum and fail to converge. A tiny illustration of both behaviours follows.
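As an illustrative example (not from the original article), consider minimizing f(w) = w², whose gradient is 2w and whose minimum is at w = 0; the step sizes below are chosen only to show the two behaviours.

```python
def descend(lr, steps=20, w=5.0):
    """Run gradient descent on f(w) = w**2 (gradient 2*w), starting from w = 5."""
    for _ in range(steps):
        w -= lr * 2 * w
    return w

print(descend(lr=0.01))  # very small step: still far from the minimum after 20 steps
print(descend(lr=0.1))   # moderate step: close to the minimum at 0
print(descend(lr=1.1))   # too large: the iterates overshoot and blow up
```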

Gradient Descent has other variations:

Figure: Variations of Gradient Descent.

a) Stochastic Gradient Descent: It is very similar to Batch Gradient Descent. The only difference is that it updates the model coefficients after every single data point rather than once per pass over the whole dataset.

In Stochastic Gradient Descent, we calculate the error for every individual data point (see the sketch after this subsection).

Limitation of Stochastic Gradient Descent:

· More sensitive to noise and outliers, since each update is based on a single data point.

· Gives an approximate solution that may not be very close to the analytical solution.

Advantages of Stochastic Gradient Descent: It makes frequent, cheap updates, which often gives faster convergence on large datasets.
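A minimal sketch of the stochastic update, where the coefficients are adjusted after every single data point; the data, learning rate, and epoch count are illustrative assumptions.

```python
import numpy as np

def sgd(x, y, lr=0.01, n_epochs=50, seed=0):
    """Stochastic Gradient Descent for y = w0 + w1 * x with squared-error loss."""
    rng = np.random.default_rng(seed)
    w0, w1 = 0.0, 0.0
    for _ in range(n_epochs):
        for i in rng.permutation(len(x)):    # visit the data points in random order
            error = (w0 + w1 * x[i]) - y[i]  # error for this single point
            w0 -= lr * 2 * error             # update immediately, one point at a time
            w1 -= lr * 2 * error * x[i]
    return w0, w1

x = np.linspace(0, 5, 50)
y = 2 + 3 * x
print(sgd(x, y))  # noisy path, but the result is close to (2.0, 3.0)
```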

b) Mini-Batch Gradient Descent: Its working is very similar to Stochastic Gradient Descent, but the difference is that the data points are divided into mini-batches instead of being taken one at a time (or all at once). The updated model coefficients from the previous batch are taken as input for the next batch, and the error term is also calculated per batch. A minimal sketch follows.

Here, the batch size is an additional hyperparameter.
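A minimal sketch of the mini-batch variant under the same illustrative setup; the batch size, learning rate, and epoch count are assumptions for demonstration.

```python
import numpy as np

def minibatch_gd(x, y, lr=0.02, n_epochs=300, batch_size=10, seed=0):
    """Mini-Batch Gradient Descent for y = w0 + w1 * x with MSE loss."""
    rng = np.random.default_rng(seed)
    w0, w1 = 0.0, 0.0
    n = len(x)
    for _ in range(n_epochs):
        idx = rng.permutation(n)                     # shuffle, then split into batches
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            error = (w0 + w1 * x[batch]) - y[batch]  # error term per batch
            # Coefficients updated on this batch carry over to the next batch
            w0 -= lr * 2 * np.mean(error)
            w1 -= lr * 2 * np.mean(error * x[batch])
    return w0, w1

x = np.linspace(0, 5, 50)
y = 2 + 3 * x
print(minibatch_gd(x, y))  # close to (2.0, 3.0)
```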

Figure: Comparison between the Gradient Descent variants.

Difference between OLS and Gradient Descent:

· OLS is an analytical solver. It works well for Linear Regression, but it is generally not used for very large datasets because the closed-form solution requires inverting XᵀX, which is computationally expensive and can be unstable when features are highly multicollinear.

· Gradient Descent is an iterative solver, which is preferred for large or complex datasets and for models that do not have a closed-form solution (a small comparison of the two solvers follows).
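To make the contrast concrete, here is a small sketch on made-up data comparing scikit-learn's LinearRegression, which solves the OLS problem analytically, with SGDRegressor, which fits the same model iteratively; both should recover roughly the same coefficients.

```python
import numpy as np
from sklearn.linear_model import LinearRegression, SGDRegressor

# Illustrative data: y ≈ 2 + 3x with a little noise
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 1))
y = 2 + 3 * X.ravel() + rng.normal(0, 0.5, size=200)

ols = LinearRegression().fit(X, y)                                      # analytical (OLS) solution
sgd = SGDRegressor(max_iter=1000, tol=1e-4, random_state=0).fit(X, y)   # iterative solution

print(ols.intercept_, ols.coef_)   # roughly 2 and [3]
print(sgd.intercept_, sgd.coef_)   # similar values, reached iteratively (approximate)
```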

Python Code:

Check out my Jupyter notebook to explore more about the optimization algorithms.

https://github.com/dishaaagarwal/Optimization-Code/blob/main/OPTIMIZATION-Code.ipynb

I hope this article was helpful in getting the hang of optimization algorithms. Stay tuned for more articles to come. Please like and leave your comments/queries below.

References:

If you wish to dig deeper into Optimization algorithms then follow these:

· https://towardsdatascience.com/demystifying-optimizations-for-machine-learning-c6c6405d3eea

· https://towardsdatascience.com/common-loss-functions-in-machine-learning-46af0ffc4d23

· https://towardsdatascience.com/understanding-the-ols-method-for-simple-linear-regression-e0a4e8f692cc

