Gradient Descents — Batch, Stochastic, and Minibatch

Bhanumathi Ramesh
3 min read · Nov 6, 2021

An act of moving downwards

In this article, we will look at the different types of gradient descent and see how they differ from one another.

What is Gradient Descent?

In basic mathematics, the gradient (slope) measures the steepness of a straight line. A line can slope uphill (rising from left to right) or downhill (falling from left to right). Gradients can be positive or negative and need not be whole numbers.

According to Wikipedia, gradient descent is a first-order iterative optimization algorithm for finding a local minimum of a differentiable function. The idea is to take repeated steps in the direction opposite to the gradient of the function at the current point, because this is the direction of steepest descent. See the Wikipedia article on gradient descent for more details.
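
To make the definition concrete, here is a minimal sketch in Python. The function f(x) = (x - 3)^2, the starting point, and the learning rate are illustrative choices of mine, not values taken from any particular source:

```python
# A minimal sketch of gradient descent in one dimension.
# The function f(x) = (x - 3)**2, the starting point, and the learning
# rate below are illustrative choices, not values from the article.

def f_prime(x):
    # derivative of f(x) = (x - 3)**2
    return 2 * (x - 3)

x = 0.0              # starting point
learning_rate = 0.1  # step size

for _ in range(100):
    # step in the direction opposite to the gradient (steepest descent)
    x = x - learning_rate * f_prime(x)

print(x)  # ends up very close to 3, the minimizer of f
```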

Use of Gradient Descent:

Gradient descent is mainly used in the following:

  1. Linear Regression
  2. Logistic Regression
  3. t-SNE
  4. Deep learning (it is the backbone of training neural networks)

Types of Gradient Descents:

Three types of gradient descent are widely used:

  1. Batch Gradient Descent
  2. Stochastic Gradient Descent
  3. Mini-batch Gradient Descent

To train a Linear Regression model, we have to learn some model parameters, namely the feature weights and the bias term (the beta coefficients). Unlike the closed-form OLS method, Gradient Descent is an iterative optimization algorithm that tweaks the model parameters step by step to minimize a cost function over the training data. For a convex cost function such as the mean squared error used in linear regression, it is guaranteed to reach the global minimum (the optimal solution) given enough time and a learning rate that is not too high.
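
As a rough sketch of what minimizing the cost function looks like for linear regression, the snippet below assumes the cost is the mean squared error and writes out its gradient with respect to the weights and the bias (the function and variable names are my own, not from the article):

```python
import numpy as np

def mse_cost(X, y, w, b):
    # mean squared error of the linear model X @ w + b over the training data
    predictions = X @ w + b
    return np.mean((predictions - y) ** 2)

def mse_gradients(X, y, w, b):
    # gradient of the MSE cost with respect to the weights w and the bias b
    n = len(y)
    errors = (X @ w + b) - y
    grad_w = (2.0 / n) * (X.T @ errors)
    grad_b = (2.0 / n) * np.sum(errors)
    return grad_w, grad_b
```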

Batch Gradient Descent: Batch Gradient Descent uses the complete training data set to compute the gradient of the cost function (with respect to the model parameters) at each iteration. It works well for convex cost functions, where it converges steadily toward the minimum, but every iteration becomes expensive as the training set grows.
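
A minimal sketch of batch gradient descent for linear regression, assuming toy synthetic data and an arbitrarily chosen learning rate and iteration count:

```python
import numpy as np

# Toy synthetic data: 200 samples, 3 features, known true weights and bias.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 3.0

w = np.zeros(3)
b = 0.0
learning_rate = 0.1

for _ in range(500):
    # gradient computed over the COMPLETE training set at every iteration
    errors = (X @ w + b) - y
    grad_w = (2.0 / len(y)) * (X.T @ errors)
    grad_b = (2.0 / len(y)) * errors.sum()
    w -= learning_rate * grad_w
    b -= learning_rate * grad_b

print(w, b)  # approaches the true weights [2, -1, 0.5] and bias 3
```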

Stochastic Gradient Descent: Stochastic Gradient Descent (SGD) is stochastic in the sense that it picks a single random training instance at each step and computes the gradient from that one instance alone. The resulting updates are noisy, which makes SGD a common choice for non-convex cost functions, since the noise can help it escape local minima.
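
A minimal SGD sketch under the same assumptions (toy data, illustrative learning rate and step count); note that the gradient at each step comes from a single randomly picked instance:

```python
import numpy as np

# Same toy data as in the batch example.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 3.0

w = np.zeros(3)
b = 0.0
learning_rate = 0.01

for _ in range(20000):
    # pick ONE random training instance and compute the gradient from it alone
    i = rng.integers(len(y))
    error = (X[i] @ w + b) - y[i]
    w -= learning_rate * 2.0 * error * X[i]
    b -= learning_rate * 2.0 * error

print(w, b)  # noisy updates, but still close to [2, -1, 0.5] and 3
```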

Mini-batch Gradient Descent: Mini-batch gradient descent is a variation of the gradient descent algorithm that splits the training dataset into small batches and uses each batch to compute the gradient and update the model coefficients. This is the variant most widely used in deep learning.
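
A minimal mini-batch sketch under the same assumptions, with an illustrative batch size of 32; each update uses one small batch rather than the whole training set or a single instance:

```python
import numpy as np

# Same toy data as in the previous examples.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = X @ np.array([2.0, -1.0, 0.5]) + 3.0

w = np.zeros(3)
b = 0.0
learning_rate = 0.05
batch_size = 32

for epoch in range(200):
    # shuffle, then split the training set into small batches
    order = rng.permutation(len(y))
    for start in range(0, len(y), batch_size):
        idx = order[start:start + batch_size]
        Xb, yb = X[idx], y[idx]
        errors = (Xb @ w + b) - yb
        w -= learning_rate * (2.0 / len(idx)) * (Xb.T @ errors)
        b -= learning_rate * (2.0 / len(idx)) * errors.sum()

print(w, b)  # close to the true weights [2, -1, 0.5] and bias 3
```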

Difference between Batch Gradient Descent and Stochastic Gradient Descent

Batch gradient descent computes the gradient over the entire training set at every iteration, so its updates are stable but each iteration becomes costly as the dataset grows. Stochastic gradient descent computes the gradient from a single randomly chosen instance, so its updates are cheap but noisy; the noise slows precise convergence, yet it can help the algorithm escape local minima of non-convex cost functions.

Conclusion:

Gradient descent works in spaces of any number of dimensions, even infinite-dimensional ones. It can take many iterations to reach a local minimum with the required accuracy when the curvature of the function differs greatly between directions. Gradient descent also suffers from practical issues such as choosing a suitable learning rate, getting stuck at saddle points, and oscillating around the minimum. In an upcoming post, we will derive the update equations for the batch, stochastic, and mini-batch algorithms and see how to deal with saddle points and oscillation around the minimum.

LinkedIn: Bhanumathi Ramesh

Also see: Simple Linear Regression, Multiple Linear Regression
