In part 4, we looked at some heuristics that can help us tune the learning rate and momentum better. In this final article of the series, let us look at a more *principled* way of adjusting the learning rate and give the learning rate a chance to adapt.

Citation Note: Most of the content and figures in this blog are directly taken from Lecture 5 of CS7015: Deep Learning course offered by Prof. Mitesh Khapra at IIT-Madras.

Consider the following simple perceptron network with sigmoid activation.

It should be easy to see that given a single point (**x**, *y*), gradients…
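To make that gradient computation concrete, here is a minimal sketch, assuming a single sigmoid neuron f(x) = σ(wx + b) with a squared-error loss on one point (the function names are my own, for illustration):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def gradients(w, b, x, y):
    # Forward pass: f(x) = sigmoid(w*x + b)
    f = sigmoid(w * x + b)
    # Squared-error loss L = (1/2) * (f - y)^2
    # Chain rule: dL/dw = (f - y) * f * (1 - f) * x
    dw = (f - y) * f * (1 - f) * x
    db = (f - y) * f * (1 - f)
    return dw, db
```

Note how both partial derivatives share the common factor (f − y)·f·(1 − f); only the input x distinguishes the weight gradient from the bias gradient.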

In part 3, we looked at stochastic and mini-batch versions of the optimizers. In this post, we will look at some commonly followed heuristics on how to tune the learning rate, etc. If you are not interested in these heuristics, feel free to skip to part 5 of the Learning Parameters series.


One could argue that we could have solved the problem of navigating gentle slopes by setting the learning rate…
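One widely used heuristic of this kind is to anneal the learning rate on a schedule rather than fix it. A minimal sketch of step decay (the parameter names and the halve-every-10-epochs default are my own illustrative choices):

```python
def step_decay(initial_lr, epoch, drop=0.5, epochs_per_drop=10):
    # Common heuristic: multiply the learning rate by `drop`
    # every `epochs_per_drop` epochs.
    return initial_lr * (drop ** (epoch // epochs_per_drop))
```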


In part 2, we looked at two useful variants of gradient descent — Momentum-Based and Nesterov Accelerated Gradient Descent. In this post, we are going to look at stochastic versions of gradient descent. You can check out all the posts in the **Learning Parameters** series by clicking on the kicker tag at the top of this post.


Let us look at the vanilla gradient descent we talked about in part-1 of the…
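As a rough sketch of where this is headed, here is stochastic/mini-batch gradient descent in plain Python; the `linear_grads` helper and the squared-error linear model are my own illustrative choices, not from the post:

```python
import random

def linear_grads(w, b, x, y):
    # Gradients of the squared error (w*x + b - y)^2 for a single point
    err = w * x + b - y
    return 2 * err * x, 2 * err

def sgd(data, grad_fn, w=0.0, b=0.0, eta=0.1, epochs=200, batch_size=1):
    # batch_size=1 gives stochastic GD; larger values give mini-batch GD.
    # Parameters are updated after every (mini-)batch, not every full pass.
    data = list(data)
    for _ in range(epochs):
        random.shuffle(data)
        for i in range(0, len(data), batch_size):
            batch = data[i:i + batch_size]
            dw = sum(grad_fn(w, b, x, y)[0] for x, y in batch) / len(batch)
            db = sum(grad_fn(w, b, x, y)[1] for x, y in batch) / len(batch)
            w -= eta * dw
            b -= eta * db
    return w, b
```

With `batch_size=len(data)` this reduces to the vanilla (full-batch) version, which is exactly the contrast the post draws.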

In this post, we look at how the gentle-surface limitation of Gradient Descent can be overcome, to some extent, using the concept of momentum. Make sure you check out my blog post — Learning Parameters, Part-1: Gradient Descent — if you are unsure what this is about. Throughout this post, we work with the same toy problem introduced in part-1. You can check out all the posts in the **Learning Parameters** series by clicking on the kicker tag at the top of this post.

In part-1, we saw a clear illustration of a curve where the gradient can be…
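The momentum update discussed in this post can be sketched in a few lines, assuming the CS7015 convention update_t = γ·update_{t−1} + η·∇w_t (the `grad_fn` callback and defaults are my own, for illustration):

```python
def momentum_gd(grad_fn, w=0.0, gamma=0.9, eta=0.1, steps=500):
    # Momentum-based gradient descent:
    #   update_t = gamma * update_{t-1} + eta * grad(w_t)
    #   w_{t+1}  = w_t - update_t
    update = 0.0
    for _ in range(steps):
        update = gamma * update + eta * grad_fn(w)
        w -= update
    return w
```

Because past gradients accumulate in `update`, steps grow larger along directions where gradients consistently agree, which is what helps on gentle slopes.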

Gradient Descent is one of the most popular techniques in optimization, very commonly used in training neural networks. It is intuitive and explainable, given the right background of essential calculus. Take a look at this blog post of mine — Part 0 of sorts, which covers some of the prerequisites needed to make better sense of this series. You can check out all the posts in the **Learning Parameters** series by clicking on the kicker tag at the top of this post.

In this blog post, we build up the motivation for Gradient Descent using a toy neural network. We…

This is an optional read for the 5-part series I wrote on learning parameters. In this post, you will find some basic material you’d need to better understand my other blog posts on how deep neural networks learn their parameters. You can check out all the posts in the **Learning Parameters** series by clicking on the kicker tag at the top of this post.

We will briefly look at the following topics:

- Multivariable Functions
- Local Minimum vs. Global Minimum
- Understanding The Gradient
- Cost or Loss Function
- Contour Maps

A multivariable function is just a function whose input and/or output…
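For instance, here is a small example (my own, for illustration) of a multivariable function together with its gradient, with each partial derivative checked numerically by central differences:

```python
def f(x, y):
    # A multivariable function: two inputs, one output
    return x**2 + y**2

def grad_f(x, y):
    # Analytic gradient: the vector of partial derivatives (df/dx, df/dy)
    return (2 * x, 2 * y)

def numeric_partial(fn, point, i, h=1e-6):
    # Central-difference estimate of the i-th partial derivative at `point`
    lo, hi = list(point), list(point)
    lo[i] -= h
    hi[i] += h
    return (fn(*hi) - fn(*lo)) / (2 * h)
```

The numeric check is a handy sanity test whenever you derive a gradient by hand.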

This HCI (Human-Computer Interaction) application in Python (3.6) will allow you to control your mouse cursor with your facial movements, and it works with just your regular webcam. It’s hands-free; no wearable hardware or sensors needed.

Special thanks to **Adrian Rosebrock** for his amazing blog posts [2] [3], code snippets and his imutils library [7] that played an important role in making this idea of mine a reality.

Now, I definitely understand that these facial movements could be a little bit weird to do, especially when you are around people. As someone with benign positional vertigo, I hate doing some of these actions…
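For the curious, one common building block behind blink-driven interfaces like this is the eye aspect ratio (EAR) popularized by Rosebrock's posts [2] [3]; a minimal sketch, where the p1..p6 landmark ordering is an assumption based on the usual dlib 68-point convention:

```python
import math

def eye_aspect_ratio(eye):
    # eye: six (x, y) landmarks p1..p6 around one eye, where p1/p4 are the
    # horizontal corners and (p2, p6), (p3, p5) are the vertical pairs.
    p1, p2, p3, p4, p5, p6 = eye
    vertical = math.dist(p2, p6) + math.dist(p3, p5)
    horizontal = math.dist(p1, p4)
    # EAR stays roughly constant while the eye is open and
    # drops sharply toward zero during a blink.
    return vertical / (2.0 * horizontal)
```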

This post will discuss the famous *Perceptron Learning Algorithm,* originally proposed by Frank Rosenblatt in 1958 and later refined and carefully analyzed by Minsky and Papert in 1969. This is a follow-up post of my previous posts on the McCulloch-Pitts neuron model and the Perceptron model.

*Citation Note: The concept, the content, and the structure of this article were based on Prof. **Mitesh Khapra**’s lecture slides and videos of the course **CS7015: Deep Learning** taught at IIT Madras.*

You can just go through my previous post on the perceptron model (linked above) but I will assume that you won’t. …
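For reference, here is a bare-bones sketch of the perceptron learning algorithm itself, assuming labels in {+1, −1} (the function and parameter names are my own):

```python
def perceptron_learn(data, dims, epochs=100):
    # Perceptron learning algorithm: on each misclassified point,
    # add y*x to the weights and y to the bias.
    w = [0.0] * dims
    b = 0.0
    for _ in range(epochs):
        converged = True
        for x, y in data:
            activation = sum(wi * xi for wi, xi in zip(w, x)) + b
            if y * activation <= 0:      # misclassified (or on the boundary)
                w = [wi + y * xi for wi, xi in zip(w, x)]
                b += y
                converged = False
        if converged:
            break
    return w, b
```

If the data is linearly separable, the convergence theorem (which Minsky and Papert analyzed carefully) guarantees this loop makes only finitely many updates.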

The most fundamental unit of a deep neural network is called an *artificial neuron*, which takes an input, processes it, passes it through an activation function like the sigmoid, and returns the activated output. In this post, we are only going to talk about the *perceptron* model proposed before the ‘activation’ part came into the picture.

Frank Rosenblatt, an American psychologist, proposed the *classical perceptron* model in 1958. It was further refined and carefully analyzed by Minsky and Papert (1969); theirs is the model referred to as the *perceptron* model. …
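A minimal sketch of the perceptron's forward computation, assuming a weighted sum followed by a hard threshold at zero (no smooth activation yet):

```python
def perceptron(x, w, b):
    # Weighted sum of the inputs plus a bias, thresholded at zero.
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) + b >= 0 else 0
```

For example, the weights (1, 1) with bias −1.5 make this unit compute logical AND on boolean inputs.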

It is very well known that the most fundamental unit of deep neural networks is called an *artificial neuron/perceptron*. But the very first step towards the *perceptron* we use today was taken in 1943 by McCulloch and Pitts, by mimicking the functionality of a biological neuron.

*Note: The concept, the content, and the structure of this article were largely based on the awesome lectures and the material offered by Prof. **Mitesh M. Khapra** on **NPTEL**’s **Deep Learning** course. Check it out!*

**Dendrite**: Receives signals from other neurons

**Soma**: Processes the information

**Axon**: Transmits the output of this neuron

**Synapse**: Point…
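The McCulloch-Pitts model reduces this biological picture to a thresholded sum. A minimal sketch, assuming boolean inputs, a fixed threshold, and the usual all-or-nothing inhibitory inputs (the parameter names are my own):

```python
def mcculloch_pitts(inputs, threshold, inhibitory=()):
    # McCulloch-Pitts neuron: boolean inputs, fixed threshold.
    # Any active inhibitory input (given by index) forces the output to 0.
    if any(inputs[i] for i in inhibitory):
        return 0
    return 1 if sum(inputs) >= threshold else 0
```

Setting the threshold to the number of inputs gives AND; a threshold of 1 gives OR.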

Deep Learning Research Assistant @ IIT Hyderabad.