Learning Parameters

Let’s look at gradient descent with an adaptive learning rate.

In part 4, we looked at some heuristics that can help us tune the learning rate and momentum better. In this final article of the series, let us look at a more principled way of adjusting the learning rate, one that gives the learning rate a chance to adapt.

Citation Note: Most of the content and figures in this blog are directly taken from Lecture 5 of CS7015: Deep Learning course offered by Prof. Mitesh Khapra at IIT-Madras.

Motivation for Adaptive Learning Rate

Consider the following simple perceptron network with sigmoid activation.

It should be easy to see that given a single point (x, y), gradients…
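To give a rough feel for where the post ends up, here is a minimal NumPy sketch of an Adagrad-style update, one flavour of adaptive learning rate; the weight and gradient arrays are generic placeholders, not the toy network from the post.

```python
import numpy as np

def adagrad_step(w, grad_w, hist, lr=0.1, eps=1e-8):
    """One Adagrad-style update: every parameter gets its own effective learning
    rate, scaled down by the history of squared gradients seen for that parameter."""
    hist = hist + grad_w ** 2                          # accumulate squared gradients
    w = w - (lr / (np.sqrt(hist) + eps)) * grad_w      # frequently-updated weights slow down
    return w, hist
```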


Learning Parameters

Before moving on to advanced optimization algorithms, let us revisit the problem of the learning rate in gradient descent.

In part 3, we looked at stochastic and mini-batch versions of the optimizers. In this post, we will look at some commonly followed heuristics for tuning the learning rate and momentum. If you are not interested in these heuristics, feel free to skip to part 5 of the Learning Parameters series.

Citation Note: Most of the content and figures in this blog are directly taken from Lecture 5 of CS7015: Deep Learning course offered by Prof. Mitesh Khapra at IIT-Madras.

One could argue that we could have solved the problem of navigating gentle slopes by setting the learning rate…
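As a taste of the kind of heuristic the post goes on to discuss, here is a minimal sketch of step decay, one common way of annealing the learning rate; the drop factor and interval are illustrative values of my own, not prescriptions from the lecture.

```python
def step_decay(initial_lr, epoch, drop=0.5, epochs_per_drop=10):
    """Halve the learning rate every `epochs_per_drop` epochs (illustrative values)."""
    return initial_lr * (drop ** (epoch // epochs_per_drop))

# step_decay(1.0, 0) -> 1.0, step_decay(1.0, 10) -> 0.5, step_decay(1.0, 25) -> 0.25
```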


Learning Parameters

Let’s digress a bit from optimizers and talk about the stochastic versions of these algorithms.

In part 2, we looked at two useful variants of gradient descent — Momentum-Based and Nesterov Accelerated Gradient Descent. In this post, we are going to look at stochastic versions of gradient descent. You can check out all the posts in the Learning Parameters series by clicking on the kicker tag at the top of this post.

Citation Note: Most of the content and figures in this blog are directly taken from Lecture 5 of CS7015: Deep Learning course offered by Prof. Mitesh Khapra at IIT-Madras.

Motivation

Let us look at the vanilla gradient descent we talked about in part-1 of the…
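For a rough sketch of the idea the post builds towards, here is mini-batch gradient descent in NumPy; `grad_fn` is a placeholder for whatever computes the gradients of the toy network's loss. A batch size of 1 recovers the stochastic version, and a batch size equal to the dataset size recovers vanilla (batch) gradient descent.

```python
import numpy as np

def minibatch_gd(w, b, X, Y, grad_fn, lr=0.1, batch_size=32, epochs=10):
    """Update the parameters after every mini-batch instead of after a full pass."""
    n = X.shape[0]
    for _ in range(epochs):
        order = np.random.permutation(n)                 # shuffle the data each epoch
        for start in range(0, n, batch_size):
            batch = order[start:start + batch_size]
            dw, db = grad_fn(w, b, X[batch], Y[batch])   # gradients on this mini-batch only
            w, b = w - lr * dw, b - lr * db
    return w, b
```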


Learning Parameters

Let’s look at two simple, yet very useful variants of gradient descent.

In this post, we look at how the gentle-surface limitation of Gradient Descent can be overcome, to some extent, using the concept of momentum. Make sure you check out my blog post — Learning Parameters, Part-1: Gradient Descent, if you are unsure what this is about. Throughout the blog post, we work with the same toy problem introduced in part-1. You can check out all the posts in the Learning Parameters series by clicking on the kicker tag at the top of this post.

In part-1, we saw a clear illustration of a curve where the gradient can be…
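For reference, here is a minimal sketch of the momentum update the post walks through; `grad_w` stands in for the gradient of the loss at the current weights.

```python
def momentum_step(w, grad_w, velocity, lr=0.1, gamma=0.9):
    """The update is an exponentially decaying sum of past gradients, so repeated
    moves in the same direction gather speed even on gentle slopes."""
    velocity = gamma * velocity + lr * grad_w
    return w - velocity, velocity
```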


Learning Parameters

Gradient Descent is an iterative optimization algorithm for finding the (local) minimum of a function.

Gradient Descent is one of the most popular techniques in optimization, very commonly used in training neural networks. It is intuitive and explainable, given the right background in essential Calculus. Take a look at this blog post of mine — Part 0 of sorts, which covers some of the prerequisites needed to make better sense of this series. You can check out all the posts in the Learning Parameters series by clicking on the kicker tag at the top of this post.

In this blog post, we build up the motivation for Gradient Descent using a toy neural network. We…
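The update rule itself is short; here is a minimal one-parameter sketch, with a toy loss in the comment that is my own example, not the post's toy network.

```python
def gradient_descent(w, grad_fn, lr=0.1, steps=100):
    """Vanilla gradient descent: repeatedly step against the gradient."""
    for _ in range(steps):
        w = w - lr * grad_fn(w)   # w_{t+1} = w_t - eta * dL/dw
    return w

# e.g. minimising f(w) = (w - 3)^2, whose gradient is 2(w - 3):
# gradient_descent(0.0, lambda w: 2 * (w - 3)) converges towards w = 3
```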


A quick look at some basic stuff essential to understand how parameters are learned.

This is an optional read for the 5-part series I wrote on learning parameters. In this post, you will find some basic stuff you’d need to better understand my other blog posts on how deep neural networks learn their parameters. You can check out all the posts in the Learning Parameters series by clicking on the kicker tag at the top of this post.

We will briefly look at the following topics:

  1. Multivariable Functions
  2. Local Minimum vs. Global Minimum
  3. Understanding The Gradient
  4. Cost or Loss Function
  5. Contour Maps

1. Multivariable Functions

A multivariable function is just a function whose input and/or output…
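To make items 1 and 3 of the list above concrete, here is a toy example of my own: a two-variable function and its gradient.

```python
import numpy as np

def f(x, y):
    """A toy multivariable function: two inputs, one scalar output."""
    return x ** 2 + 3 * y ** 2

def grad_f(x, y):
    """Its gradient: the vector of partial derivatives (df/dx, df/dy)."""
    return np.array([2 * x, 6 * y])

# grad_f(1.0, 2.0) -> array([ 2., 12.]), the direction of steepest ascent at (1, 2)
```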


This HCI (Human-Computer Interaction) application in Python (3.6) allows you to control your mouse cursor with your facial movements and works with just your regular webcam. It’s hands-free; no wearable hardware or sensors needed.

Special thanks to Adrian Rosebrock for his amazing blog posts [2] [3], code snippets and his imutils library [7] that played an important role in making this idea of mine a reality.

Working Example

Usage

Now, I definitely understand that these facial movements could be a little bit weird to do, especially when you are around people. As a patient of benign positional vertigo, I hate doing some of these actions…
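For a flavour of how such an app can be wired together, here is a hypothetical minimal sketch (not the project's actual code) that nudges the cursor with nose movements using dlib's 68-point landmarks, imutils, and pyautogui; the landmark model path and the mapping from head movement to cursor movement are assumptions for illustration only.

```python
import cv2
import dlib
import pyautogui
from imutils import face_utils

detector = dlib.get_frontal_face_detector()
# assumed path to dlib's standard 68-point facial landmark model
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")

cap = cv2.VideoCapture(0)
anchor = None  # nose position in the first frame, treated as the neutral point

while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = detector(gray, 0)
    if faces:
        shape = face_utils.shape_to_np(predictor(gray, faces[0]))
        nose = shape[33]                      # nose tip in the 68-point scheme
        if anchor is None:
            anchor = nose
        dx, dy = int(nose[0] - anchor[0]), int(nose[1] - anchor[1])
        pyautogui.moveRel(dx, dy)             # nudge the cursor with head movement
    cv2.imshow("frame", frame)
    if cv2.waitKey(1) & 0xFF == ord("q"):
        break

cap.release()
cv2.destroyAllWindows()
```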


This post will discuss the famous Perceptron Learning Algorithm, originally proposed by Frank Rosenblatt in 1958 and later refined and carefully analyzed by Minsky and Papert in 1969. This is a follow-up post to my previous posts on the McCulloch-Pitts neuron model and the Perceptron model.

Citation Note: The concept, the content, and the structure of this article were based on Prof. Mitesh Khapra’s lectures slides and videos of course CS7015: Deep Learning taught at IIT Madras.

Perceptron

You can just go through my previous post on the perceptron model (linked above) but I will assume that you won’t. …
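For readers who skip the previous post anyway, here is a minimal NumPy sketch of the perceptron learning algorithm, with labels in {0, 1} and a bias folded in as an extra input; the data arrays are placeholders.

```python
import numpy as np

def perceptron_learn(X, Y, epochs=100):
    """Perceptron learning: nudge the weights only on misclassified points."""
    X = np.hstack([X, np.ones((X.shape[0], 1))])   # fold the bias in as a constant input
    w = np.zeros(X.shape[1])
    for _ in range(epochs):
        for x, y in zip(X, Y):
            pred = 1 if np.dot(w, x) >= 0 else 0
            if y == 1 and pred == 0:
                w = w + x                          # missed a positive: add the input
            elif y == 0 and pred == 1:
                w = w - x                          # fired on a negative: subtract the input
    return w
```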


The most fundamental unit of a deep neural network is called an artificial neuron, which takes an input, processes it, passes it through an activation function like the Sigmoid, and returns the activated output. In this post, we are only going to talk about the perceptron model, proposed before the ‘activation’ part came into the picture.

Frank Rosenblatt, an American psychologist, proposed the classical perceptron model in 1958. It was further refined and carefully analyzed by Minsky and Papert (1969); their model is referred to as the perceptron model. …
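As a concrete (and deliberately tiny) illustration of the model, here is a thresholded weighted sum in Python; the AND-gate weights in the comment are my own example, not from the post.

```python
def perceptron(x, w, theta):
    """Classical perceptron: output 1 iff the weighted sum of inputs crosses theta."""
    return 1 if sum(wi * xi for wi, xi in zip(w, x)) >= theta else 0

# e.g. a 2-input AND gate: perceptron([1, 1], w=[1, 1], theta=2) -> 1
```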


It is very well known that the most fundamental unit of deep neural networks is called an artificial neuron/perceptron. But the very first step towards the perceptron we use today was taken in 1943 by McCulloch and Pitts, by mimicking the functionality of a biological neuron.

Note: The concept, the content, and the structure of this article were largely based on the awesome lectures and the material offered by Prof. Mitesh M. Khapra on NPTEL’s Deep Learning course. Check it out!

Biological Neurons: An Overly Simplified Illustration

Dendrite: Receives signals from other neurons

Soma: Processes the information

Axon: Transmits the output of this neuron

Synapse: Point…
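For contrast with the perceptron posts above, here is a minimal sketch of the McCulloch-Pitts unit in its simplest, excitatory-inputs-only form; the OR-gate threshold in the comment is my own example.

```python
def mcculloch_pitts(inputs, theta):
    """McCulloch-Pitts neuron: boolean inputs are summed (no learned weights)
    and the neuron fires iff the sum reaches the threshold theta."""
    return 1 if sum(inputs) >= theta else 0

# e.g. a 3-input OR gate: mcculloch_pitts([0, 1, 0], theta=1) -> 1
```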

Akshay L Chandra

Deep Learning Research Assistant @ IIT Hyderabad.
