Maybe it’s time that we stop focusing on teaching just the algorithms and start incorporating lessons on data collection and curation.

The internet has exploded in recent years with content focused on AI. AI has become synonymous with machine learning, and machine learning has taken the world by storm. With the momentum that machine learning currently carries, it doesn’t look like the storm will settle any time soon.

Most of the famous research papers and articles focus on the algorithms behind this incredible revolution. I personally spent years studying machine learning and the algorithms associated with it. After all, all…


An easy-to-understand explanation of the Adam optimizer and how to code it from scratch using Python and PyTorch.

The Adam optimizer, along with its different flavours, is arguably the most popular and effective optimizer in deep learning today. It combines two equations which significantly advanced vanilla SGD: momentum and RMSProp. In this article, we'll cover how the two equations come together to make the Adam optimizer. I wrote about both of the equations in previous posts, so if you need a refresher on them or this is the first time you're hearing about them…
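As a rough sketch of how those two pieces combine (this follows the standard Adam update rule; the function name, tensor arguments and hyperparameter values below are illustrative assumptions, not code from the article):

```python
import torch

def adam_step(param, grad, m, v, t, lr=1e-3, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a parameter tensor; m, v are running averages, t is the step count (starting at 1)."""
    # Momentum piece: exponentially decaying average of past gradients
    m = beta1 * m + (1 - beta1) * grad
    # RMSProp piece: exponentially decaying average of squared gradients
    v = beta2 * v + (1 - beta2) * grad ** 2
    # Bias correction, since m and v start at zero
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    # Scale the momentum term by the adaptive, per-parameter learning rate
    return param - lr * m_hat / (v_hat.sqrt() + eps), m, v
```

The values shown (lr=1e-3, betas of 0.9 and 0.999, eps=1e-8) match the defaults of torch.optim.Adam.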


An easy-to-follow tutorial on how to implement the famous RMSProp from scratch in Python with the help of PyTorch.

In this article we’re going to be discussing RMSProp — an equation which dynamically adjusts the learning rate for neural networks. It was first introduced by Geoffrey Hinton in his Coursera course, and, like much of Professor Hinton’s work, has since been very influential in the world of deep learning. …


An easy-to-follow tutorial on momentum for deep learning, from scratch. A code-first explanation using Python and PyTorch.

As neural networks have developed, so have the optimizers for gradient descent (if you need a refresher on gradient descent, please refer to my book). Finding a better optimizer is a constant area of research. In this article, we'll be covering momentum, a fundamental addition which paved the way for the famous Adam optimizer.

Like many other improvements made in deep learning, momentum is not specific to deep learning; rather, it has been borrowed from other fields of…
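For reference, a bare-bones version of the momentum update (this follows the common formulation used by PyTorch's SGD; the names and default values are illustrative assumptions):

```python
import torch

def momentum_step(param, grad, velocity, lr=0.01, beta=0.9):
    # Accumulate a decaying sum of past gradients (the "velocity")
    velocity = beta * velocity + grad
    # Step along the velocity instead of the raw gradient, which smooths
    # the path and speeds up progress along consistent directions
    return param - lr * velocity, velocity
```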


An easy-to-follow guide to weight decay (aka L2 regularization). Understand how to write it from scratch with PyTorch.

Weight decay, aka L2 regularization, aka ridge regression… why does it have so many names? Your guess is as good as mine. Like many other deep learning concepts, it's a fancy term for a simple concept (in practice). It's also something which took me a very long time to really understand, because it was buried under all the math. If you're having the same struggle I was having, then I'm hoping this article ends your search.

What is it?

Let’s start off with…
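Purely as an illustrative sketch of the concept (the model, data and decay strength below are hypothetical placeholders, not the article's own setup), weight decay can be written as an explicit L2 penalty added to the loss:

```python
import torch

# Hypothetical model and data, just to make the snippet runnable
model = torch.nn.Linear(10, 1)
x, y = torch.randn(32, 10), torch.randn(32, 1)
wd = 0.01  # weight decay (L2 regularization) strength

mse = torch.nn.functional.mse_loss(model(x), y)
# L2 penalty: sum of squared weights, scaled by the decay factor
l2_penalty = wd * sum((p ** 2).sum() for p in model.parameters())
loss = mse + l2_penalty
loss.backward()
```

In practice the same effect is usually obtained through the weight_decay argument of PyTorch optimizers such as torch.optim.SGD.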


A thorough explanation of the vanishing gradient problem and sigmoids. Explained in code, along with ReLU and Kaiming initialization.

In recent years, deep learning and neural networks have been making huge strides in every industry. This AI technology seems to have moved in overnight and allowed us to solve problems which we’ve been hacking at for years, sometimes even decades.

Although the idea of neural networks has been around since the mid-1900s, they weren't really used outside of the academic world until after 2010, but why? If they're so powerful, then why weren't machine learning practitioners using them in real…
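One small, self-contained illustration of the problem the article points to (my own sketch, not code from the article): the sigmoid's derivative never exceeds 0.25, so gradients shrink as they are multiplied back through many sigmoid layers, whereas ReLU paired with Kaiming (He) initialization keeps them healthier.

```python
import torch

# Sigmoid saturates: its derivative peaks at 0.25 (at x = 0) and decays toward 0
x = torch.linspace(-6, 6, 5, requires_grad=True)
torch.sigmoid(x).sum().backward()
print(x.grad)  # tiny values at the extremes, 0.25 in the middle

# ReLU plus Kaiming initialization is the usual remedy
layer = torch.nn.Linear(512, 512)
torch.nn.init.kaiming_normal_(layer.weight, nonlinearity='relu')
```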


When working with data in the real world, the issue of an imbalanced dataset comes up more often than you may think. An imbalanced dataset is one where a class has significantly less data than the other classes present. For example, in a dataset used for predicting diabetes, it wouldn't be rare to find twice as many records of patients without diabetes as patients with diabetes. Why is this a problem? It's a problem because, in the example mentioned, our model can easily achieve 66% accuracy by always predicting that the patient doesn't have diabetes. Yes…
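To make the trap concrete (the numbers below are hypothetical, chosen only to match the two-to-one ratio in the example), a majority-class baseline already looks deceptively good, and a class-weighted loss is one common counter-measure:

```python
import torch

# Hypothetical labels: twice as many non-diabetic (0) records as diabetic (1)
labels = torch.tensor([0] * 400 + [1] * 200)

# Always predicting "no diabetes" already scores roughly 66% accuracy
print((labels == 0).float().mean())  # ~0.67

# One common fix: weight the loss so minority-class mistakes cost more
counts = torch.bincount(labels).float()
loss_fn = torch.nn.BCEWithLogitsLoss(pos_weight=counts[0] / counts[1])
```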


In case you missed it, in a previous article we went over batch gradient descent in depth and saw how it vastly improved the vanilla gradient descent approach. In this article, we'll revisit batch gradient descent, but this time, we'll take advantage of PyTorch's powerful Dataset and DataLoader classes. By the end of this article, you will be convinced to never go back to a life of deep learning without PyTorch's DataLoader.

Before we begin, we'll quickly run back through the steps we performed in the previous batch gradient descent article. Just like last time, we'll use the Pima Indians Diabetes dataset, set…
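For a sense of the payoff, here is a minimal Dataset/DataLoader sketch (the tensors are random placeholders standing in for the Pima Indians Diabetes data, and the class name is my own rather than the article's):

```python
import torch
from torch.utils.data import Dataset, DataLoader

class DiabetesDataset(Dataset):
    """Wraps feature and label tensors so DataLoader can index them."""
    def __init__(self, X, y):
        self.X, self.y = X, y

    def __len__(self):
        return len(self.X)

    def __getitem__(self, idx):
        return self.X[idx], self.y[idx]

# Placeholder tensors with the Pima dataset's shape: 768 rows, 8 features
X, y = torch.randn(768, 8), torch.randint(0, 2, (768, 1)).float()
loader = DataLoader(DiabetesDataset(X, y), batch_size=64, shuffle=True)

for xb, yb in loader:   # batching and shuffling are handled for us
    pass                # forward pass, loss, backward pass, optimizer step go here
```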


What is Batch Gradient Descent?

The question you're probably asking right now is, "What is batch gradient descent and how does it differ from normal gradient descent?" Batch gradient descent splits the training data up into smaller chunks (batches) and performs forward propagation and backpropagation one batch at a time. This allows us to update our weights multiple times in a single epoch.
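In loop form, that description looks roughly like this (placeholder data, model and hyperparameters, assumed only for illustration):

```python
import torch

# Placeholder data and model
X, y = torch.randn(768, 8), torch.randn(768, 1)
model = torch.nn.Linear(8, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
batch_size = 64

for epoch in range(10):
    for start in range(0, len(X), batch_size):
        xb, yb = X[start:start + batch_size], y[start:start + batch_size]
        optimizer.zero_grad()
        loss = torch.nn.functional.mse_loss(model(xb), yb)
        loss.backward()
        optimizer.step()  # one weight update per batch, many per epoch
```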

What Are the Benefits?

Performing calculations on small batches of the data, rather than all our data at once, is beneficial in several ways. To name a few:

  1. It puts less strain on memory. Think about if we had a million 4K images…


Scaling data is amongst the most fundamental steps in preprocessing data before throwing it into a neural network. It puts all the features fed into the network on the same scale. This turns out to be a crucial step, because similarly scaled features ease the training process: it greatly reduces the time it takes to train the model and often improves how well the model fits the training data. …
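As a quick sketch of the most common approach, standardization (the feature matrix below is a made-up placeholder):

```python
import torch

# Hypothetical feature matrix with columns on wildly different scales
X = torch.randn(768, 8) * torch.tensor([1., 10., 100., 0.1, 5., 50., 2., 1000.])

# Standardization: zero mean and unit variance per feature (column)
mean, std = X.mean(dim=0), X.std(dim=0)
X_scaled = (X - mean) / std

# Reuse the training set's mean and std when scaling validation/test data
```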

Akmel Syed

Hi, I’m Akmel Syed. I’m a father, a husband, a son, a brother, a data science professional and I also happen to write about machine learning.
