The Fastest Introduction To Deep Learning You’ll Ever Get

Ayakassas · Published in Zaka · 13 min read · Aug 27, 2020

The paradox of the mind: the human brain is constantly trying to understand and study itself. Fascinating stuff. An entity attempting to apprehend its own existence. However, the mind has gone beyond that and challenged itself to recreate its own behavior. This, in a very artistic and philosophical sense, is what Deep Learning is all about.

Ben Dickson asks in this article, the source of this image: "Is Deep Learning over-hyped?"

Let's be fair: if you made it to this blog, you've definitely heard the following terms before: Artificial Intelligence (AI), Machine Learning (ML), and Deep Learning (DL). Let's start our journey into DL by clearing up and demystifying some technical jargon: deep learning is a subset of machine learning, and machine learning is a sub-field of AI, the science of making machines mimic human behavior.

In the upcoming sections, we will work through and elaborate on the following topics:

  1. The biology of deep learning
  2. Applications, advantages, and challenges of deep learning
  3. Structure of an artificial neural network (ANN)
  4. Breakdown of a single-layer perceptron, the simplest form of an ANN
  5. Feedforward neural networks
  6. The backpropagation algorithm
  7. Summary/Recap

In this blog post, we will introduce the concept of deep learning and how it works, but first, we'll start with some biology. You thought philosophy was the only science here, right? The human body is a very interesting, harmonious agglomeration of complexity. Within it, the brain is the most important organ, enabling us to do all the things that make us human (decision making, thinking, feeling…). You might wonder why we're talking about biology right now… Let's dive in, shall we?

Deep learning tries to imitate how the human brain works (processing data and creating patterns for decision making). The neuron is the most basic working unit of the human brain: it receives signals, transmits information to other neurons, and decides whether it should activate (pass along) or inhibit (stop) a certain signal/function. DL is based on artificial neural networks (ANNs) inspired by the structure of the brain, and these ANNs mimic how our brain's neurons work. The artificial neuron in deep learning is commonly referred to as a node or unit, and later on, we will demonstrate what a basic neural network looks like and how it works.

Very interesting big words. Where can I see them in Real Life?

Let’s quickly enumerate some applications of Deep Learning, advantages it can have, and challenges it faces. We all face challenges every day, even artificial brains and neurons, right?

Photo by Franck V. on Unsplash

Some very famous and common applications of Deep Learning include…

a. Self-driving cars

Teaching cars to drive themselves autonomously while avoiding obstacles and safely cruising around the streets.

b. Voice and image recognition

Voice recognition is simply when you talk to the computer and your words are transcribed by the machine (which can be very helpful for people with certain disabilities). Image recognition, on the other hand, allows you to identify people, places, and objects in an image.

c. Translating languages

Using deep learning, we are able to enhance automatic translation. This is actually very helpful, especially when it comes to business operations, tourism, and more!

d. Reading/Generating handwritten texts

Given a corpus of handwritten texts, you can make machines understand and even generate similar texts in the form of words, sentences, and even full-on stories/articles.

e. Cancer Detection

Deep learning is getting a lot of attention when it comes to medical imaging problems. It has shown the ability to accurately detect diseases and cancerous tumors.

f. Adding Color to Black-and-White images and videos

Transforming black-and-white images into colored ones was previously done by humans in a very time-consuming manner. Now, using deep learning, you can accurately colorize your images (impressive, yes?).

If it didn't have cool advantages, we wouldn't really write a blog about it, right? Here are some important advantages.

a. Can execute feature engineering by itself

DL will scan the data, search for features that impact the decision the machine needs to make, and carry out the decision making all on its own! This saves data scientists a lot of time and work.

b. Best results with unstructured data

A huge amount of data is unstructured in nature (it comes in different formats), and many machine learning algorithms are not able to analyze it. DL allows you to easily process data of different formats to obtain the desired predictions.

c. Efficient at delivering high-quality results

Unlike humans, deep learning models do not get exhausted. They can perform thousands of iterations and tasks in an acceptable amount of time and still deliver high-quality results (unless the data being used is of "bad" quality: garbage in, garbage out).

Life, however, does not come without its challenges. Here are some for DL.

a. Requires a huge amount of data

To be able to learn and tune its many parameters, DL needs a huge amount of data (more than other machine learning algorithms).

b. Resource-demanding technology

It requires powerful GPUs and large amounts of storage to train models.

c. Models are more difficult to interpret compared to traditional machine learning models

It is challenging to understand exactly how the deep network was able to reach a particular solution.

Structure of a Deep Artificial Neural Network

Source “When not to use Neural Networks”

Artificial Neural Networks, or ANN for short, are a set of algorithms modeled loosely after the human brain. In their most basic form, they are designed to recognize patterns.

A deep neural network has a minimum of 3 main layers.

  1. Input Layer: it is the deposit slot of the network. It provides information from the outside world to the neural network. You pay with data at your input layer to buy results.
  2. Hidden Layer: Arguably the key to it all, and the true meaning of "deep" in deep learning. It refers to the layer(s) located between the input and output layers, which perform two crucial tasks:

A) Summation: Each input connects to a "hidden" node and each hidden node connects to another node until we reach the output nodes (check the image above). Each of these connections has a weight assigned according to that input's importance relative to the other inputs (how much a node affects/contributes to the next node). We also have another input called the bias. The bias, a threshold value, allows you to shift the activation function by adding a constant to the input. It increases the model's flexibility to fit the data, especially when all input features have a value of 0 (without a bias, the weighted sum would always be 0 in that case).

B) Activation Function: After the sum is found, it is passed to an activation function. The purpose of this function is to introduce non-linearity into the output of a neuron. This is important because most real-world data are non-linear, and we want neurons to learn these non-linear representations. There are several activation functions, such as sigmoid, tanh, ReLU, and many more. They help ensure that the model converges stably towards the desired final solution. (A minimal code sketch of these two steps appears right after this list.)

3. Output Layer: Remember depositing your data money at the input of this deep learning vending machine? Well, this is where you receive your ice-cold soda. This layer transfers information from the network to the outside world; in simpler terms, it produces the results (the decision/prediction you are trying to learn) for the given inputs.
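To make the summation and activation steps concrete, here is a minimal Python sketch of a single node; the input values, weights, bias, and the choice of sigmoid are illustrative assumptions, not values taken from the figure above:

```python
import math

def sigmoid(z):
    # Squashes any real number into (0, 1), introducing non-linearity
    return 1.0 / (1.0 + math.exp(-z))

def node_output(inputs, weights, bias):
    # A) Summation: weighted sum of the inputs plus the bias
    z = sum(x * w for x, w in zip(inputs, weights)) + bias
    # B) Activation: pass the sum through a non-linear function
    return sigmoid(z)

# Illustrative values only
print(node_output([0.5, 0.2], [0.8, -0.4], bias=0.1))
```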

Single-Layer Perceptron

We've talked about ANNs. Now let's dive deeper (get it?) into the basic element of the network's layers: the node, or neuron.

By the way, a feed-forward neural network is an ANN where connections between the nodes do not form a cycle; that is, there are no feedback loops into earlier layers/nodes. The SLP (Single-Layer Perceptron), the simplest feed-forward neural network, is composed of a single layer with a single node, based on a threshold transfer function. Given its simplicity, we will use it to explain how a neural network works.

X represents the input — W represents the weight — b represents the bias — f(x) represents the activation function — Y represents the output
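Putting the legend together, the perceptron computes the following (a standard formulation of the diagram above):

$$Y = f\left(\sum_i W_i X_i + b\right)$$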

Let’s take a straightforward example and demonstrate it step by step. Let’s say you are feeling really down, and you want to go watch a movie at the cinema. There are three factors that influence your decision:

  1. The movie genre is horror (X1 = {Yes:1 | No:0})
  2. The movie starts after 7:00 pm (X2 = {Yes:1 | No:0})
  3. It will be raining outside (X3 = {Yes:1 | No:0})

Assume you are not in the mood at all to watch comedy, romance, or any genre other than horror: we will assign X1 = 1. You don't mind whether the movie starts earlier or later, and it's not a big deal if it is raining outside since you just bought a new car (easy transportation). So, we will assign X2 = 0 and X3 = 0. Now, our input vector I = {X1=1, X2=0, X3=0} is ready. Next, we need to assign the corresponding weights for each input.

Recall: the larger the weight, the greater the influence of the corresponding input. Since you really want to watch a horror movie, W1 > W2 and W1 > W3. So let's, as a preliminary assumption, take W1 = 5, W2 = 1, W3 = 1, and the value of the bias to be -4.

Referring to the definition of the activation function in the above illustration: if the sum of the (inputs × weights) plus the bias is greater than 0, then the activation triggers an output equal to 1 (you go to the movies!).

After finding the summation, (1 × 5) + (0 × 1) + (0 × 1) + (-4) = 1, we see that the activation function's input is greater than 0, so the node fires an output of 1! You will go watch a horror movie, even if it is raining outside and the movie starts earlier than 7.

It is very important to note that varying the weights and biases results in a different decision-making model. Try changing the values and see how your decision will change.
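Here is the same decision written as a tiny Python sketch, using the exact values from the example, so you can try changing the weights and bias yourself:

```python
def perceptron(inputs, weights, bias):
    # Threshold transfer function: fire (1) if the weighted sum plus bias is > 0
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return 1 if total > 0 else 0

# X1: horror movie? X2: starts after 7 pm? X3: raining?
inputs = [1, 0, 0]    # horror, not after 7 pm, not raining
weights = [5, 1, 1]   # horror matters most: W1 > W2 and W1 > W3
bias = -4

print(perceptron(inputs, weights, bias))  # 1 -> you go to the movies!
```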

Feed-Forward Neural Network

Now that we have worked through a single node and understood its principle of operation for a simple case, we can expand to a more complex neural network made up of multiple neurons or nodes.

Deep Neural Network

A neural network, as we already mentioned, has a minimum of three layers: an input layer, a hidden layer, and an output layer. The more hidden layers it has, the "deeper" it is; thus the notion of deep neural networks. It works as follows:

  1. Add the inputs Xi to the network
  2. Assign their corresponding weights and bias randomly at the start
  3. Find the sum of (weights*inputs) then add the bias
  4. Pass the total of step 3 to the activation function. The resulting number becomes the output of node N and part of the input of the nodes in the next layer.
  5. Repeat steps 3 and 4 for all nodes and all layers until you reach the output layer.
  6. To update the weights of the network and start to "learn", calculate the error at the outputs with respect to the actual values you expected the network to generate (the expected outputs are in your dataset).
  7. Travel back from the output layer through the hidden layers to the input layer, adjusting all the corresponding weights in a way that decreases (minimizes) the error.
  8. Keep repeating steps 1 through 7 for all the input samples in your dataset until you reach the desired output.

Training a neural network is equivalent to following these steps to minimize its error (or cost) function for the problem/dataset of interest.
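As a rough illustration of steps 1 through 5, here is a minimal forward pass in Python; the layer sizes, the random initial weights, and the sigmoid activation are all illustrative assumptions:

```python
import math
import random

def sigmoid(z):
    # Non-linear activation squashing any real number into (0, 1)
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    # For each node: weighted sum of inputs plus bias (step 3),
    # passed through the activation function (step 4)
    return [sigmoid(sum(x * w for x, w in zip(inputs, ws)) + b)
            for ws, b in zip(weights, biases)]

random.seed(0)

# Step 1: the inputs Xi
x = [1.0, 0.0, 0.0]

# Step 2: random initial weights and biases for a tiny 3-2-1 network
w_hidden = [[random.uniform(-1, 1) for _ in x] for _ in range(2)]
b_hidden = [random.uniform(-1, 1) for _ in range(2)]
w_out = [[random.uniform(-1, 1) for _ in range(2)]]
b_out = [random.uniform(-1, 1)]

# Step 5: repeat layer by layer until the output layer
h = layer_forward(x, w_hidden, b_hidden)   # hidden layer outputs
y = layer_forward(h, w_out, b_out)         # network output
print(y)
```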

So, What is Backpropagation & How Does it Work?

The process (or algorithm) that we simplistically described above to train the network and update its weights is called Backpropagation.

Let’s dive more into the details of what happens after we reach the output layer. Once you get the final results on the output layer, you can compare the predicted answer with the actual answer and compute the error.

Let's say the error/cost of a single input is computed with the following error function (a.k.a. cost function):
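A common choice, which we will assume here, is the squared error:

$$E = \frac{1}{2}\,(\text{target} - \text{output})^2$$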

Computing the error for all the input examples in your dataset and averaging the results gives you the total loss function of the network. This is where an optimization algorithm has to step in to adjust the network weights for the purpose of minimizing the cost. A very famous example is the Gradient Descent algorithm. Let's check it out.

The objective of Gradient Descent is to minimize the loss function by updating the weights at each iteration. We want to change the parameters (weights) in a way that minimizes the error. Don't forget, the weights were initially assigned randomly, so of course they will not be "good" at the start.

In order to update the weights, you must compute the gradient of the loss function with respect to each of the weights. Backpropagation is the tool that gradient descent uses to calculate this gradient. It helps adjust the weights of the neurons so that the result comes closer and closer to the known true result. The derivative (gradient) measures the degree to which a slight change in a weight causes a change in the error:
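$$\frac{\partial E}{\partial w_i}$$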

The next step is to multiply the gradient by a learning rate:
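$$\Delta w_i = LR \cdot \frac{\partial E}{\partial w_i}$$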

LR stands for Learning Rate

This product tells us by how much we should update each weight. The learning rate determines the size of the steps we take to reach the minimum of the loss function. Usually, a learning rate has a small positive value within the range (0, 1).

A small learning rate (0.0001, for example) means we will only slightly adjust the weight. Note, however, that a large learning rate (0.9, for instance) might lead to overshooting. In other words, you might take a "huge" step in the direction of the minimum, shoot past it, and miss it! Therefore, it is better to start with a small learning rate.

Now, we are ready to update the weights.
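This is the standard gradient descent step: subtract that product from the current weight.

$$w_i^{\text{new}} = w_i^{\text{old}} - LR \cdot \frac{\partial E}{\partial w_i}$$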

Finally, we will illustrate the backpropagation algorithm by solving a very simple numerical example to clear up any ambiguities. Below is a small portion of a complex neural network and our goal is to update W5.

Part of a Deep Neural Network

Assume we have already passed the input through and reached step 5, where we computed the output of the network. The algorithm will now do the following:

a) Calculate the total error.

Calculating the total error

b) Calculate the gradient for W5. This calculation is kind of complex, so we will not go through it in detail.

Calculating the Gradient corresponding to the fifth weight

c) Assuming that the learning rate is equal to 0.5, we end up with this new, updated W5.

Updating W5

d) Repeat these steps for all the weights in the network until you reach the input layer.

e) Having finished updating the weights in the backward propagation step, you forward propagate new inputs, calculate the error, and go through a new round of weight updates.

f) Keep repeating the forward and backward propagation steps until you get outputs that are close to the actual values, hence a total error that is 0 or below a threshold you find acceptable. AND SUCCESS!
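To make step (c) concrete, here is a tiny Python sketch of a single weight update. The current weight and gradient values are hypothetical (the numbers from the original figures are not reproduced here); only the update rule itself is the standard one:

```python
# Hypothetical numbers for illustration; only the update rule is standard.
w5 = 0.40            # current value of W5 (assumed)
grad_w5 = 0.08       # dE_total/dW5 from backpropagation (assumed)
learning_rate = 0.5  # as stated in the example

w5_new = w5 - learning_rate * grad_w5
print(w5_new)        # 0.40 - 0.5 * 0.08 = 0.36
```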

Fast Recap

By now, after going through the blog, I can safely say that you have seen how impressive, powerful, and promising deep learning is.

Neural networks all have a basic structure: an input layer, hidden layer(s), and finally an output layer.

Inputs are fed into each node, weighted, summed, and passed through an activation function. We keep forward propagating until we reach the final output layer. The error is then calculated, and backpropagation is applied in order to update the weights (they are updated in a way that minimizes the total loss). You keep forward and back-propagating until you reach the desired output (with an acceptably small error).

So where can you go from here?

Here’s to the future of deep learning!!

Resources

[1] Article sharing 32 Advantages and Disadvantages of Deep Learning

[2] Article on Deep Learning: Strengths and Challenges

[3] Article with 10 Amazing Examples Of How Deep Learning AI Is Used In Practice

[4] Blog on A Quick Introduction to Neural Networks

[5] Video Tutorial on Back Propagation in Neural Network with an example

[6] Video Lecture on Deep Learning (CS7015): Lec 2.2 McCulloch Pitts Neuron, Thresholding Logic

Acknowledgments

This article was edited by Tony Soulage.

Don’t forget to support with a clap!

You can join our efforts and help democratize AI in your city! Reach out and let us know.

For more information, you can visit www.zaka.ai

Subscribe to our newsletter and follow us on our social media accounts to stay up to date with our news and activities:

LinkedIn · Instagram · Facebook · Twitter · Medium
