The “hello, world” of Deep Learning

Kaneel Senevirathne · Published in Analytics Vidhya · Aug 20, 2021 · 8 min read

The “hello, world” program in computer science is traditionally used to introduce novice learners to a programming language. The program is typically a single line of code that prints “hello, world”. The idea has been extended to other programming domains as well: for instance, when introducing web development to new learners, the phrase “hello, world” is often used as the headline of their first website.

At this point, I am pretty sure you are aware of why I named this article “The ‘hello, world’ of deep learning”. Although we won’t use the phrase “hello, world” from here onwards, this article is meant to gently introduce the basics of Deep Learning to novice learners by building a simple neural network from scratch using Python.

What is deep learning…

The term deep learning refers to training a neural network. So, what’s a neural network? Let’s look at the example below.

Example data (red) and a quadratic fit (blue)

Let’s say we have collected some data ‘x’ and ‘y’ (red dots) and we want a mathematical model that predicts ‘y’ given ‘x’. In other words, we want a model that takes ‘x’ values as input and outputs estimates of the ‘y’ values. There are multiple ways to do this; here, we will use a neural network.

An example of a neural network

A neural network is a machine learning model comprised of an input layer, one or more layers of neurons (represented by circles in the figure above) and an output layer. Basically, we feed some values into the neural network, the neurons of the network perform some calculations, and the network outputs some values. We’ll talk about how these neurons are trained later in this article.

Now, we can represent our example with a neural network that has a single neuron, which takes in the ‘x’ values and calculates the ‘y’ values. If we look closely at the data (red dots), we can see they follow a quadratic function (blue line). So if we want to predict the ‘y’ values, we could represent the neuron with a quadratic equation and let the neural network learn the parameters of that quadratic function. In this scenario, the neuron takes in an ‘x’ value and uses the learned quadratic function to predict the ‘y’ value.

A neural network with a single neuron

Coding and training a neural network from scratch…

Now that we have some idea of what a neural network is, let’s synthesize some data using a quadratic function. Then we will train a single-neuron neural network to learn the parameters of the quadratic model and make some predictions. If you want to check out the full code, you can click here.

First let’s create a class that represents the neuron of our model.

#build the model class
class Model():

    def __init__(self):
        #assign the initial variables
        self.a = 15
        self.b = -2
        self.c = 10

    def __call__(self, x):
        return self.a * x**2 + self.b * x + self.c

This is the operation our neuron performs. It takes a value ‘x’ and then uses the quadratic equation (a x² + b x + c) to calculate the ‘y’ value. Note that the initial values of our model are 15, -2 and 10 for a, b and c respectively.
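For instance, with the initial values above, calling the model on x = 1 evaluates 15 · 1² − 2 · 1 + 10:

#quick check of the neuron with its initial parameter values
model = Model()
print(model(1))   #15 * 1 - 2 * 1 + 10 = 23
print(model(2))   #15 * 4 - 2 * 2 + 10 = 66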

Now let’s create some synthetic data. We will use 10, 5 and 12 as the true values for a, b and c. So the goal of our neural network is to learn these true a, b and c values.

import numpy as np

#synthetic data
true_a = 10
true_b = 5
true_c = 12

#number of examples
num_examples = 1000

#create x and y data
x = np.random.normal(0, 1, num_examples)
y = true_a * np.square(x) + true_b * x + true_c

True and predicted data from our model

The figure above shows the predicted data (generated by our initial values) and the real data (generated by our true parameter values). Ideally, once the neural network is trained, the predicted plot (red) should be on top of the real plot (blue). Now let’s see how we train the neural network to learn the parameters of the model.
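The figures in this article are drawn with a small plot_data helper that we will also call in the training loop below. The actual helper lives in the accompanying notebook; a minimal sketch of what it might look like, assuming matplotlib, is:

import matplotlib.pyplot as plt

#a minimal sketch of the plot_data helper used in this article
#(the actual helper is defined in the accompanying notebook)
def plot_data(x, y_true, y_predicted, figsize = (4, 2)):
    plt.figure(figsize = figsize)
    plt.scatter(x, y_true, color = 'blue', s = 5, label = 'real')
    plt.scatter(x, y_predicted, color = 'red', s = 5, label = 'predicted')
    plt.legend()
    plt.show()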

As mentioned above, the predicted data are generated from the initial values. The goal is to find the parameters a, b and c that give predicted values closest to the real values. In order to do this, we measure how different our predicted values are from the real values. This measurement is often called the loss value, and the function that calculates it is called the loss function. There are multiple loss functions, but in our model we will use the “mean squared error (mse)”. The goal of all deep learning algorithms is to minimize this loss value, in other words, to reduce the difference between the real values and the predicted values.

Formula to calculate the mse: mse = (1/n) Σᵢ (ŷᵢ − yᵢ)², where ŷᵢ are the predicted values, yᵢ the real values and n the number of examples

#create a loss function. we will use mean squared error
def loss(y_predicted, y_true):
    return np.mean(np.square(y_predicted - y_true))
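As a quick check, if the predicted values are [1, 2] and the true values are [1, 3], the squared errors are [0, 1], so the mse should be 0.5:

#quick check: squared errors are [0, 1], so the mean is 0.5
print(loss(np.array([1.0, 2.0]), np.array([1.0, 3.0])))   #prints 0.5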

Once we have the loss, we have to find how the loss changes with the parameters of our model. In order to do this, we have to find the derivatives of the loss function with respect to the parameters. (If you’d like to check out how to derive the following equations, go to my Jupyter Notebook).

Derivatives of the loss w.r.t the parameters: ∂L/∂a = (2/n) Σᵢ xᵢ² (ŷᵢ − yᵢ), ∂L/∂b = (2/n) Σᵢ xᵢ (ŷᵢ − yᵢ), ∂L/∂c = (2/n) Σᵢ (ŷᵢ − yᵢ)

#let's calculate the derivatives of the loss w.r.t the parameters
def grads(inputs, outputs, predictions):

    #calculate dL/da (calling it da)
    da = np.dot(np.square(inputs).T, 2 * (predictions - outputs))
    da = da/len(inputs)

    #calculate dL/db (calling it db)
    db = np.dot(inputs.T, 2 * (predictions - outputs))
    db = db/len(inputs)

    #calculate dL/dc (calling it dc)
    dc = np.sum(2 * (predictions - outputs))
    dc = dc/len(inputs)

    return da, db, dc
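If you want to convince yourself that these derivatives are correct, a common sanity check (not part of the original notebook, just a suggestion) is to compare them against numerical gradients computed with central finite differences:

#optional sanity check: approximate each gradient numerically with
#central finite differences and compare against grads()
def numerical_grads(model, inputs, outputs, eps = 1e-5):
    approx = []
    for name in ['a', 'b', 'c']:
        original = getattr(model, name)
        setattr(model, name, original + eps)
        loss_plus = loss(model(inputs), outputs)
        setattr(model, name, original - eps)
        loss_minus = loss(model(inputs), outputs)
        setattr(model, name, original)   #restore the parameter
        approx.append((loss_plus - loss_minus) / (2 * eps))
    return approx   #should closely match grads(inputs, outputs, model(inputs))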

After calculating the derivatives, we update our parameter values by subtracting the derivatives from our most recent estimates of the parameters. Note that we also have to multiply each derivative by a constant called the learning rate (α) before updating the parameters. The learning rate is a hyperparameter of the model that controls how quickly the model learns.

Update the parameters: a ← a − α · ∂L/∂a, b ← b − α · ∂L/∂b, c ← c − α · ∂L/∂c

#now let's create a function that fits the model given the inputs and outputs
def fit(model, inputs, outputs, learning_rate):

    #calculate the current loss using the loss function
    current_loss = loss(model(inputs), outputs)

    #calculate the gradients for each variable in the model
    da, db, dc = grads(inputs, outputs, model(inputs))

    #update the parameters of the model
    model.a = model.a - learning_rate * da
    model.b = model.b - learning_rate * db
    model.c = model.c - learning_rate * dc

    return current_loss

Finally, we have all the required functions to train the model. Now, let’s train our model for 50 iterations/epochs and see the results. Note that in each iteration, the model will first predict the ‘y’ values using the current estimates of the parameters. Then it will calculate the loss value and update the parameter values based on the calculated derivatives. We can check if the model is actually learning by keeping track of the loss value w.r.t the epoch number. If the model is learning, the loss value will decrease as the epochs increase.

#initialize the model
model = Model()

#number of epochs
epochs = 50

#keep track of a, b, c and the losses
list_a, list_b, list_c = [], [], []
losses = []

#train the model for 50 epochs
for epoch in range(epochs):

    list_a.append(model.a)
    list_b.append(model.b)
    list_c.append(model.c)

    current_loss = fit(model, x, y, 0.1)
    losses.append(current_loss)

    if epoch % 5 == 0:
        print('epoch {} --------> mean squared error {}'.format(epoch, np.round(current_loss, 4)))
        print('Real and predicted values after epoch {}'.format(epoch))
        plot_data(x, y, model(x), figsize = (4, 2))

Below is a plot of the predicted and real data after 0 and 50 epochs. As we expected, the predicted (red) data is on top of the real data at the end of training.

Predicted vs real data after 0 (top) and 50 (bottom) epochs

Let’s further see how our model has performed.

In this figure you can see how the loss decreases as the epoch number increases. This is a sign that our model is learning. The decreasing loss shows that the difference between the real and the predicted values is becoming smaller and smaller after each epoch.
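If you’re following along, a plot like this can be reproduced from the losses list we collected during training (with matplotlib imported as before):

#plot the loss value recorded at each epoch
plt.figure(figsize = (4, 2))
plt.plot(range(epochs), losses)
plt.xlabel('epoch')
plt.ylabel('mean squared error')
plt.show()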

True values and the change of parameters w.r.t epoch number

Here we see how the predicted parameter values converge to the real values as the epoch number increases. All three parameters are converging to the values the data were generated from (10, 5 and 12 for a, b and c respectively).
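The parameter lists we stored during training give you this plot directly; here the true values are drawn as dashed horizontal lines:

#plot how each parameter estimate changes across epochs
plt.figure(figsize = (4, 2))
plt.plot(range(epochs), list_a, label = 'a')
plt.plot(range(epochs), list_b, label = 'b')
plt.plot(range(epochs), list_c, label = 'c')
for true_value in [true_a, true_b, true_c]:
    plt.axhline(true_value, linestyle = '--', color = 'gray')
plt.xlabel('epoch')
plt.legend()
plt.show()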

On an additional note, the algorithm we used to find the local minimum of the loss function is called “gradient descent”. When we iteratively subtract each parameter’s (scaled) gradient from it, the parameter value converges to its optimal value. If we look at the figure below, the light blue surface is the loss given different values of b and c (note that we have only b and c in this figure; the value of a is kept constant for illustration purposes). The red dots are the b and c values and their corresponding loss at each epoch. We can see that the initial values of b and c give a higher loss. As the epochs increase, the b and c values converge to their optimal values while the loss converges to the local minimum.

Gradient descent converging to the local minimum of the loss function
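To give an idea of how such a surface can be computed, here is a sketch that evaluates the loss over a grid of b and c values. The choice to keep a fixed at its true value, and the grid ranges, are assumptions made for illustration:

#compute the loss over a grid of b and c values, keeping a fixed
#(here at its true value, an assumption for illustration)
b_grid, c_grid = np.meshgrid(np.linspace(-5, 15, 50), np.linspace(0, 25, 50))
surface = np.zeros_like(b_grid)
for i in range(b_grid.shape[0]):
    for j in range(b_grid.shape[1]):
        y_predicted = true_a * np.square(x) + b_grid[i, j] * x + c_grid[i, j]
        surface[i, j] = loss(y_predicted, y)

#draw the loss surface
ax = plt.figure().add_subplot(projection = '3d')
ax.plot_surface(b_grid, c_grid, surface, alpha = 0.5)
ax.set_xlabel('b')
ax.set_ylabel('c')
ax.set_zlabel('loss')
plt.show()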

Conclusion…

In this article, we created a neural network from scratch using Python. We used a simple network architecture with one neuron to estimate the parameters of a quadratic function. We first synthesized the data, created functions to calculate the loss and the gradients and to update the parameters, and finally trained the model for 50 epochs. We saw how the loss decreased with increasing epochs and how the parameters converged to their true values.

When I started learning about Deep Learning & Neural Networks, I found building this simple model very helpful before moving on to more advanced techniques and algorithms. So if you are a novice learner reading this, I hope it helped you understand the basics of Deep Learning as well.

Hope you enjoyed the article. Thanks for reading!
