A Beginner’s Guide to Shallow Neural Networks

Achleshwar
Published in cvrs.notepad
Jun 28, 2020

Knowing the crux of how neural networks work always has its perks. Although tools such as TensorFlow and scikit-learn have made it very easy to build a model in very little time, understanding what happens under the hood lets you appreciate the complexity behind it. And, obviously, you would learn a lot.

That being said, we need not dive straight into deep neural networks. Understanding a shallow neural network gives us an idea of what is happening inside deep neural networks.

In this article, we will walk you through building a model that classifies whether an image of a scene was shot during the day or at night. We have used the underlying concepts of deep learning and built a shallow neural network from scratch. Let’s dive in.

What is a Neuron?

Let’s get started with knowing about the most basic unit of a neural network.

As you might have guessed, an artificial “neuron” is modeled on the biological neurons of the human brain, and it forms the driving unit of a neural network. It receives inputs (X), processes them using a linear transformation and an activation function, and voila! We have our desired output (y).
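To make this concrete, here is a minimal sketch of a single neuron in NumPy (the input, weight, and bias values are made up for illustration):

```python
import numpy as np

def sigmoid(z):
    # Squashes any real number into the range (0, 1)
    return 1 / (1 + np.exp(-z))

def neuron(x, w, b):
    # Linear transformation of the inputs...
    z = np.dot(w, x) + b
    # ...followed by a non-linear activation
    return sigmoid(z)

# Example: 3 inputs with arbitrary weights and bias
x = np.array([0.5, -1.2, 3.0])
w = np.array([0.1, 0.4, -0.2])
b = 0.5
y = neuron(x, w, b)  # a single output in (0, 1)
```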

Structure of Shallow Neural Networks

You must be wondering: we have the neuron, well and good, but how do we arrange neurons into a network? How many of them do we need?

Each neural network consists of 3 types of layers that are responsible for its functioning. These 3 layers are:

  1. Input layer: All the inputs of the model are fed into this layer. In our model, the number of nodes in this layer is 12288, owing to the input image size of 64*64*3.
  2. Hidden layer: This is where all the action takes place; the inputs are processed with the help of linear and non-linear functions and passed on to the output layer. Here we have taken the number of nodes in the hidden layer to be 25.
  3. Output layer: After the processing, the data is sent to the output layer. In this classification problem, we need only 1 output node, as the result must be either ‘Day’ (0) or ‘Night’ (1).

Deciding how many hidden layers to use, and how many nodes each of them should contain, is one of the most difficult tasks. It is not easy, and playing with these numbers can either give you very good results or land you with very inaccurate ones. Tread carefully!

Creating your own dataframe

Here comes the best part of this project. If you were about to start looking for datasets online: hold up, why would you do that?

Simply take out your smartphone; go outdoors, admire nature, and record some videos both during the day and at night; then use VLC to extract snapshots from those videos.

There are some other high-tech tools, such as AutoML, that you can use too.

Then you need to process those images for training. You can check our code for this in the GitHub repository linked at the end of this article.
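As a rough sketch of what that preprocessing can look like, assuming OpenCV and NumPy and a hypothetical frames/day and frames/night folder layout (our actual code lives in the repository):

```python
import os
import cv2
import numpy as np

def load_images(folder, label, size=64):
    """Load every image in `folder`, resize it to size x size,
    and flatten it into a (size*size*3,) feature vector."""
    X, y = [], []
    for fname in os.listdir(folder):
        img = cv2.imread(os.path.join(folder, fname))
        if img is None:
            continue  # skip files that are not images
        img = cv2.resize(img, (size, size))
        X.append(img.reshape(-1))  # 64*64*3 = 12288 features
        y.append(label)
    return X, y

X_day, y_day = load_images("frames/day", 0)
X_night, y_night = load_images("frames/night", 1)

# Stack into a (features, examples) matrix and normalize pixels to [0, 1]
X = np.array(X_day + X_night, dtype=np.float64).T / 255.0
Y = np.array(y_day + y_night).reshape(1, -1)
```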

Initialize parameters

There are various approaches to initializing parameters. A tempting one is to initialize all parameters to zero (or to the same value), but there is a caveat to it. Let’s understand this using our shallow neural network itself.

Let W[1] and W[2] be the weight matrices of layer 1 and layer 2 respectively. Here we are assuming the biases, b[1] and b[2], to be zero. Now, if W[1] and W[2] are initialized to zero or to the same value, then every neuron in the hidden layer will compute the exact same function and thus have the same activation value. Consequently, the network will fail to “break symmetry”. The recommended method is “Xavier initialization”, or you can use a code snippet like the one below.
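A minimal sketch, assuming NumPy and the layer sizes described earlier (12288, 25, 1); the 0.01 scaling is the common from-scratch choice rather than the exact Xavier factor:

```python
import numpy as np

def initialize_parameters(n_x, n_h, n_y):
    # Small random weights break symmetry; zero biases are fine
    # because the random weights already differentiate the neurons.
    W1 = np.random.randn(n_h, n_x) * 0.01
    b1 = np.zeros((n_h, 1))
    W2 = np.random.randn(n_y, n_h) * 0.01
    b2 = np.zeros((n_y, 1))
    return {"W1": W1, "b1": b1, "W2": W2, "b2": b2}

parameters = initialize_parameters(12288, 25, 1)
```

For Xavier initialization proper, you would scale layer l’s weights by np.sqrt(1 / n) with n the size of the previous layer, instead of 0.01.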

Forward propagation

Forward propagation refers to calculating the output of each layer in order, from input to output, and storing the intermediate values along the way. Each layer’s computation can be divided into two halves: a linear half and an activation half.

Linear half: Z[l] = W[l] A[l-1] + b[l]

Activation half: A[l] = g[l](Z[l])

Here, g[l] denotes the activation function of layer l. The most commonly used activation functions are relu, leaky relu, sigmoid, tanh, and softmax. For our project, we have used the relu function for the first layer and the sigmoid function for the final layer. You can try different functions for the first layer, but for the final layer the sigmoid function is recommended when classifying between two classes.
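Putting the two halves together for our two-layer network, here is a sketch of the forward pass (it reuses the sigmoid helper and the parameters dictionary from the earlier snippets):

```python
def relu(Z):
    # Element-wise max(0, z)
    return np.maximum(0, Z)

def forward_propagation(X, parameters):
    W1, b1 = parameters["W1"], parameters["b1"]
    W2, b2 = parameters["W2"], parameters["b2"]

    # Layer 1: linear half, then relu activation
    Z1 = np.dot(W1, X) + b1
    A1 = relu(Z1)

    # Layer 2: linear half, then sigmoid activation
    Z2 = np.dot(W2, A1) + b2
    A2 = sigmoid(Z2)

    # Cache the intermediate values; backpropagation needs them
    cache = {"Z1": Z1, "A1": A1, "Z2": Z2, "A2": A2}
    return A2, cache
```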

I’ll end the discussion here, as activation functions call for a separate article of their own.

Compute cost

Once you are done with calculating A[2], it’s time to check how accurate your model is. This can be done by computing the cost using the cross-entropy loss function:

J = -(1/m) * Σ [ y * log(A[2]) + (1 - y) * log(1 - A[2]) ]

where the sum runs over all m training examples.
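In code, a sketch of the same computation (A2 and Y are the (1, m) prediction and label arrays from the earlier snippets):

```python
def compute_cost(A2, Y):
    m = Y.shape[1]
    # Cross-entropy loss, averaged over all m examples
    cost = -np.sum(Y * np.log(A2) + (1 - Y) * np.log(1 - A2)) / m
    return float(cost)
```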

Backward propagation

Now, let’s get started with understanding the brains behind finding the apt values of our parameters. In other words, it’s time to refresh your 12th-grade mathematics and put those concepts to work right here, where you’ll finally find an answer to “where in my life am I going to use these mathematical formulas?”

To list everything down: we have our parameters, we have the results of the linear transformation and non-linear activation functions, and so we have our loss.

What now?

Firstly, you should familiarize yourself with the gradient descent update rule, which we’ll use to update the parameters so that the loss gets as close to zero as possible:

W := W - α * dL/dW

where α is the learning rate, a small positive number that controls the size of each step.

To calculate dL/dW, you must first realize that the loss is a function of the prediction A[2], which is in turn a function of our parameters. Hence we’ll be applying the infamous chain rule to find the above value:

dL/dW[2] = dL/dA[2] * dA[2]/dZ[2] * dZ[2]/dW[2]

You already have the formula to calculate Z[l]. To calculate the gradient with respect to Z[l], just apply the chain rule:

dZ[l] = dA[l] * g[l]'(Z[l])

where * denotes element-wise multiplication.

To get dA[l], you just need the derivatives of the activation functions used. Using the above formula, keep calculating the gradients layer by layer until you reach the first layer, and there you have it: you have finally “backpropagated” to where you began!
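For our two-layer network, the whole backward pass fits in a few lines. A sketch, using the cache from the forward pass; the simplification dZ2 = A2 - Y comes from combining the sigmoid derivative with the cross-entropy loss:

```python
def backward_propagation(parameters, cache, X, Y):
    m = X.shape[1]
    A1, A2 = cache["A1"], cache["A2"]
    W2 = parameters["W2"]

    # Output layer: sigmoid + cross-entropy collapses to A2 - Y
    dZ2 = A2 - Y
    dW2 = np.dot(dZ2, A1.T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m

    # Hidden layer: chain rule through W2, then through relu
    dA1 = np.dot(W2.T, dZ2)
    dZ1 = dA1 * (cache["Z1"] > 0)  # relu derivative: 1 where Z1 > 0
    dW1 = np.dot(dZ1, X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m

    return {"dW1": dW1, "db1": db1, "dW2": dW2, "db2": db2}
```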

Update parameters

You’re almost there! Now the only thing left to do is update the parameters using the formulas given below. Note that the assignment happens simultaneously for all the parameters W[1], W[2], b[1] and b[2]:

W[l] := W[l] - α * dW[l]
b[l] := b[l] - α * db[l]
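In code, a sketch of the update step, plus a minimal training loop tying together all the helpers sketched above (the iteration count and learning rate are made-up values you would tune):

```python
def update_parameters(parameters, grads, alpha=0.01):
    # Gradient descent step: move each parameter against its gradient
    for l in (1, 2):
        parameters["W%d" % l] -= alpha * grads["dW%d" % l]
        parameters["b%d" % l] -= alpha * grads["db%d" % l]
    return parameters

# A minimal training loop using the helpers sketched above
parameters = initialize_parameters(12288, 25, 1)
for i in range(2500):
    A2, cache = forward_propagation(X, parameters)
    grads = backward_propagation(parameters, cache, X, Y)
    parameters = update_parameters(parameters, grads, alpha=0.01)
    if i % 100 == 0:
        print("iteration %d, cost %.4f" % (i, compute_cost(A2, Y)))
```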

Important Note:

I would suggest you go through a few topics before referring to our GitHub repository.

  1. Mini-batch gradient descent
  2. Normalization
  3. CS230 — Lecture 2

Github Repo Link:

https://github.com/Achleshwar/Day_Night_Classifier

Thanks for reading this article. We hope it helped clear up your concepts about shallow neural networks. Cheers!
