A Deep Neural Network In Tensorflow

Nishank Sharma · Published in the ML blog
7 min read · May 26, 2018

Hello guys,

It’s Nishank here, welcome to clickbait. Today we will build a Deep Neural Network using Tensorflow. We will start by covering some basic fundamentals and then gradually move towards building our neural net. Let’s get started!

Don’t forget to read the previous post on Getting Started With Tensorflow!

Show your support by subscribing to our newsletter!

Neural Network, Layers, Weights and Biases

A neural network, or Artificial Neural Network (ANN), is an interconnected collection of neurons arranged so that they perform a mathematical function, helping us solve a problem by learning from training cases and then predicting outputs for test cases.

A neural network can be made up of any number of layers. There are 3 types of layers in a neural network-

  1. Input layer - As is evident from the name, this layer takes the input and passes it forward to the rest of the network.
  2. Hidden layer - This is a type of layer in between input and output layer whose functioning is unknown hence the name hidden, duh!
  3. Output layer - This is the last layer of any neural network which gives the output according to the input provided at the beginning.
Credits — https://analyticsindiamag.com/

A neural network with multiple hidden layers is called a Deep Neural Network. We will be making one using tensorflow in this post.

Weights and Biases

So the way a simple neuron works is that it computes an output using the input and a given equation characteristic of that neuron. If that computed output is more than a threshold, the neuron fires.

Let’s see what a simple output function looks like-

So what is happening here is that every input X has a corresponding weight W, and we sum the products of all the Xs and Ws to obtain the output Y. What you have to understand about weights is that their values represent how important an input X is towards predicting the output Y. Over the training process we keep changing the weights of different inputs as we learn their importance towards getting the right output Y.
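To make that concrete, here is a tiny sketch of the weighted sum (the numbers are made up purely for illustration):

```python
# Toy weighted sum: each input x_i is multiplied by its weight w_i
# and the products are summed to get the output y.
x = [0.5, 0.2, 0.9]   # inputs X
w = [0.4, 0.7, 0.1]   # weights W (how important each input is)

y = sum(xi * wi for xi, wi in zip(x, w))
print(y)  # 0.43
```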

Next is bias. In a neural network, the bias is an extra neuron with a fixed value of 1. Think of bias as a moderator for the output, or a variable that can be used to shape the way we want our output to be. That’s all you have to know about bias for now.

Also, we will be using the backpropagation algorithm in our model to train the weights. Backpropagation works on the principle that the error is calculated in the backward direction, from the output layer to the input layer. I will write a detailed post about backpropagation in the coming weeks, so don’t forget to subscribe to our newsletter to stay informed.

Alright enough of the theory, let the adventure begin!

The Model

We will start by importing the required libraries and dataset

Imports
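The original code is embedded as a gist; here is a minimal sketch of what the imports block does, assuming the classic TensorFlow 1.x MNIST tutorial helper (the data path is just a placeholder):

```python
import tensorflow as tf
from tensorflow.examples.tutorials.mnist import input_data

# Download and load MNIST, with labels in one-hot form
mnist = input_data.read_data_sets("/tmp/data/", one_hot=True)
```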

We imported tensorflow (because duh!) and the MNIST dataset, which is a collection of handwritten digit images used for digit recognition. Notice that we have set the one_hot parameter to True, which means each of the 10 digits (0–9) will be represented by a 10-element list in which its own position is 1 (on) and all other positions are 0 (off). For example,

onehot notation
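As an illustration (the digit shown here is my own example, not necessarily the one from the original figure), the digit 3 in one-hot form is a 10-element list with a 1 in position 3 and 0 everywhere else:

```python
# one-hot encoding of the digit 3 over the classes 0-9
three = [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
```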

Certainly not a very hot way to represent things, but what can I say, it works!

Moving on, next we will construct our hidden layers, output layer and batch size.

Layers
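The gist itself isn’t reproduced here; a minimal sketch of the layer and batch-size constants, with variable names assumed, would look like this:

```python
# three hidden layers with 1000 neurons each
n_nodes_hl1 = 1000
n_nodes_hl2 = 1000
n_nodes_hl3 = 1000

n_classes = 10    # output neurons, one per digit 0-9
batch_size = 100  # number of examples fed to the network at a time
```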

We are using 3 hidden layers with 1000 neurons each, an output layer with 10 neurons representing the outputs (0–9) that we hope to predict, and a batch size of 100.

Although we can train on the entire dataset at once, it makes more sense to train using batches of inputs, as it is more efficient when using the backpropagation algorithm. Also, when using systems with low computational capability and bigger datasets, it is wise to use batches for training.

Okay, now that the preliminaries are out of the way, let’s focus on the structure of our model. Our model will be divided into two different steps-

1. Building The Neural Network

Building the neural network is itself divided into initializing the weights and biases, and performing the mathematical calculation of the sum of products of all inputs X with their corresponding weights W.

nnmodel
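The embedded gist isn’t reproduced here, so the line numbers below refer to the original code. As a rough sketch (variable and function names are assumptions based on the surrounding description), the model definition looks something like this in TensorFlow 1.x:

```python
# Placeholders for the input images (28x28 = 784 pixels) and the one-hot labels
x = tf.placeholder('float', [None, 784])
y = tf.placeholder('float')

def nnmodel(data):
    # Random initialization of weights and biases for each layer.
    # Each weight matrix's input dimension is the size of the previous layer.
    hidden_1_layer = {'weights': tf.Variable(tf.random_normal([784, n_nodes_hl1])),
                      'biases': tf.Variable(tf.random_normal([n_nodes_hl1]))}
    hidden_2_layer = {'weights': tf.Variable(tf.random_normal([n_nodes_hl1, n_nodes_hl2])),
                      'biases': tf.Variable(tf.random_normal([n_nodes_hl2]))}
    hidden_3_layer = {'weights': tf.Variable(tf.random_normal([n_nodes_hl2, n_nodes_hl3])),
                      'biases': tf.Variable(tf.random_normal([n_nodes_hl3]))}
    output_layer = {'weights': tf.Variable(tf.random_normal([n_nodes_hl3, n_classes])),
                    'biases': tf.Variable(tf.random_normal([n_classes]))}

    # Each layer: (inputs x weights) + biases, passed through ReLU
    l1 = tf.nn.relu(tf.add(tf.matmul(data, hidden_1_layer['weights']), hidden_1_layer['biases']))
    l2 = tf.nn.relu(tf.add(tf.matmul(l1, hidden_2_layer['weights']), hidden_2_layer['biases']))
    l3 = tf.nn.relu(tf.add(tf.matmul(l2, hidden_3_layer['weights']), hidden_3_layer['biases']))

    # Output layer: raw (unscaled) scores, i.e. logits
    output = tf.add(tf.matmul(l3, output_layer['weights']), output_layer['biases'])
    return output
```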

Whoa! That’s too much to handle all at once! Let’s tackle it line by line.

Line 1–3: Handles the declaration of placeholders for the input and output variables x and y respectively, which will be used in the neural network.

Line 4–17: This part of our code handles the random initialization of weights and biases for the layers of our neural network.

Notice how each layer’s weights are initialized using the size of the previous layer. This is because the number of weights must equal the number of inputs to that layer, which are nothing but the outputs of the previous layer.

There is nothing special about initializing the weights and biases, as they start as random garbage values which will be adjusted in the training phase of our network.

Line 18–29: This is where most of the mathematical calculation happens. For every layer, the first part is the sum of products of the weights with the inputs, plus the bias neuron.

The second part is where things get interesting. We are using Rectified Linear Units (ReLU) as the activation function for each layer.

ReLU

ReLU is one of the many activation functions used in Machine Learning. It basically gives us two types of values: 0 for all inputs less than 0, and the input itself (a continuous real number) for all inputs greater than 0. It is a widely used activation function and is preferred over the sigmoid function because it gives better results in training.
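In other words, ReLU is simply max(0, x); a quick sketch in plain Python:

```python
def relu(x):
    # 0 for negative inputs, the input itself for positive inputs
    return max(0.0, x)

print(relu(-3.0))  # 0.0
print(relu(2.5))   # 2.5
```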

You can read about ReLU and many other functions in this awesome post by Towards Data Science.

2. Training

Once our model is done, we move on to the training phase of our network.

training
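As with the model definition, the gist isn’t shown here and its line numbers are referenced below. A hedged TensorFlow 1.x sketch of the training function, with names assumed from the description, might look like this:

```python
def train(x):
    prediction = nnmodel(x)
    # Cost: softmax cross-entropy between the unscaled logits and the one-hot labels
    cost = tf.reduce_mean(
        tf.nn.softmax_cross_entropy_with_logits(logits=prediction, labels=y))
    # Optimize the cost with the Adam optimizer
    optimizer = tf.train.AdamOptimizer().minimize(cost)

    hm_epochs = 15  # number of passes over the training data
    with tf.Session() as sess:
        sess.run(tf.global_variables_initializer())
        print('Total epochs:', hm_epochs)

        for epoch in range(hm_epochs):
            epoch_loss = 0
            # Feed the training set to the network batch by batch
            for _ in range(int(mnist.train.num_examples / batch_size)):
                epoch_x, epoch_y = mnist.train.next_batch(batch_size)
                _, c = sess.run([optimizer, cost], feed_dict={x: epoch_x, y: epoch_y})
                epoch_loss += c
            print('Epoch', epoch + 1, 'completed out of', hm_epochs, 'loss:', epoch_loss)

        # Compare predicted digits with the true labels and compute accuracy
        correct = tf.equal(tf.argmax(prediction, 1), tf.argmax(y, 1))
        accuracy = tf.reduce_mean(tf.cast(correct, 'float'))
        print('Accuracy:', accuracy.eval({x: mnist.test.images, y: mnist.test.labels}))

train(x)
```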

Again, let’s try to make sense of things line by line:

Line 1–3: We have just defined the train() function and obtained our prediction by calling the nnmodel() function, passing our input x.

Line 4–6: Here the focus is on calculating the cost by using the softmax() function on logits, which means the function operates on the unscaled output of the previous layer (forget it if that went over your head).

The calculated cost is then optimized using the Adaptive Moment Estimation (Adam) optimizer, which is an extension of the stochastic gradient descent algorithm.

You can read this great post by Machine Learning Mastery to understand Adam optimizer deeply.

Line 7–13: Now we will declare the number of epochs or the number of times the weights will be updated to get a respectable accuracy. We have set this to 15.

Next we will start the tensorflow session using the tf.Session() function and initialize all the global variables. We will also print the total number of epochs for the user’s understanding.

Line 14–22: This is the meat of the training function. Using a for loop over the number of epochs, we take the MNIST training set batch by batch.

Next we initialize the epoch_x and epoch_y variables with the training examples and their labels, respectively, as we iterate through each batch. These are then used to calculate the loss for that epoch using the cost function and optimizer we declared in lines 4–6.

The total loss for that epoch is obtained by repeatedly adding the loss for each batch to an epoch_loss variable, which was initialized to 0 at the start of the loop. This loss is then printed for every epoch.

Line 23–28: The number of correct predictions is calculated by comparing the prediction variable obtained from the nnmodel() function with the output variable y obtained from the mnist dataset.

Accuracy for the model is computed and printed using the correct variable and the mnist dataset’s test images and labels. Finally, the train() function is called.

After the program is run in the terminal, we get an output which looks like this-

output

Notice how the loss increases and decreases across the epochs, which is indicative of how the model is being trained. We get a final accuracy of 96.25%, which is pretty sweet for our first model!

Alright then, this was how you make a Deep Neural Network in tensorflow! We will dive deep into building different models and doing many cool things using tensorflow in the coming posts.

All the code used in this post can be found by clicking the banner below. I will be posting the third post in the series soon. Stay tuned and subscribe to our newsletter for an awesome experience and to never miss an update.
