Build Your First Neural Network Model From Scratch (With Code)

A beginner’s code-friendly approach to deep learning

Aditya Chakraborty
The Startup
6 min read · Sep 3, 2020


I’m strongly of the opinion that building projects is crucial if you really want to learn deep learning. So if you’re looking to start making your own deep learning projects, you’re in the right place. Stick around until the end of this tutorial and you’ll have your first deep learning project ever!

I’m going to show you a simple project that implements a feed-forward neural network on the famous CIFAR-10 dataset, using PyTorch.

Check out my GitHub for the full source code.

Prerequisites

  • A surface-level knowledge of neural networks is required for following along with this tutorial.
  • Familiarity with Python would be helpful.

About the dataset

According to the official website of CIFAR-10 dataset:

“The CIFAR-10 dataset consists of 60000 32x32 colour images in 10 classes, with 6000 images per class. There are 50000 training images and 10000 test images. The dataset is divided into five training batches and one test batch, each with 10000 images. The test batch contains exactly 1000 randomly-selected images from each class. The training batches contain the remaining images in random order, but some training batches may contain more images from one class than another. Between them, the training batches contain exactly 5000 images from each class.”

Source: https://www.cs.toronto.edu/~kriz/cifar.html

How it looks:

[Image: sample 32x32 images from each of the 10 CIFAR-10 classes]

So this is essentially a multi-class classification problem: our neural network takes in an input image and predicts which of the 10 classes shown above it belongs to.

Let’s get to coding!

We’ll start by importing all the necessary packages:
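The exact import list in the original notebook may differ slightly, but something along these lines covers everything used below:

import torch
import torch.nn as nn
import torch.nn.functional as F
from torch.utils.data import DataLoader, random_split
from torchvision.datasets import CIFAR10
from torchvision.transforms import ToTensor
from torchvision.utils import make_grid
import matplotlib.pyplot as plt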

Now we’ll have to download the dataset. PyTorch’s torchvision already provides a number of datasets, including the CIFAR-10 dataset we’ll be using here. So we’ll download the train and test datasets separately from torchvision.datasets.
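A sketch of the download step (the root directory 'data/' is just an illustrative choice):

# Full training set (split into train/val later) and the held-out test set
dataset = CIFAR10(root='data/', download=True, transform=ToTensor())
test_dataset = CIFAR10(root='data/', train=False, transform=ToTensor())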

Notice that we transform our dataset of images into tensors using:

transform=ToTensor()

Now that we have our dataset ready, we can perform some data analysis.
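For instance, we can check the dataset sizes and list the class names:

print(len(dataset), len(test_dataset))   # 50000 10000
print(dataset.classes)
# ['airplane', 'automobile', 'bird', 'cat', 'deer',
#  'dog', 'frog', 'horse', 'ship', 'truck']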

We can also visualise a random image from the dataset like this:
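A small helper does the trick; show_example is just an illustrative name. Note that PyTorch stores images as channels x height x width, so we permute the axes before plotting:

import random

def show_example(img, label):
    # Print the label and display the 32x32 image
    print('Label (numeric):', label)
    print('Label (textual):', dataset.classes[label])
    plt.imshow(img.permute(1, 2, 0))   # (C, H, W) -> (H, W, C) for matplotlib

img, label = dataset[random.randrange(len(dataset))]
show_example(img, label)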

# Output:
Label (numeric): 7
Label (textual): horse

The image is quite blurry, and it’s hard even for the human eye to say for certain what it contains.

Let’s split our dataset into training and validation datasets.
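One way to do this is with random_split; the 45,000/5,000 split and the seed value below are assumptions rather than the exact numbers from the original notebook:

torch.manual_seed(43)                 # make the split reproducible
val_size = 5000
train_size = len(dataset) - val_size
train_ds, val_ds = random_split(dataset, [train_size, val_size])
len(train_ds), len(val_ds)            # (45000, 5000)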

Now let’s set a batch size (which is a hyperparameter) and visualise a single batch of images.
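Here’s one way to set it up, along with a show_batch helper; the validation batch size and the grid layout are illustrative choices:

batch_size = 128

train_loader = DataLoader(train_ds, batch_size, shuffle=True,
                          num_workers=4, pin_memory=True)
val_loader = DataLoader(val_ds, batch_size * 2,
                        num_workers=4, pin_memory=True)

def show_batch(dl):
    # Display the first batch of images as a single grid
    for images, labels in dl:
        print('images.shape:', images.shape)
        plt.figure(figsize=(12, 6))
        plt.imshow(make_grid(images, nrow=16).permute(1, 2, 0))
        break

show_batch(train_loader)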

# Output:
images.shape: torch.Size([128, 3, 32, 32])

There are a few hyperparameters used such as batch_size, shuffle, num_workers and pin_memory. You can play around with these values and check what works best for you.

At this point, we’ll start building our image classification base model, which contains the helper methods used during training and validation; the fit function that drives the training loop comes shortly after.
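A sketch of one way to structure it; the method names and the accuracy helper here follow a common PyTorch tutorial pattern rather than anything specific to this project:

def accuracy(outputs, labels):
    # Fraction of predictions that match the true labels
    _, preds = torch.max(outputs, dim=1)
    return torch.tensor(torch.sum(preds == labels).item() / len(preds))

class ImageClassificationBase(nn.Module):
    def training_step(self, batch):
        images, labels = batch
        out = self(images)                    # forward pass
        loss = F.cross_entropy(out, labels)   # training loss
        return loss

    def validation_step(self, batch):
        images, labels = batch
        out = self(images)
        loss = F.cross_entropy(out, labels)
        acc = accuracy(out, labels)
        return {'val_loss': loss.detach(), 'val_acc': acc}

    def validation_epoch_end(self, outputs):
        # Average the per-batch losses and accuracies over the epoch
        batch_losses = [x['val_loss'] for x in outputs]
        batch_accs = [x['val_acc'] for x in outputs]
        return {'val_loss': torch.stack(batch_losses).mean().item(),
                'val_acc': torch.stack(batch_accs).mean().item()}

    def epoch_end(self, epoch, result):
        print("Epoch [{}], val_loss: {:.4f}, val_acc: {:.4f}".format(
            epoch, result['val_loss'], result['val_acc']))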

Let’s also make sure that our GPU is used during training whenever one is available.
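A typical pattern is a small set of device helpers plus a wrapper that moves each batch to the right device as it is drawn; this is one common way to do it:

def get_default_device():
    # Use the GPU if available, otherwise fall back to the CPU
    return torch.device('cuda') if torch.cuda.is_available() else torch.device('cpu')

def to_device(data, device):
    # Move tensor(s) (or a model) to the chosen device
    if isinstance(data, (list, tuple)):
        return [to_device(x, device) for x in data]
    return data.to(device, non_blocking=True)

class DeviceDataLoader():
    # Wrap a DataLoader so batches land on the chosen device automatically
    def __init__(self, dl, device):
        self.dl = dl
        self.device = device

    def __iter__(self):
        for b in self.dl:
            yield to_device(b, self.device)

    def __len__(self):
        return len(self.dl)

device = get_default_device()
train_loader = DeviceDataLoader(train_loader, device)
val_loader = DeviceDataLoader(val_loader, device)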

Some more helper functions:
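Namely an evaluate function that aggregates the validation metrics, and a fit function that runs the training loop. Plain SGD as the default optimiser is an assumption here:

@torch.no_grad()
def evaluate(model, val_loader):
    model.eval()
    outputs = [model.validation_step(batch) for batch in val_loader]
    return model.validation_epoch_end(outputs)

def fit(epochs, lr, model, train_loader, val_loader, opt_func=torch.optim.SGD):
    history = []
    optimizer = opt_func(model.parameters(), lr)
    for epoch in range(epochs):
        model.train()
        for batch in train_loader:
            loss = model.training_step(batch)
            loss.backward()
            optimizer.step()
            optimizer.zero_grad()
        result = evaluate(model, val_loader)
        model.epoch_end(epoch, result)
        history.append(result)
    return history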

Now, the model itself:
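A sketch of the architecture, using the layer sizes described below (the class name CIFAR10Model is illustrative):

class CIFAR10Model(ImageClassificationBase):
    def __init__(self):
        super().__init__()
        self.linear1 = nn.Linear(3 * 32 * 32, 2020)   # flattened 32x32 RGB input
        self.linear2 = nn.Linear(2020, 505)
        self.linear3 = nn.Linear(505, 125)
        self.linear4 = nn.Linear(125, 10)             # one score per class

    def forward(self, xb):
        out = xb.view(xb.size(0), -1)      # flatten each image in the batch
        out = F.relu(self.linear1(out))
        out = F.relu(self.linear2(out))
        out = F.relu(self.linear3(out))
        return self.linear4(out)           # raw class scores for cross_entropy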

As you can see, we have four feed-forward linear layers: 3072 (the flattened 3 x 32 x 32 input) → 2020, 2020 → 505, 505 → 125 and 125 → 10. In the forward method, a ReLU activation is applied after each hidden layer, and the final layer produces the raw scores for the 10 classes.
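Before training, let’s move the model to the GPU and see how it does with randomly initialised weights. Roughly 10% accuracy is what you’d expect from random guessing across 10 classes:

model = to_device(CIFAR10Model(), device)
history = [evaluate(model, val_loader)]   # accuracy before any training
history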

# Output:
[{'val_acc': 0.1067095622420311, 'val_loss': 2.3030261993408203}]

Now, we’ll train the model using the fit function to reduce the validation loss and improve accuracy. We’ll experiment with different numbers of epochs and learning rates to achieve the best accuracy possible.

Let’s start with a fairly high learning rate and a large number of epochs to let the model explore the loss landscape quickly. Starting out with a high learning rate gives us a rough picture of how the network behaves, and we can plan our next steps based on that. Later, we’ll lower the learning rate so the model can settle into a minimum.
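The call looks something like this; the 45 epochs match the log below, but the learning rate of 0.1 is an illustrative guess rather than the exact value used:

history += fit(45, 0.1, model, train_loader, val_loader)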

Output:

Epoch [0], val_loss: 1.9024, val_acc: 0.3116
Epoch [1], val_loss: 1.9146, val_acc: 0.3141
Epoch [2], val_loss: 1.7534, val_acc: 0.3651
Epoch [3], val_loss: 1.6731, val_acc: 0.3953
Epoch [4], val_loss: 1.6696, val_acc: 0.3959
Epoch [5], val_loss: 1.6071, val_acc: 0.4267
Epoch [6], val_loss: 1.6354, val_acc: 0.4146
Epoch [7], val_loss: 1.6794, val_acc: 0.4037
Epoch [8], val_loss: 1.5566, val_acc: 0.4397
Epoch [9], val_loss: 1.5156, val_acc: 0.4454
Epoch [10], val_loss: 1.5073, val_acc: 0.4579
Epoch [11], val_loss: 1.5743, val_acc: 0.4325
Epoch [12], val_loss: 1.4422, val_acc: 0.4819
Epoch [13], val_loss: 1.4832, val_acc: 0.4732
Epoch [14], val_loss: 1.4635, val_acc: 0.4793
Epoch [15], val_loss: 1.4615, val_acc: 0.4825
Epoch [16], val_loss: 1.4626, val_acc: 0.4765
Epoch [17], val_loss: 1.6189, val_acc: 0.4308
Epoch [18], val_loss: 1.4110, val_acc: 0.5007
Epoch [19], val_loss: 1.4181, val_acc: 0.5061
Epoch [20], val_loss: 1.5151, val_acc: 0.4843
Epoch [21], val_loss: 1.5199, val_acc: 0.4658
Epoch [22], val_loss: 1.4862, val_acc: 0.4893
Epoch [23], val_loss: 1.4905, val_acc: 0.4982
Epoch [24], val_loss: 1.3613, val_acc: 0.5252
Epoch [25], val_loss: 1.4618, val_acc: 0.5094
Epoch [26], val_loss: 1.3896, val_acc: 0.5219
Epoch [27], val_loss: 1.4939, val_acc: 0.5064
Epoch [28], val_loss: 1.4625, val_acc: 0.4968
Epoch [29], val_loss: 1.4153, val_acc: 0.5235
Epoch [30], val_loss: 1.4081, val_acc: 0.5321
Epoch [31], val_loss: 1.5119, val_acc: 0.5188
Epoch [32], val_loss: 1.7090, val_acc: 0.4417
Epoch [33], val_loss: 1.5500, val_acc: 0.4975
Epoch [34], val_loss: 1.5853, val_acc: 0.5010
Epoch [35], val_loss: 1.5117, val_acc: 0.5197
Epoch [36], val_loss: 1.5295, val_acc: 0.5257
Epoch [37], val_loss: 1.6850, val_acc: 0.5104
Epoch [38], val_loss: 1.7531, val_acc: 0.4895
Epoch [39], val_loss: 1.6858, val_acc: 0.5139
Epoch [40], val_loss: 1.7323, val_acc: 0.5090
Epoch [41], val_loss: 1.6911, val_acc: 0.5080
Epoch [42], val_loss: 2.0147, val_acc: 0.4776
Epoch [43], val_loss: 1.8985, val_acc: 0.4761
Epoch [44], val_loss: 1.7531, val_acc: 0.5275

We can see that our model started at around 31% validation accuracy and ended up at about 52%, with a significant decrease in validation loss as well. Now we can gradually lower the learning rate so that the model settles into a good minimum.
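Two more short runs at progressively smaller learning rates; again, the exact values are illustrative:

history += fit(5, 0.01, model, train_loader, val_loader)
history += fit(5, 0.001, model, train_loader, val_loader)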

Output:

Epoch [0], val_loss: 1.6501, val_acc: 0.5639
Epoch [1], val_loss: 1.6819, val_acc: 0.5683
Epoch [2], val_loss: 1.7120, val_acc: 0.5721
Epoch [3], val_loss: 1.7336, val_acc: 0.5673
Epoch [4], val_loss: 1.7580, val_acc: 0.5670
Epoch [0], val_loss: 1.7671, val_acc: 0.5703
Epoch [1], val_loss: 1.7728, val_acc: 0.5695
Epoch [2], val_loss: 1.7765, val_acc: 0.5688
Epoch [3], val_loss: 1.7908, val_acc: 0.5666
Epoch [4], val_loss: 1.7943, val_acc: 0.5701

You can keep going and see whether you can push past 57% validation accuracy. I’m going to stop here and plot my graphs.
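Two small plotting helpers over the accumulated history do the job:

def plot_accuracies(history):
    accuracies = [x['val_acc'] for x in history]
    plt.plot(accuracies, '-x')
    plt.xlabel('epoch')
    plt.ylabel('accuracy')
    plt.title('Accuracy vs. No. of epochs')

def plot_losses(history):
    losses = [x['val_loss'] for x in history]
    plt.plot(losses, '-x')
    plt.xlabel('epoch')
    plt.ylabel('loss')
    plt.title('Loss vs. No. of epochs')

plot_accuracies(history)
plt.show()
plot_losses(history)
plt.show()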

Output: [plot of validation accuracy vs. number of epochs]

Output: [plot of validation loss vs. number of epochs]

Clearly, both the loss and the accuracy flatten out once the number of epochs grows beyond a certain point. So we can say that this model has reached the limit of what it can learn, and its parameters have effectively converged.
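Finally, we can measure performance on held-out data; a sketch, assuming the test set is wrapped in a DeviceDataLoader like the other loaders:

test_loader = DeviceDataLoader(DataLoader(test_dataset, batch_size * 2), device)
evaluate(model, test_loader)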

# Output:
{'val_acc': 0.571093738079071, 'val_loss': 1.7442058324813843}

Conclusion

We performed:

  • CIFAR-10 dataset exploration
  • Data analysis on the dataset
  • Visualisation of single images and batches
  • Building an image classification base model
  • Creating a model architecture with feed-forward neural networks
  • Training the model over multiple epochs with hyperparameter tuning
  • Evaluating the model’s performance using our accuracy function and the cross-entropy loss function
  • Plotting accuracy and loss graphs to visualise the results

Further from here

This was a basic feed-forward network model, which only got up to about 57% validation accuracy. For your next project, you could try using convolutional neural networks instead of plain feed-forward layers and see how much the accuracy improves. Even better, try adding some regularisation to the CNN architecture to approach state-of-the-art results on the CIFAR-10 dataset.
