Fully Connected vs Convolutional Neural Networks

Implementation using Keras

Pooja Mahajan
The Startup
4 min read · Oct 23, 2020


In this post, we will cover the differences between a fully connected neural network and a convolutional neural network, focusing on how the model architectures differ and on the results each obtains on the MNIST dataset.

Fully connected neural network

  • A fully connected neural network consists of a series of fully connected layers that connect every neuron in one layer to every neuron in the next layer.
  • The major advantage of fully connected networks is that they are “structure agnostic”, i.e. no special assumptions need to be made about the input.
  • While being structure agnostic makes fully connected networks very broadly applicable, such networks do tend to have weaker performance than special-purpose networks tuned to the structure of a problem space.
Multilayer Deep Fully Connected Network, Image Source

Convolutional Neural Network

  • CNN architectures make the explicit assumption that the inputs are images, which allows encoding certain properties into the model architecture.
  • A simple CNN is a sequence of layers, and every layer of a CNN transforms one volume of activations to another through a differentiable function. Three main types of layers are used to build CNN architecture: Convolutional Layer, Pooling Layer, and Fully-Connected Layer.

To know more about the basic fundamentals related to CNN, check out my earlier blogs on Convolutions and Pooling.

Simple Convolutional architecture, Image Source

Dataset Used

  • MNIST (Modified National Institute of Standards and Technology database) is a dataset of 60,000 28x28 grayscale images of the 10 digits, along with a test set of 10,000 images.
  • It is a subset of a larger set available from NIST. The digits have been size-normalized and centered in a fixed-size image.
  • It is a good database for people who want to try learning techniques and pattern recognition methods on real-world data while spending minimal effort on preprocessing and formatting.
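
As a reference for the implementations below, here is a minimal sketch of loading the dataset with Keras. The [0, 1] pixel scaling is an assumed, common preprocessing step; the exact pipeline used is in the linked code at the end of the post.

from tensorflow.keras.datasets import mnist

# Load the 60,000 training and 10,000 test images described above
(x_train, y_train), (x_test, y_test) = mnist.load_data()

# Scale pixel values from [0, 255] to [0, 1] -- an assumed preprocessing step
x_train, x_test = x_train / 255.0, x_test / 255.0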

Model Implementation

A) Using Fully Connected Neural Network Architecture

  • Model Architecture

For the fully connected architecture, I have used a total of three hidden layers with the 'relu' activation function, in addition to the input and output layers.
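
The exact layer widths are not listed here, so the sketch below uses illustrative hidden-layer sizes of 300, 150 and 100 units, which happen to land close to the parameter count discussed in the model summary.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Flatten, Dense

# Hidden-layer widths (300/150/100) are illustrative assumptions
fcn_model = Sequential([
    Flatten(input_shape=(28, 28)),     # 28x28 image -> 784 input values
    Dense(300, activation='relu'),     # hidden layer 1
    Dense(150, activation='relu'),     # hidden layer 2
    Dense(100, activation='relu'),     # hidden layer 3
    Dense(10, activation='softmax'),   # one output per digit class
])
fcn_model.summary()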

  • Model Summary

The total number of trainable parameters is around 0.3 million. In a fully connected layer with n inputs and m outputs, the number of weights is n*m; adding a bias for each output node gives (n+1)*m parameters in total.
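
For example, the illustrative first hidden layer sketched above maps the 784 flattened pixels to 300 units, which alone accounts for (784 + 1) * 300 = 235,500 parameters; the remaining layers contribute the rest.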

  • Model Accuracy

Training the fully connected model for five epochs with a batch size of 128 and a validation split of 0.3 gives a training accuracy of 98.6% and a validation accuracy of 96.07%. Moreover, after the second epoch, we can see the training and validation accuracy curves move wide apart.
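
A sketch of the corresponding training call is shown below. The optimizer and loss are assumptions, since they are not stated above; the epochs, batch size and validation split follow the text.

# Optimizer and loss are assumed; epochs, batch size and validation split follow the text
fcn_model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

fcn_history = fcn_model.fit(x_train, y_train,
                            epochs=5,
                            batch_size=128,
                            validation_split=0.3)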

  • Accuracy on Test data

On the test data of 10,000 images, the accuracy of the fully connected neural network is 96%.
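
Evaluation on the held-out test set can be sketched as follows, assuming the variables from the loading snippet above.

# Evaluate the fully connected model on the 10,000 test images
test_loss, test_acc = fcn_model.evaluate(x_test, y_test)
print('FCN test accuracy:', test_acc)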

B) Using Convolutional Neural Network Architecture

  • Model Architecture

For the convolutional neural network architecture, we added three convolutional layers with 'relu' activation and a max-pooling layer after the first convolutional layer.
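
A sketch of such a stack is shown below. The filter counts (32, 64, 64) and the 3x3 kernel size are illustrative assumptions, chosen so that the parameter count lands near the figure quoted in the model summary.

from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Filter counts (32/64/64) and 3x3 kernels are illustrative assumptions
cnn_model = Sequential([
    Conv2D(32, (3, 3), activation='relu', input_shape=(28, 28, 1)),
    MaxPooling2D((2, 2)),                  # max pooling after the first conv layer
    Conv2D(64, (3, 3), activation='relu'),
    Conv2D(64, (3, 3), activation='relu'),
    Flatten(),
    Dense(10, activation='softmax'),       # one output per digit class
])
cnn_model.summary()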

  • Model Summary

With the CNN, the differences you can notice in the summary are the output shapes and the number of parameters. Compared to the fully connected model, the total number of parameters is much lower, at roughly 0.1 million.

  • Model Accuracy

Training the CNN for five epochs with a batch size of 128 and a validation split of 0.3 gives a training accuracy of 99.19% and a validation accuracy of 99.63%. Moreover, unlike the fully connected model, the training and validation accuracy curves do not move as wide apart.
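
The corresponding training call can be sketched as below; the reshape adds the channel dimension Conv2D expects, and the compile settings are the same assumptions as before.

# Conv2D expects a channel dimension: (28, 28) -> (28, 28, 1)
x_train_cnn = x_train.reshape(-1, 28, 28, 1)

cnn_model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])

cnn_history = cnn_model.fit(x_train_cnn, y_train,
                            epochs=5,
                            batch_size=128,
                            validation_split=0.3)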

  • Accuracy on the Test dataset

On the test data of 10,000 images, the accuracy of the convolutional neural network is 98.9%.
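
Evaluation on the test set follows the same pattern, again assuming the variables defined earlier.

# Evaluate the CNN on the 10,000 test images
x_test_cnn = x_test.reshape(-1, 28, 28, 1)
test_loss, test_acc = cnn_model.evaluate(x_test_cnn, y_test)
print('CNN test accuracy:', test_acc)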

Final Thoughts

Although fully connected networks make no assumptions about the input, they tend to perform worse and are not well suited to feature extraction. They also have a larger number of weights to train, which results in longer training times. CNNs, on the other hand, are trained to identify and extract the best features from the images for the problem at hand, with relatively few parameters to train.

Please find the relevant code used in this blog here. Along similar lines, you can find an implementation of CNNs on the FMNIST dataset using PyTorch here.
