Create a Convolutional Neural Network with TensorFlow

Kyle Zeller
Sep 4, 2018
Sample from MNIST Dataset (Credit: https://www.researchgate.net/figure/A-random-sample-of-15-handwritten-digits-from-the-MNIST-data-set_fig2_282924675)

This portion of the blog requires a bit of prior knowledge of what neural networks are and how they function. Here are two links (1 & 2) to relatively comprehensive introductions to neural networks.

Brief Introduction to Convolutional Neural Networks

ConvNet (Credit: https://camo.githubusercontent.com/269e3903f62eb2c4d13ac4c9ab979510010f8968/68747470733a2f2f7261772e6769746875622e636f6d2f746176677265656e2f6c616e647573655f636c617373696669636174696f6e2f6d61737465722f66696c652f636e6e2e706e673f7261773d74727565)

Convolutional Neural Networks (CNNs) take a bit of inspiration from biological processes, primarily the visual cortex of the brain. A 1962 experiment by Hubel and Wiesel led to the discovery that certain individual neurons fire only when edges of a particular orientation and location appear in the visual stimulus. The experiment also suggested that these neurons are arranged in a columnar structure that together gives rise to visual perception.

ConvNet Structure

In short, a CNN can be described as a stack of convolutional, pooling, and fully connected layers. Each CNN layer transforms a 3D input volume into a 3D output volume of activations, and the neurons in a CNN are arranged in three dimensions (width, height, depth).

Neuron arrangement into three dimensions “width, height, & depth” (Credit: http://cs231n.github.io/assets/cnn/cnn.jpeg)

Convolutional Layers

Each convolutional layer consists of a set of learnable filters, each with its own stride and padding; as a filter slides across the input, it produces a 2D activation map of its responses at every spatial position.

Depth is a hyperparameter that sets the number of filters to use, each of which tries to learn something different within the input, e.g. edges, color blobs, etc. The stride of a filter determines how far the filter moves across the input at each step (usually 1 or 2 pixels). The zero-padding determines the output size: “same” padding surrounds the outer edges of the input with 0’s so that the convolved image keeps the same dimensions, while “valid” padding adds no 0’s and therefore reduces the spatial dimensions of the output.

Convolution Diagram (Credit: http://intellabs.github.io/RiverTrail/tutorial/)
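To make the padding options concrete, here is a rough sketch (not the code from this post) of a single convolutional layer in tf.keras; the filter count and kernel size are chosen arbitrarily for illustration:

import tensorflow as tf

# A 28 x 28 gray-scale input; the batch dimension is left unspecified
inputs = tf.keras.Input(shape=(28, 28, 1))

# "same" padding: the edges are zero-padded, so the output keeps the 28 x 28 spatial size
same_conv = tf.keras.layers.Conv2D(
    filters=32, kernel_size=5, strides=1,
    padding="same", activation="relu")(inputs)   # shape (None, 28, 28, 32)

# "valid" padding: no zeros are added, so the spatial size shrinks to 24 x 24
valid_conv = tf.keras.layers.Conv2D(
    filters=32, kernel_size=5, strides=1,
    padding="valid", activation="relu")(inputs)  # shape (None, 24, 24, 32)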

To calculate the spatial size of the output volume, consider the following:

  • Let W be the input volume size
  • Let F be the receptive field size of the Conv Layer neurons
  • Let P be the amount of zero padding
  • Let S be the stride of the filter
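Putting these together, the width (or height) of the output volume is given by the standard formula (W − F + 2P) / S + 1. For example, a 28-pixel-wide input convolved with a 5 x 5 filter, “same” padding of 2 pixels, and a stride of 1 gives (28 − 5 + 2·2) / 1 + 1 = 28, so the spatial size is preserved.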

Pooling Layers

The main function of the pooling layer is to reduce the spatial size of the feature maps (and with it the number of parameters and computation), while still retaining the most relevant information despite the dimensionality reduction.

Max Pooling Example (Credit: http://cs231n.github.io/convolutional-networks/#overview)
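As a minimal sketch (using the 2 x 2 kernel and stride of 2 that appear later in this post), a max-pooling layer in tf.keras halves each spatial dimension:

import tensorflow as tf

# 2 x 2 max pooling with a stride of 2 halves the width and height
feature_maps = tf.keras.Input(shape=(28, 28, 32))
pooled = tf.keras.layers.MaxPooling2D(pool_size=(2, 2), strides=2)(feature_maps)
print(pooled.shape)  # (None, 14, 14, 32)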

Fully Connected (Dense) Layer

The fully connected layer is used to associate the extracted features with a single class from a set of possible classes. As shown below, every activation in the preceding layer is connected to each neuron in this layer, and the output is a probability distribution over the likelihood of each class being correct.

Fully Connected Neural Network (Credit: https://upload.wikimedia.org/wikipedia/commons/thumb/4/46/Colored_neural_network.svg/1200px-Colored_neural_network.svg.png)
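A rough sketch of this final stage in tf.keras (assuming 10 digit classes and a 7 x 7 x 64 input from the last pooling layer; the 1,024 hidden units are an arbitrary choice, not taken from the linked code):

import tensorflow as tf

# Flatten the pooled feature maps and map them to a probability per class
features = tf.keras.Input(shape=(7, 7, 64))
flat = tf.keras.layers.Flatten()(features)
hidden = tf.keras.layers.Dense(1024, activation="relu")(flat)
probabilities = tf.keras.layers.Dense(10, activation="softmax")(hidden)  # sums to 1 across the 10 classes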

For a much more in-depth tutorial on ConvNets, please refer to this link.

Implementation

This implementation of a ConvNet classifies data from the MNIST dataset, which consists of gray-scale 28 x 28 pixel images of handwritten digits [0–9].

More information on the dataset can be found here.

The Input Pipeline

  1. Extract — read the raw MNIST images and labels.
  2. Transform — shuffle, batch, and normalize the data into a form the model can consume.
  3. Load — feed the prepared batches into the model for training and evaluation (a minimal sketch of these stages follows below).
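As a rough illustration only (not necessarily the linked code; the batch size and normalization are assumptions), these three stages could be written with tf.data as follows:

import tensorflow as tf

# Extract: load the MNIST images and labels
(x_train, y_train), _ = tf.keras.datasets.mnist.load_data()

# Transform: scale to [0, 1], add a channel dimension, then shuffle and batch
dataset = (
    tf.data.Dataset.from_tensor_slices((x_train[..., None] / 255.0, y_train))
    .shuffle(buffer_size=10000)
    .batch(128)
)

# Load: iterate over the batches (or pass the dataset to model.fit) during training
for images, labels in dataset.take(1):
    print(images.shape, labels.shape)  # (128, 28, 28, 1) (128,)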

The Model Architecture

Conv Layer 1:

  • 32 filters
  • Kernel of size 5 x 5 pixels
  • Stride of 1 pixel
  • “Same” padding
  • ReLU activation

Pooling Layer 1:

  • Max Pool Operation
  • Kernel of size 2 x 2 pixels
  • Stride of 2 pixels

Conv Layer 2 is the same as Conv Layer 1, except that it has twice as many filters (64).

Pooling Layer 2 has exactly the same parameters as Pooling Layer 1.

Fully Connected Layer:

  • Reshape and flatten the output from the previous layer
  • Set the number of neurons desired and activation type
  • Use dropout if needed to prevent overfitting (the full architecture is sketched after this list)
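Putting the pieces together, here is a rough sketch of the architecture above in tf.keras; the 1,024 dense units and the 0.4 dropout rate are assumptions for illustration, not values taken from the linked code:

import tensorflow as tf

model = tf.keras.Sequential([
    # Conv Layer 1: 32 filters, 5 x 5 kernel, stride 1, "same" padding, ReLU
    tf.keras.layers.Conv2D(32, 5, strides=1, padding="same", activation="relu",
                           input_shape=(28, 28, 1)),
    # Pooling Layer 1: 2 x 2 max pool, stride 2
    tf.keras.layers.MaxPooling2D(pool_size=2, strides=2),
    # Conv Layer 2: same as Conv Layer 1, but with twice as many filters
    tf.keras.layers.Conv2D(64, 5, strides=1, padding="same", activation="relu"),
    # Pooling Layer 2: same as Pooling Layer 1
    tf.keras.layers.MaxPooling2D(pool_size=2, strides=2),
    # Fully connected layer: flatten, dense, dropout
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(1024, activation="relu"),
    tf.keras.layers.Dropout(0.4),
    # Output: one probability per digit class
    tf.keras.layers.Dense(10, activation="softmax"),
])

model.summary()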

Model Evaluation

The full code for this part can be found here.
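As a rough sketch of how a model like the one above could be trained and evaluated (the optimizer, epoch count, and batch size here are assumptions, not taken from the linked code):

import tensorflow as tf

# Extract and scale the MNIST data; `model` refers to the sketch in the previous section
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train, x_test = x_train[..., None] / 255.0, x_test[..., None] / 255.0

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])
model.fit(x_train, y_train, batch_size=128, epochs=5)

# Accuracy on digits the model has never seen
test_loss, test_accuracy = model.evaluate(x_test, y_test)
print("Test accuracy:", test_accuracy)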

Running the Code Step by Step

To set up the code on your machine, I’ve provided a requirements document whose commands you can simply type into the terminal. (Windows users should open the URL provided in a browser for the Python installation.)

The code below covers the following:

  • Installing Python 2 or Python 3
  • Installing pip or pip3
  • Installing virtualenv
  • Installing all the needed dependencies within the “activated” virtual environment
  • Running the code and deactivating the virtual environment

Adding Extra Features:

The full code for this “second” part can be found here. (Please note that the part below cannot be done without saving the summaries of your model)
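For reference, a minimal way to save such summaries with tf.keras (a sketch, not the linked code; the ./logs directory name is an assumption) is the TensorBoard callback:

import tensorflow as tf

# Write loss and accuracy summaries to ./logs so TensorBoard can read them
tensorboard_callback = tf.keras.callbacks.TensorBoard(log_dir="./logs")

# Pass the callback to training, e.g.:
# model.fit(x_train, y_train, epochs=5, callbacks=[tensorboard_callback])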

Visualizing The Results w/ TensorBoard

  • Reactivate the virtual environment
  • Run the following in the terminal:
tensorboard --logdir=path/to/log-directory

The repository for the code can be found here.

Written by Kyle Zeller

Graduate in ECE, CS & Math. I currently lead my own research endeavor with Electroencephalograms and Machine Learning to learn more about the human brain.
