Implementing a Convolutional Neural Network using TensorFlow for Fashion MNIST

In this post we will use the Fashion MNIST dataset to build a CNN model with TensorFlow. I will also describe how I improved the model to raise its accuracy from 29% to 90%.


CNN Basics

TensorFlow Basics

Steps for building CNN using TensorFlow

  1. Import required libraries
  2. Load the dataset for training and evaluation
  3. Analyze the dataset
  4. Normalize the dataset for inputting into CNN
  5. Build the CNN model
  6. Create the estimator
  7. Train the model
  8. Evaluate the model
  9. Improve the accuracy of the model

Importing required libraries

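# The code in this post uses the TensorFlow 1.x API (tf.layers, tf.estimator)
import tensorflow as tf
import keras  # optional: the dataset below is loaded through tf.keras
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
# Jupyter magic: render plots inline in the notebook
%matplotlib inline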

Loading the Fashion MNIST dataset

We load the dataset through tf.keras.datasets. It contains images of clothing items and accessories. Loading it gives us a training dataset and an evaluation dataset.

((train_data, train_labels),
(eval_data, eval_labels)) = tf.keras.datasets.fashion_mnist.load_data()

The fashion images are the inputs, and the target variable is one of 10 classes of clothing items and accessories.

target_dict = {
    0: 'T-shirt/top',
    1: 'Trouser',
    2: 'Pullover',
    3: 'Dress',
    4: 'Coat',
    5: 'Sandal',
    6: 'Shirt',
    7: 'Sneaker',
    8: 'Bag',
    9: 'Ankle boot',
}

Analyzing the dataset

Let’s check the shape of the training and evaluation images


The training dataset contains 60,000 images and the test (evaluation) dataset contains 10,000 images.
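A quick way to inspect a dataset like this is to collect its shape, dtype, pixel range, and per-class counts in one place. The sketch below is only an illustration: `summarize` is a hypothetical helper (not part of TensorFlow or of this post's code), and it runs on a small synthetic stand-in so that nothing has to be downloaded. In the notebook you would pass train_data and train_labels instead.

```python
import numpy as np

def summarize(images, labels, num_classes=10):
    """Hypothetical helper: collect basic facts about an image dataset."""
    return {
        "shape": images.shape,
        "dtype": str(images.dtype),
        "pixel_range": (int(images.min()), int(images.max())),
        "class_counts": np.bincount(labels, minlength=num_classes).tolist(),
    }

# Synthetic stand-in for (train_data, train_labels): 50 random 28x28 images.
rng = np.random.default_rng(0)
fake_images = rng.integers(0, 256, size=(50, 28, 28), dtype=np.uint8)
fake_labels = np.arange(50) % 10  # five examples of each of the 10 classes

summary = summarize(fake_images, fake_labels)
print(summary["shape"], summary["dtype"], summary["class_counts"])
```

On the real training data, the "shape" entry should agree with the image count quoted above.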

Let’s plot the first 20 images along with their labels

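plt.figure(figsize=(10, 10))
for i in range(20):
    # 5 x 5 grid; only the first 20 cells are used
    plt.subplot(5, 5, i + 1)
    plt.imshow(train_data[i])
    plt.title(target_dict[train_labels[i]])
plt.tight_layout()
plt.show()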

Normalizing the Dataset

We normalize the input data by scaling the pixel values from the 0-255 range down to 0-1, so that all inputs are on the same scale. We also cast the labels to int32.

train_data = train_data/np.float32(255)
train_labels = train_labels.astype(np.int32)
eval_data = eval_data/np.float32(255)
eval_labels = eval_labels.astype(np.int32)
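To see what this scaling does, here is a minimal sketch on a tiny synthetic array (an assumed stand-in for train_data, which holds unsigned 8-bit pixel values). Dividing by a float32 scalar produces float32 values between 0 and 1.

```python
import numpy as np

# Synthetic uint8 "pixels" standing in for train_data (assumption for this sketch).
raw = np.array([[0, 128, 255]], dtype=np.uint8)

# Same operation as in the post: divide by a float32 scalar.
scaled = raw / np.float32(255)

print(scaled.dtype, scaled.min(), scaled.max())
```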

Building the CNN Model

We now build the CNN model which is the most interesting part.

Reshaping the input

We need to reshape the input to the shape

[batch_size, image_height, image_width, channels]
A value of -1 for batch_size means that this dimension is computed dynamically from the number of input values in features, holding the size of all other dimensions constant.
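NumPy's reshape treats -1 the same way, so the idea can be tried without building a TensorFlow graph. This is a small sketch with made-up batches of zeros, not the post's actual data:

```python
import numpy as np

# Five grayscale 28x28 images, without a channel dimension.
batch = np.zeros((5, 28, 28), dtype=np.float32)

# -1 lets the library infer the batch dimension from the total element count.
reshaped = batch.reshape(-1, 28, 28, 1)
print(reshaped.shape)

# The same works for a flat buffer holding three images.
flat = np.zeros(3 * 28 * 28, dtype=np.float32)
print(flat.reshape(-1, 28, 28, 1).shape)
```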

Overview of the model

The first convolutional layer has 32 feature detectors of size 5 by 5 and is followed by max pooling.

The output of max pool layer 1 is the input to the second convolutional layer, which has 64 filters (feature detectors) and is followed by max pooling. Here we apply a 25% dropout.

The output of max pool layer 2 is the input to the third convolutional layer, which has 128 feature detectors and is again followed by max pooling. Here we apply another 25% dropout.

We flatten the result and feed it to a dense layer of 1024 units, to which we apply a dropout rate of 40%.
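The size of the flattened tensor can be worked out with a little arithmetic. The sketch below assumes that each convolution preserves the spatial size (for example with padding="same"), which is what the `3*3*128` reshape in the model function implies, and that each 2x2 max pool with stride 2 uses no padding.

```python
def conv_same_then_pool(size, pool=2, stride=2):
    """Spatial size after a size-preserving convolution followed by an
    unpadded max pool: floor((size - pool) / stride) + 1."""
    return (size - pool) // stride + 1

size = 28          # Fashion MNIST images are 28x28
sizes = []
for _ in range(3): # three conv + pool stages
    size = conv_same_then_pool(size)
    sizes.append(size)

flat_units = size * size * 128  # 128 feature maps after the third stage
print(sizes, flat_units)
```

Under these assumptions the spatial size goes 28, 14, 7, 3, so the dense layer receives 3 * 3 * 128 = 1152 values per image.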

Finally, we have an output layer of 10 units to classify the 10 clothing and accessory items. As this is a multiclass classification problem, we use the softmax function: the output layer returns the raw values (logits) for the predictions, and softmax is applied to them inside the cross-entropy loss. To predict the class, we find the index of the largest value in the output layer tensor using tf.argmax().
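Because softmax preserves the ordering of its inputs, taking the argmax of the raw outputs gives the same class as taking the argmax of the softmax probabilities. A minimal NumPy sketch with made-up logits for three classes (the model itself has ten):

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax, shifted by the row maximum for numerical stability."""
    shifted = logits - logits.max(axis=1, keepdims=True)
    exp = np.exp(shifted)
    return exp / exp.sum(axis=1, keepdims=True)

# Made-up raw outputs for two examples.
logits = np.array([[2.0, -1.0, 0.5],
                   [0.1, 0.2, 3.0]])

probs = softmax(logits)
classes_from_logits = logits.argmax(axis=1)
classes_from_probs = probs.argmax(axis=1)
print(classes_from_logits, classes_from_probs)
```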

We use the Adam optimizer for training, with a learning rate of 0.001.
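For intuition, here is a sketch of a single Adam update in NumPy. It follows the commonly published form of the update rule and is not TensorFlow's exact implementation; `adam_step` is a hypothetical helper, and the beta and epsilon values are common defaults assumed for illustration.

```python
import numpy as np

def adam_step(param, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update (hypothetical helper). t is the 1-based step count."""
    m = beta1 * m + (1 - beta1) * grad        # running mean of gradients
    v = beta2 * v + (1 - beta2) * grad ** 2   # running mean of squared gradients
    m_hat = m / (1 - beta1 ** t)              # bias correction
    v_hat = v / (1 - beta2 ** t)
    param = param - lr * m_hat / (np.sqrt(v_hat) + eps)
    return param, m, v

param = np.array([1.0, -2.0])
grad = np.array([0.5, -4.0])   # made-up gradients of very different size
m = np.zeros_like(param)
v = np.zeros_like(param)

new_param, m, v = adam_step(param, grad, m, v, t=1)
print(new_param)
```

On the first step each parameter moves by roughly the learning rate regardless of the gradient's magnitude. This per-parameter scaling is one way Adam differs from plain gradient descent.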

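Depending on the mode, which can be train, predict, or evaluate, the model function returns different outputs: the predictions only, the loss with a training op, or the loss with evaluation metrics.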

Architecture of the CNN
def cnn_model(features, labels, mode):
    # Reshape the input to [batch_size, height, width, channels]
    input_layer = tf.reshape(features["x"], [-1, 28, 28, 1])

    # Convolutional Layer #1 and Pooling Layer #1
    # filters follows the overview above. padding="same" keeps the spatial
    # size, so three poolings give 28 -> 14 -> 7 -> 3 (see the reshape below).
    # The ReLU activation is an assumption, chosen to match the dense layer.
    conv1 = tf.layers.conv2d(
        inputs=input_layer,
        filters=32,
        kernel_size=[5, 5],
        padding="same",
        activation=tf.nn.relu)
    pool1 = tf.layers.max_pooling2d(inputs=conv1, pool_size=[2, 2], strides=2)

    # Convolutional Layer #2 and Pooling Layer #2
    conv2 = tf.layers.conv2d(
        inputs=pool1,
        filters=64,
        kernel_size=[5, 5],
        padding="same",
        activation=tf.nn.relu)
    pool2 = tf.layers.max_pooling2d(inputs=conv2, pool_size=[2, 2], strides=2)

    # Dropout is only active in training mode
    dropout_1 = tf.layers.dropout(inputs=pool2, rate=0.25, training=mode == tf.estimator.ModeKeys.TRAIN)

    # Convolutional Layer #3 and Pooling Layer #3
    conv3 = tf.layers.conv2d(
        inputs=dropout_1,
        filters=128,
        kernel_size=[5, 5],
        padding="same",
        activation=tf.nn.relu)
    pool3 = tf.layers.max_pooling2d(inputs=conv3, pool_size=[2, 2], strides=2)

    dropout_2 = tf.layers.dropout(inputs=pool3, rate=0.25, training=mode == tf.estimator.ModeKeys.TRAIN)

    # Flatten the 3x3x128 feature maps for the dense layer
    flatten_1 = tf.reshape(dropout_2, [-1, 3 * 3 * 128])

    dense = tf.layers.dense(inputs=flatten_1, units=1024, activation=tf.nn.relu)

    dropout = tf.layers.dropout(inputs=dense, rate=0.4, training=mode == tf.estimator.ModeKeys.TRAIN)

    # Output layer: raw scores (logits) for the 10 classes
    output_layer = tf.layers.dense(inputs=dropout, units=10)

    predictions = {
        "classes": tf.argmax(input=output_layer, axis=1),
    }

    if mode == tf.estimator.ModeKeys.PREDICT:
        return tf.estimator.EstimatorSpec(mode=mode, predictions=predictions)

    # Softmax is applied to the logits inside the loss
    loss = tf.losses.sparse_softmax_cross_entropy(labels=labels, logits=output_layer, scope='loss')

    if mode == tf.estimator.ModeKeys.TRAIN:
        optimizer = tf.train.AdamOptimizer(learning_rate=0.001)
        train_op = optimizer.minimize(loss=loss, global_step=tf.train.get_global_step())
        return tf.estimator.EstimatorSpec(mode=mode, loss=loss, train_op=train_op)

    # Evaluation mode: report accuracy
    eval_metrics_op = {"accuracy": tf.metrics.accuracy(labels=labels, predictions=predictions["classes"])}
    return tf.estimator.EstimatorSpec(mode=mode, loss=loss, eval_metric_ops=eval_metrics_op)
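The `training=mode == tf.estimator.ModeKeys.TRAIN` argument in the dropout layers means that units are dropped only while training. During evaluation and prediction the layer passes its input through unchanged. The NumPy sketch below illustrates that behaviour; the `dropout` function here is a hypothetical stand-in, not the TensorFlow layer itself.

```python
import numpy as np

def dropout(x, rate, training, rng):
    """Inverted dropout (hypothetical stand-in for the TensorFlow layer)."""
    if not training:
        return x                               # identity outside training
    keep_mask = rng.random(x.shape) >= rate    # keep each unit with prob. 1 - rate
    return x * keep_mask / (1.0 - rate)        # rescale so the expected value is unchanged

rng = np.random.default_rng(0)
activations = np.ones((1000, 10))

train_out = dropout(activations, rate=0.25, training=True, rng=rng)
eval_out = dropout(activations, rate=0.25, training=False, rng=rng)

print(train_out.mean(), eval_out.mean())
```

In training mode about a quarter of the values become zero and the rest are scaled up by 1 / 0.75, so the mean stays close to 1.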

Creating the Estimator

We create the estimator, to which we pass the CNN model function we created above. We will use this estimator for training, prediction, and evaluation.

fashion_classifier = tf.estimator.Estimator(model_fn = cnn_model)

Training the model

To train the model, we create train_input_fn and then call the train method to fit the classifier on the Fashion MNIST dataset.

For the input, we use the numpy_input_fn method, to which we pass the training feature data as x and the training labels, train_labels, as y.

We use a batch_size of 100, meaning we train on mini-batches of 100 examples at each step.

num_epochs=None means that the input function keeps cycling through the data, so the model trains until the specified number of steps is reached. In our example we set it to 1500 steps.

We also want to shuffle the training data.
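It can help to translate these settings into passes over the data. This is a short arithmetic sketch using the numbers from this post (60,000 training images, batches of 100, 1500 steps):

```python
num_examples = 60_000   # training images
batch_size = 100
steps = 1500

steps_per_epoch = num_examples // batch_size   # steps for one full pass
examples_seen = steps * batch_size             # examples processed in total
epochs = examples_seen / num_examples          # passes over the training set

print(steps_per_epoch, examples_seen, epochs)
```

So 1500 steps correspond to two and a half passes over the training set.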

# Train the model
train_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": train_data},
    y=train_labels,
    batch_size=100,
    num_epochs=None,
    shuffle=True)
fashion_classifier.train(input_fn=train_input_fn, steps=1500)

Evaluating the model

Once the model is trained, we evaluate it on the eval_data and eval_labels that we created earlier, using the evaluate function. This helps us determine the accuracy of the model on the Fashion MNIST dataset.

We want to evaluate the model using only one epoch and we do not shuffle the data as we want to iterate over the data sequentially.

eval_input_fn = tf.estimator.inputs.numpy_input_fn(
    x={"x": eval_data},
    y=eval_labels,
    num_epochs=1,
    shuffle=False)
eval_results = fashion_classifier.evaluate(input_fn=eval_input_fn)

We see accuracy of almost 90%.
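The accuracy metric is the fraction of evaluation examples whose predicted class matches the label. Here is a minimal NumPy sketch with made-up labels and predictions (not the model's real output):

```python
import numpy as np

def accuracy(labels, predicted_classes):
    """Fraction of predictions that match the true labels."""
    labels = np.asarray(labels)
    predicted_classes = np.asarray(predicted_classes)
    return float(np.mean(labels == predicted_classes))

# Made-up example: four of five predictions are correct.
labels = np.array([9, 2, 1, 1, 6])
predicted = np.array([9, 2, 1, 3, 6])

acc = accuracy(labels, predicted)
print(acc)
```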

Steps to reach this accuracy

I initially started with the Gradient Descent optimizer, and my initial accuracy was 29%.

When I changed the optimizer to Adam, accuracy rose from 29% to 69%.

When I added one more convolutional layer and pooling layer, I got an accuracy of 73%.

I then increased the steps from 1000 to 1500, and accuracy increased to 82%.

I added the additional dropouts dropout_1 and dropout_2, and then my accuracy ranged from 89% to 90%.
