Multi-layer Perceptron using Keras on MNIST dataset for Digit Classification

ReLu activation + Dropout + BatchNormalization + AdamOptimizer

Rana singh

Published in

Analytics Vidhya

6 min read · Sep 13, 2019

Loading MNIST dataset

Every MNIST data point has two parts: an image of a handwritten digit and a corresponding label. We'll call the images "X" and the labels "y". Both the training set and the test set contain images and their corresponding labels; with the Keras loader used here, the training images are X_train and the training labels are y_train, and the test images and labels are X_test and y_test.

from keras.datasets import mnist

# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()

If your Keras installation is not using TensorFlow as its backend, set the environment variable KERAS_BACKEND=tensorflow before importing Keras.

Importing libraries

Plot function

# https://gist.github.com/greydanus/f6eee59eaf1d90fcb3b534a25362cea4
# https://stackoverflow.com/a/14434334
# this function is used to update the plots for each epoch and error

Reshaping input size:

If you look at the input shape, each image is a 2-dimensional array of 28 * 28 pixels. We will convert each 28 * 28 array into a single 1-dimensional vector of 1 * 784.

After the conversion, the input set goes from a 3-D array (number of images * 28 * 28) to a 2-D array (number of images * 784).

In a data point, a pixel value near 255 is black, a value near 0 is white, and values in between are gray. To inspect one data point, run print(X_train[0]).
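As a minimal sketch of the reshaping step, assuming NumPy is available: the array X_demo below is a small random stand-in for X_train (5 fake images instead of the real training set), so the names and values are illustrative only.

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for X_train: 5 fake 28x28 grayscale images with values in 0..255.
X_demo = rng.integers(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Flatten each 28x28 image into one row of 784 values.
X_flat = X_demo.reshape(X_demo.shape[0], 28 * 28)

print(X_demo.shape)  # (5, 28, 28)
print(X_flat.shape)  # (5, 784)
```

The same reshape call should apply to the real X_train and X_test, since only the number of images differs.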

Normalization:

If we look at the matrix above, each cell holds a value between 0 and 255. Before applying machine learning algorithms, let's normalize the data to the range 0 to 1 with min-max scaling. Since Xmin = 0 and Xmax = 255: X => (X - Xmin)/(Xmax - Xmin) = X/255
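The formula can be checked on a tiny example. This is a sketch assuming NumPy; X_raw is a made-up array standing in for the flattened pixel data.

```python
import numpy as np

# Stand-in for flattened pixel data: integer values in 0..255.
X_raw = np.array([[0, 51, 255], [102, 204, 255]], dtype=np.uint8)

# Cast to float32 up front so the scaled values are float32
# rather than NumPy's default float64.
X_float = X_raw.astype("float32")

x_min, x_max = 0.0, 255.0  # fixed range of 8-bit pixel values
X_norm = (X_float - x_min) / (x_max - x_min)  # same as X_float / 255

print(X_norm.min(), X_norm.max())  # 0.0 1.0
```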

Labeling:

Let's convert each label into a 10-dimensional one-hot vector. For example, if an image shows the digit 5, convert its label as 5 => [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]. This conversion is needed to train the MLP with the categorical_crossentropy loss used later.
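A minimal sketch of this one-hot conversion, assuming NumPy; y_demo is a made-up stand-in for y_train. Keras also ships a helper for this, keras.utils.to_categorical, which should give the same result; the NumPy version is used here only to show what happens underneath.

```python
import numpy as np

# Stand-in for y_train: integer class labels in 0..9.
y_demo = np.array([5, 0, 9])
num_classes = 10

# Row i of the identity matrix is the one-hot vector for class i,
# so indexing with the labels picks one row per sample.
Y_demo = np.eye(num_classes, dtype="float32")[y_demo]

print(Y_demo[0])  # [0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]
```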

Step by step building a Softmax classifier

# https://keras.io/getting-started/sequential-model-guide/

The Sequential model is a linear stack of layers. You can create a Sequential model by passing a list of layer instances to the constructor:

  from keras.models import Sequential
  from keras.layers import Dense, Activation

  model = Sequential([Dense(32, input_shape=(784,)), Activation('relu'),
                      Dense(10), Activation('softmax')])

  # You can also simply add layers via the .add() method:
  model = Sequential()
  model.add(Dense(32, input_dim=784))
  model.add(Activation('relu'))

# https://keras.io/layers/core/ # parameters of the Keras Dense layer

keras.layers.Dense(units, activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None)

Dense implements the operation output = activation(dot(input, kernel) + bias), where activation is the element-wise activation function passed as the activation argument, kernel is a weights matrix created by the layer, and bias is a bias vector created by the layer (only applicable if use_bias is True).

output = activation(dot(input, kernel) + bias) => y = activation(W^T X + b)
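To make the Dense operation concrete, here is a small NumPy sketch of the same computation. The function name dense_forward and the random weights are assumptions for illustration; this is not how Keras implements the layer internally, only the arithmetic it describes.

```python
import numpy as np

def dense_forward(x, kernel, bias, activation=None):
    """Compute activation(dot(x, kernel) + bias) for a batch of inputs."""
    z = np.dot(x, kernel) + bias
    return z if activation is None else activation(z)

def relu(z):
    return np.maximum(z, 0.0)

rng = np.random.default_rng(0)
x = rng.random((4, 784))                     # batch of 4 flattened images
kernel = rng.normal(size=(784, 32)) * 0.01   # weights, shape (input_dim, units)
bias = np.zeros(32)                          # one bias per unit

out = dense_forward(x, kernel, bias, activation=relu)
print(out.shape)  # (4, 32)
```

With the kernel stored as (input_dim, units), dot(input, kernel) for one sample is the same product as W^T X in the formula above.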

Activation:

Activations can either be used through an Activation layer or through the activation argument supported by all forward layers.

# from keras.layers import Activation, Dense
# model.add(Dense(64))
# model.add(Activation('tanh'))
# This is equivalent to:
# model.add(Dense(64, activation='tanh'))

Many activation functions are available, for example tanh, relu, and softmax.
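For intuition, the three activations named above can be written in a few lines of NumPy. This is a sketch of the math only, under the assumption that inputs are NumPy arrays; Keras provides its own implementations.

```python
import numpy as np

def tanh(z):
    # Squashes values into the range (-1, 1).
    return np.tanh(z)

def relu(z):
    # Keeps positive values and replaces negatives with 0.
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtract the row max before exponentiating for numerical stability.
    shifted = z - np.max(z, axis=-1, keepdims=True)
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)

z = np.array([[-2.0, 0.0, 3.0]])
print(relu(z))     # [[0. 0. 3.]]
print(softmax(z))  # each row is non-negative and sums to 1
```

Softmax is the usual choice for the output layer of a classifier because its outputs can be read as class probabilities.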

Building model:

Step 1:

The model needs to know what input shape to expect. For this reason, the first layer in a Sequential model (and only the first, because the following layers can infer their shapes automatically) needs to receive information about its input shape. You can use input_shape or input_dim to pass the shape of the input. output_dim represents the number of nodes needed in that layer (in the Dense signature above this is the units argument); here we have 10 nodes, one per digit class.
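A minimal sketch of such a softmax classifier, assuming a TensorFlow installation that provides tensorflow.keras. It passes the input shape through an Input object, which plays the same role as input_dim=784 on the first layer; the variable names are illustrative.

```python
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

input_dim = 784   # flattened 28x28 image
output_dim = 10   # one node per digit class

# Input(shape=...) tells the model the input shape, like input_dim=784 would.
model = Sequential([
    Input(shape=(input_dim,)),
    Dense(output_dim, activation="softmax"),
])

print(model.output_shape)    # (None, 10)
print(model.count_params())  # 7850 = 784 * 10 weights + 10 biases
```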

Step 2:

Before training a model, you need to configure the learning process, which is done via the compile method. It receives three arguments:
1. An optimizer. This could be the string identifier of an existing optimizer (such as 'sgd' or 'adam') or an optimizer instance: https://keras.io/optimizers/
2. A loss function. This is the objective that the model will try to minimize: https://keras.io/losses/
3. A list of metrics. For any classification problem you will want to set this to metrics=['accuracy']: https://keras.io/metrics/

Note: when using the categorical_crossentropy loss, your targets should be in categorical format (e.g. if you have 10 classes, the target for each sample should be a 10-dimensional vector that is all zeros except for a 1 at the index corresponding to the class of the sample). That is why we converted our labels into vectors.
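To see why the targets need to be one-hot vectors, here is a NumPy sketch of the categorical cross-entropy calculation. The function below is an illustrative re-implementation under the assumption that predictions are already probabilities; it is not the Keras source, and the two prediction vectors are made up.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean cross-entropy between one-hot targets and predicted probabilities."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # avoid log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))

# One sample whose true class is 5, as a one-hot vector.
y_true = np.eye(10)[[5]]

# A confident correct prediction: 0.91 on class 5, 0.01 elsewhere.
confident = np.full((1, 10), 0.01)
confident[0, 5] = 0.91

# An uninformative prediction: 0.1 on every class.
uniform = np.full((1, 10), 0.1)

print(categorical_crossentropy(y_true, confident))  # about 0.094
print(categorical_crossentropy(y_true, uniform))    # about 2.303
```

The one-hot target acts as a mask: only the predicted probability of the true class contributes to the loss.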

Step 3: Keras models are trained on Numpy arrays of input data and labels.
For training a model, you will typically use the fit function.

fit(self, x=None, y=None, batch_size=None, epochs=1, verbose=1, callbacks=None, validation_split=0.0, validation_data=None, shuffle=True, class_weight=None, sample_weight=None, initial_epoch=0, steps_per_epoch=None, validation_steps=None)

The fit() function trains the model for a fixed number of epochs (iterations over the dataset).

It returns a History object. Its History.history attribute is a record of training loss values and metrics values at successive epochs, as well as validation loss values and validation metrics values (if applicable). https://github.com/openai/baselines/issues/20
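Steps 2 and 3 can be sketched end to end as follows, assuming tensorflow.keras is installed. The data here is random noise with MNIST-like shapes, used only so the example runs quickly; the optimizer, batch size, and epoch count are arbitrary choices, not the notebook's settings.

```python
import numpy as np
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.utils import to_categorical

rng = np.random.default_rng(0)
# Random stand-in data with MNIST-like shapes (not real digits).
X_demo = rng.random((256, 784)).astype("float32")
Y_demo = to_categorical(rng.integers(0, 10, size=256), num_classes=10)

model = Sequential([Input(shape=(784,)), Dense(10, activation="softmax")])

# Step 2: configure the learning process.
model.compile(optimizer="sgd", loss="categorical_crossentropy",
              metrics=["accuracy"])

# Step 3: train; 25% of the data is held out for validation.
history = model.fit(X_demo, Y_demo, batch_size=32, epochs=2,
                    verbose=0, validation_split=0.25)

# One list per tracked quantity, with one entry per epoch.
print(sorted(history.history.keys()))
```

The exact key names for accuracy depend on the Keras version (for example acc or accuracy), so the printed list may differ from the one shown in the next section.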

Evaluate:

history = model_drop.fit(X_train, Y_train, batch_size=batch_size, epochs=nb_epoch, verbose=1, validation_data=(X_test, Y_test))
print(history.history.keys())
# dict_keys(['val_loss', 'val_acc', 'loss', 'acc'])

You get val_loss and val_acc only when you pass validation data to fit(), here through the validation_data parameter.

val_loss : validation loss
val_acc : validation accuracy
loss : training loss
acc : training accuracy
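Because history.history is a plain dictionary of per-epoch lists, it can be inspected with ordinary Python. The sketch below uses a hypothetical dictionary with made-up numbers (not results from this notebook) and a helper name, best_epoch, that is introduced here for illustration.

```python
# Hypothetical history.history contents; the numbers are made up for illustration.
demo_history = {
    "loss": [0.45, 0.21, 0.16],
    "val_loss": [0.25, 0.18, 0.19],
}

def best_epoch(history_dict, key="val_loss"):
    """Return the 1-based epoch with the smallest value for `key`."""
    values = history_dict[key]
    return min(range(len(values)), key=values.__getitem__) + 1

print(best_epoch(demo_history))          # 2
print(best_epoch(demo_history, "loss"))  # 3
```

A training loss that keeps falling while the validation loss starts rising, as in this toy example, is the usual sign of overfitting.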

Building MLP + ReLu activation + Dropout + BatchNormalization + AdamOptimizer

Where do I call the BatchNormalization function in Keras?

If I want to use the BatchNormalization function in Keras, then do I need to call it once only at the beginning? I read…

stackoverflow.com

The cross-entropy loss on both the training and test sets decreases sharply in the first 2 epochs and then becomes roughly constant.
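A model combining the ingredients in this section's title might look like the sketch below, assuming tensorflow.keras is installed. The layer widths (512 and 128) and the dropout rate (0.5) are assumed values chosen for illustration, not taken from the linked notebook, and placing BatchNormalization between the Dense layer and its activation is one common ordering rather than the only valid one.

```python
from tensorflow.keras import Input
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import (Activation, BatchNormalization,
                                     Dense, Dropout)

# Layer widths (512, 128) and the dropout rate (0.5) are assumed values.
model_drop = Sequential([
    Input(shape=(784,)),
    # Hidden block 1: Dense -> BatchNormalization -> ReLU -> Dropout
    Dense(512), BatchNormalization(), Activation("relu"), Dropout(0.5),
    # Hidden block 2: same pattern with fewer units
    Dense(128), BatchNormalization(), Activation("relu"), Dropout(0.5),
    # Output layer: one softmax probability per digit class
    Dense(10, activation="softmax"),
])

model_drop.compile(optimizer="adam", loss="categorical_crossentropy",
                   metrics=["accuracy"])

print(model_drop.output_shape)  # (None, 10)
```

Dropout is only active during training, so evaluation and prediction use the full network.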

Plotting weights

The median weight of layer_1 is close to zero, with little variation among nodes, whereas the weights of layer_2 vary more across nodes. In the output layer, the weights tend toward negative values with less spread.

================code=================

https://github.com/ranasingh-gkp/Applied_AI_O/blob/master/Module%208_NN%2C%20Computer%20vision%2C%20Deep%20learning/Keras_Hyperparameter_Mnist.ipynb

======================================

References:

wiki
applied ai
keras.io
