Multi-layer Perceptron using Keras on MNIST dataset for Digit Classification
ReLU activation + Dropout + BatchNormalization + Adam optimizer
Loading MNIST dataset
Every MNIST data point has two parts: an image of a handwritten digit and a corresponding label. We’ll call the images “x” and the labels “y”.

The training set has 60,000 images and the test set has 10,000 images. Each image is 28 × 28 grayscale, and each comes with its label. In Keras, mnist.load_data() returns the training images and labels as X_train and y_train, and the test images and labels as X_test and y_test.
from keras.datasets import mnist
# the data, shuffled and split between train and test sets
(X_train, y_train), (X_test, y_test) = mnist.load_data()
If Keras is not using TensorFlow as its backend, set the environment variable KERAS_BACKEND=tensorflow before Keras is imported.
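The same setting can also be made from inside a Python script. This is a minimal sketch. It assumes a standalone Keras install, which reads KERAS_BACKEND when it is first imported. For that reason the variable is set before any import of keras.

```python
import os

# Standalone Keras reads KERAS_BACKEND when it is first imported,
# so this line must run before any `import keras` statement.
os.environ["KERAS_BACKEND"] = "tensorflow"

# import keras  # import Keras only after the variable is set
print(os.environ["KERAS_BACKEND"])  # tensorflow
```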
Importing libraries
Plot function
# https://gist.github.com/greydanus/f6eee59eaf1d90fcb3b534a25362cea4
# https://stackoverflow.com/a/14434334
# this function is used to redraw the train and validation error (loss) plot after each epoch
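The referenced gist is not reproduced here. Below is a minimal sketch of such a helper, assuming matplotlib. The name plot_loss and its arguments are hypothetical. It redraws the train and validation loss curves on one Axes, which is what the comment above describes. The Agg backend is used only so the sketch runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt


def plot_loss(ax, epochs, val_loss, train_loss):
    """Hypothetical helper: redraw validation and train loss on an existing Axes.

    `epochs`, `val_loss` and `train_loss` are equal-length sequences,
    e.g. taken from history.history after each epoch.
    """
    ax.clear()  # remove the curves drawn for the previous epoch
    ax.plot(epochs, val_loss, "b", label="Validation Loss")
    ax.plot(epochs, train_loss, "r", label="Train Loss")
    ax.set_xlabel("epoch")
    ax.set_ylabel("categorical crossentropy loss")
    ax.legend()
    ax.grid(True)
    ax.figure.canvas.draw()  # force a redraw of the updated figure
    return ax


fig, ax = plt.subplots()
# Placeholder numbers for illustration only, not results from this article.
plot_loss(ax, [1, 2, 3], [0.30, 0.20, 0.18], [0.40, 0.25, 0.20])
```

Calling the helper once per epoch with the growing lists produces the "live" loss curve.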
Reshaping input size:
Each image is a 2-dimensional 28 × 28 array, so X_train has shape (60000, 28, 28). We will convert each 28 × 28 image into a single-dimensional vector of 1 × 784 values (28 × 28 = 784).

After this conversion the input arrays go from 3-D to 2-D. X_train has shape (60000, 784) and X_test has shape (10000, 784).
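This is a minimal NumPy sketch of the reshape. It uses five random 28 × 28 uint8 arrays as a stand-in for X_train. With the real data the call would be X_train.reshape(X_train.shape[0], 784).

```python
import numpy as np

# Stand-in for the Keras arrays: 5 fake 28x28 uint8 images.
X = np.random.randint(0, 256, size=(5, 28, 28), dtype=np.uint8)

# Flatten each 28x28 image into one row of 784 values: (n, 28, 28) -> (n, 784)
X_flat = X.reshape(X.shape[0], 28 * 28)
print(X_flat.shape)  # (5, 784)
```

Each row of X_flat is the corresponding image read row by row, so no pixel values change, only the layout.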
In each image, a pixel value near 255 is black (the pen stroke) and a value near 0 is white (the background). Values in between are shades of gray. You can inspect the first training image with:
print(X_train[0])
Normalization:
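In the matrix above, each cell holds a value between 0 and 255. Before we apply machine learning algorithms, let's normalize the data with min-max scaling. Since X_min = 0 and X_max = 255, the scaling reduces to dividing by 255:
$$X_{\text{norm}} = \frac{X - X_{\min}}{X_{\max} - X_{\min}} = \frac{X}{255}$$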
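The formula above can be sketched in NumPy as follows. The helper name min_max_scale is hypothetical, and a toy row of three pixels stands in for the real data. The cast to float32 matches Keras's default float type.

```python
import numpy as np

def min_max_scale(X, x_min=0.0, x_max=255.0):
    """(X - X_min) / (X_max - X_min); with MNIST's 0..255 range this is X / 255."""
    X = X.astype("float32")  # Keras works in float32 by default
    return (X - x_min) / (x_max - x_min)

X_flat = np.array([[0, 128, 255]], dtype=np.uint8)  # toy pixel row
X_norm = min_max_scale(X_flat)
print(X_norm.min(), X_norm.max())  # 0.0 1.0
```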
Labeling:
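Let's convert each label into a 10-dimensional one-hot vector. For example, the label of an image of the digit 5 becomes 5 => [0, 0, 0, 0, 0, 1, 0, 0, 0, 0]. This conversion is needed because the model is trained with the categorical_crossentropy loss (see Step 2).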
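Keras provides keras.utils.to_categorical for this conversion. The NumPy sketch below shows the same idea by picking rows of a 10 × 10 identity matrix. The helper name one_hot is hypothetical.

```python
import numpy as np

def one_hot(labels, num_classes=10):
    """Turn integer labels into one-hot rows (same idea as keras.utils.to_categorical)."""
    labels = np.asarray(labels, dtype=int)
    return np.eye(num_classes, dtype="float32")[labels]

print(one_hot([5]))  # [[0. 0. 0. 0. 0. 1. 0. 0. 0. 0.]]
```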
Building a Softmax classifier step by step
# https://keras.io/getting-started/sequential-model-guide/
The Sequential model is a linear stack of layers. You can create a Sequential model by passing a list of layer instances to the constructor:
from keras.models import Sequential
from keras.layers import Dense, Activation
model = Sequential([Dense(32, input_shape=(784,)), Activation('relu'),
                    Dense(10), Activation('softmax')])
# You can also simply add layers via the .add() method:
model = Sequential()
model.add(Dense(32, input_dim=784))
model.add(Activation('relu'))
# https://keras.io/layers/core/ (parameters of the Dense layer)
keras.layers.Dense(units, activation=None, use_bias=True, kernel_initializer='glorot_uniform', bias_initializer='zeros', kernel_regularizer=None, bias_regularizer=None, activity_regularizer=None, kernel_constraint=None, bias_constraint=None)
Dense implements the operation output = activation(dot(input, kernel) + bias). Here activation is the element-wise activation function passed as the activation argument, and kernel is a weights matrix created by the layer. bias is a bias vector created by the layer (only applicable if use_bias is True).
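With W = kernel, x = input and b = bias:
$$\text{output} = \text{activation}(\text{dot}(\text{input}, \text{kernel}) + \text{bias}) \quad\Longrightarrow\quad y = \text{activation}(W^{T} x + b)$$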
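This is a NumPy sketch of the operation above. It is not Keras's own code. One flattened image (a 1 × 784 row) passes through a 32-unit layer with a ReLU activation. The random kernel stands in for learned weights, and the scale 0.05 is an arbitrary assumption. For a row vector x, dot(x, W) is the transpose of W^T x.

```python
import numpy as np

rng = np.random.default_rng(0)

def relu(z):
    return np.maximum(z, 0.0)

def dense(x, kernel, bias, activation=None):
    """output = activation(dot(input, kernel) + bias), as described for Dense."""
    z = np.dot(x, kernel) + bias
    return z if activation is None else activation(z)

x = rng.random((1, 784))                          # one flattened, normalized image
kernel = rng.normal(scale=0.05, size=(784, 32))   # W: input_dim x units (random stand-in)
bias = np.zeros(32)                               # b: one entry per unit
h = dense(x, kernel, bias, activation=relu)
print(h.shape)  # (1, 32)
```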
Activation:
Activations can either be used through an Activation layer or through the activation argument supported by all forward layers.
from keras.layers import Activation, Dense
model.add(Dense(64))
model.add(Activation('tanh'))
# This is equivalent to:
model.add(Dense(64, activation='tanh'))
Many activation functions are available, e.g. tanh, relu and softmax.
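This is a NumPy sketch of the three activations named above, applied to one row of pre-activations. The softmax subtracts the row maximum before exponentiating. This is a common numerical-stability trick and does not change the result.

```python
import numpy as np

def relu(z):
    return np.maximum(z, 0.0)

def softmax(z):
    # Subtracting the row max avoids overflow in exp; the output is unchanged.
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

z = np.array([[-2.0, 0.0, 3.0]])
print(relu(z))     # [[0. 0. 3.]]
print(np.tanh(z))  # each value squashed into (-1, 1)
print(softmax(z))  # non-negative values that sum to 1 along the row
```

relu and tanh act element-wise. softmax looks at the whole row, which is why it is used on the 10-node output layer to produce class probabilities.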
Building model:
Step 1:
The model needs to know what input shape it should expect. For this reason, the first layer in a Sequential model needs to receive information about its input shape. Only the first layer needs it, because the following layers can do automatic shape inference. You can pass the shape with input_shape or input_dim.

output_dim (called units in Keras 2) is the number of nodes in that layer. Here we have 10 nodes, one per digit class.
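Once the input shape is known, Keras can size the weight matrices. The arithmetic sketch below uses a hypothetical helper to count a Dense layer's parameters: input_dim × units weights plus units biases. The 10-node softmax layer on 784 inputs therefore has 784 × 10 + 10 = 7,850 trainable parameters, the figure model.summary() reports for such a layer.

```python
def dense_param_count(input_dim, units, use_bias=True):
    """Trainable parameters of a Dense layer: kernel (input_dim x units) plus bias (units)."""
    return input_dim * units + (units if use_bias else 0)

print(dense_param_count(784, 10))  # 7850
# The 784 -> 32 -> 10 example model: 25120 + 330 parameters
print(dense_param_count(784, 32) + dense_param_count(32, 10))  # 25450
```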
Step 2:
Before training a model, you need to configure the learning process, which is done via the compile method. It receives three arguments:
1. An optimizer. This could be the string identifier of an existing optimizer, such as 'adam': https://keras.io/optimizers/
2. A loss function. This is the objective that the model will try to minimize: https://keras.io/losses/
3. A list of metrics. For any classification problem you will want to set this to metrics=['accuracy']: https://keras.io/metrics/
Note: when using the categorical_crossentropy loss, your targets should be in a categorical format. For example, if you have 10 classes, the target for each sample should be a 10-dimensional vector that is all zeros except for a 1 at the index corresponding to the class of the sample. That is why we converted our labels into vectors.
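This NumPy sketch shows why one-hot targets fit this loss. For each sample the loss is -Σ_k y_true[k] · log(y_pred[k]). Only the predicted probability at the true class survives, so each sample contributes -log(p_true). The clipping value eps is an assumption, added only to avoid log(0). The targets and predictions are toy values.

```python
import numpy as np

def categorical_crossentropy(y_true, y_pred, eps=1e-7):
    """Mean over samples of -sum_k y_true[k] * log(y_pred[k])."""
    y_pred = np.clip(y_pred, eps, 1.0 - eps)  # guard against log(0)
    return float(np.mean(-np.sum(y_true * np.log(y_pred), axis=-1)))

y_true = np.array([[0, 0, 1], [0, 1, 0]], dtype=float)  # one-hot targets
y_pred = np.array([[0.1, 0.2, 0.7], [0.3, 0.5, 0.2]])    # softmax outputs
loss = categorical_crossentropy(y_true, y_pred)
print(loss)  # equals (-log 0.7 - log 0.5) / 2
```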
Step 3:
Keras models are trained on NumPy arrays of input data and labels. For training a model, you will typically use the fit function.
fit(self, x=None, y=None, batch_size=None, epochs=1, verbose=1, callbacks=None, validation_split=0.0, validation_data=None, shuffle=True, class_weight=None, sample_weight=None, initial_epoch=0, steps_per_epoch=None, validation_steps=None)
The fit() function trains the model for a fixed number of epochs (iterations on a dataset).
It returns a History object. Its History.history attribute records training loss and metric values at successive epochs. It also records validation loss and validation metric values, if applicable. https://github.com/openai/baselines/issues/20
Evaluate:
history = model_drop.fit(X_train, Y_train, batch_size=batch_size, epochs=nb_epoch, verbose=1, validation_data=(X_test, Y_test))
print(history.history.keys())
# dict_keys(['val_loss', 'val_acc', 'loss', 'acc'])
val_loss and val_acc are recorded only when you pass validation_data (or validation_split) to fit().
- val_loss : validation loss
- val_acc : validation accuracy
- loss : training loss
- acc : train accuracy
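This is a minimal sketch that reproduces the keys above with the Step 1 softmax classifier. It assumes TensorFlow 2.x, with Keras imported as tensorflow.keras. Small random arrays stand in for MNIST, and the batch size and epoch count are arbitrary. Newer Keras versions name the accuracy entries 'accuracy' and 'val_accuracy' rather than 'acc' and 'val_acc'.

```python
import numpy as np
from tensorflow import keras

# Dummy stand-in data (NOT MNIST): 784-value rows and one-hot labels for 10 classes.
rng = np.random.default_rng(0)
X_train = rng.random((64, 784)).astype("float32")
Y_train = keras.utils.to_categorical(rng.integers(0, 10, size=64), 10)
X_test = rng.random((16, 784)).astype("float32")
Y_test = keras.utils.to_categorical(rng.integers(0, 10, size=16), 10)

# Softmax classifier from Step 1: 784 inputs -> 10 output nodes.
model = keras.Sequential([
    keras.Input(shape=(784,)),
    keras.layers.Dense(10, activation="softmax"),
])
model.compile(optimizer="sgd", loss="categorical_crossentropy", metrics=["accuracy"])

# Because validation_data is passed, History also records the val_* entries.
history = model.fit(X_train, Y_train, batch_size=16, epochs=2, verbose=0,
                    validation_data=(X_test, Y_test))
print(sorted(history.history.keys()))
```

Each entry in history.history is a list with one value per epoch.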
Building the MLP: ReLU activation + Dropout + BatchNormalization + Adam optimizer
The cross-entropy loss on both the training and test sets drops sharply during the first 2 epochs and then stays roughly constant.
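This is a minimal sketch of such a model, not the article's exact architecture. It assumes TensorFlow 2.x with tensorflow.keras. The hidden-layer sizes (512 and 128) and the dropout rate 0.5 are hypothetical choices. The model is named model_drop to match the fit() call above.

Each hidden block is Dense with ReLU, then BatchNormalization, then Dropout. BatchNormalization uses per-batch statistics during training and moving averages at inference. Dropout is active only during training.

```python
import numpy as np
from tensorflow import keras

# Hypothetical architecture: 784 -> 512 -> 128 -> 10.
model_drop = keras.Sequential([
    keras.Input(shape=(784,)),
    keras.layers.Dense(512, activation="relu"),
    keras.layers.BatchNormalization(),  # normalize each unit's activations
    keras.layers.Dropout(0.5),          # zero ~50% of units, training only
    keras.layers.Dense(128, activation="relu"),
    keras.layers.BatchNormalization(),
    keras.layers.Dropout(0.5),
    keras.layers.Dense(10, activation="softmax"),  # one node per digit class
])
model_drop.compile(optimizer="adam", loss="categorical_crossentropy",
                   metrics=["accuracy"])
model_drop.summary()

# Inference-mode call: Dropout is off and BatchNormalization uses moving averages.
probs = model_drop(np.zeros((2, 784), dtype="float32"), training=False)
```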
Plotting weights
The median weight of layer_1 is zero, with little variation across nodes, whereas the weights of layer_2 vary much more across nodes. In the output layer, the weights are centred on a negative value with a small spread.
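The plots summarize each layer's weights by their median and spread. This NumPy sketch computes those two numbers from a kernel matrix; the helper name weight_summary is hypothetical. For a trained Keras Dense layer, layer.get_weights()[0] returns the kernel. Here, random arrays with two different spreads stand in for real weights.

```python
import numpy as np

def weight_summary(kernel):
    """Median and interquartile range (IQR) of a layer's weights."""
    w = np.asarray(kernel).ravel()
    q1, median, q3 = np.percentile(w, [25, 50, 75])
    return {"median": float(median), "iqr": float(q3 - q1)}

# Random stand-ins for trained kernels (not real model weights).
rng = np.random.default_rng(0)
narrow = rng.normal(0.0, 0.01, size=(784, 512))
wide = rng.normal(0.0, 0.1, size=(512, 128))
print(weight_summary(narrow))
print(weight_summary(wide))
```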
References:
- Wikipedia
- Applied AI
- keras.io