Classifying the Fashion-MNIST dataset with Convolutional Neural Nets

Harsh · Published in The Startup · Mar 18, 2020

Overview of the techniques involved in a CNN

Congratulations!! You made it this far. I promise I won’t bore you with fancy technical lexicon and jargon. Let’s dive into CNNs.

Convolutional Neural Network, generally abbreviated as ConvNet or simply CNN, is the most ubiquitous form of neural network used to work with image data.

As the name suggests, a CNN makes use of convolutional layers, in combination with pooling and activation layers, to reduce the size of the input and extract salient features from the image.

images in the fashion_mnist dataset

In my opinion, the fashion_mnist dataset is a great tool for beginners to work with. The dataset contains 10 target classes, labeled 0 through 9, each representing an article of clothing. Each sample is a 28x28 grayscale image of an article of clothing, with the corresponding class as its label.

Let’s begin. For starters, we need to download the dataset and explore it:

from keras.datasets import fashion_mnist

(x_train, y_train), (x_test, y_test) = fashion_mnist.load_data()

print('Shape:')
print('X_Train {}'.format(x_train.shape))
print('Y_Train {}'.format(y_train.shape))
print('X_Test {}'.format(x_test.shape))
print('Y_Test {}'.format(y_test.shape))
Output:
Shape:
X_Train (60000, 28, 28)
Y_Train (60000,)
X_Test (10000, 28, 28)
Y_Test (10000,)

Oh yes, the data is already divided into training and test sets. Keras wants you to waste no time on preprocessing and start building neural network models right away. But wait, this is a classification problem: we need to label each piece of clothing as one of the ten classes. That makes it important to check whether the classes are balanced in both the training and test sets.

import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt

def check_samples(sample, plot=True):
    # Count how many observations fall into each category
    counter = {}
    for key in sample:
        if key not in counter.keys():
            counter[key] = 1
        else:
            counter[key] = counter[key] + 1
    df_dict = {'cat': [x for x in counter.keys()], 'cnt': [y for y in counter.values()]}
    cnt_df = pd.DataFrame(df_dict)
    if plot:
        sns.barplot(data=cnt_df, x='cat', y='cnt')
        plt.xlabel('Category')
        plt.ylabel('Count')
        plt.title('# of Obs. in each Category')
        return None
    else:
        return cnt_df
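For example, calling it on the training labels draws the bar chart below (a minimal usage sketch; the test labels can be checked the same way):

# Plot the class distribution of the training labels
check_samples(y_train)
plt.show()

# Or return the counts as a DataFrame instead of plotting
cnt_df = check_samples(y_test, plot=False)
print(cnt_df)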

Output:

All the categories have an equal number of observations. The dataset is balanced.

This part is experimental.

Working with the data, I realized that the images have no noise and are plain grayscale images. To make my model more versatile, I wrote helpers for scaling, normalizing, rotating and translating images, and augmented the training set with rotated and translated copies of each image. The dataset is now larger and contains every original image along with its transformed versions. To avoid creating ‘pockets’ (rows with similar data sitting spatially close to each other in the dataset), I shuffled the dataset.

import cv2
import numpy as np
from keras.utils import to_categorical
from sklearn.utils import shuffle

# Scale image to the given dimensions
def scale_img(dim, img):
    resized_img = cv2.resize(np.asarray(img), dim, interpolation=cv2.INTER_AREA)
    return resized_img

# Normalize pixel values to the 0-255 range
def normalize_img(img, is_gray=False):
    normalized_img = cv2.normalize(img, img, 0, 255, cv2.NORM_MINMAX)
    if is_gray:
        cvt = cv2.cvtColor(normalized_img, cv2.COLOR_BGR2GRAY)
        return cvt
    return normalized_img

# Rotate image about its center by rot_deg degrees
def rotate_img(img, rot_deg):
    rows, cols = img.shape[0], img.shape[1]
    M = cv2.getRotationMatrix2D((cols / 2, rows / 2), rot_deg, 1)
    dst = cv2.warpAffine(img, M, (cols, rows))
    return dst

# Translate image by x pixels horizontally and y pixels vertically
def translate_img(img, x=100, y=50):
    rows, cols = img.shape[0], img.shape[1]
    M = np.float32([[1, 0, x], [0, 1, y]])
    dst = cv2.warpAffine(img, M, (cols, rows))
    return dst

# Augment the training set with a rotated and a translated copy of each image
x_train_tran = x_train.tolist()
y_train_tran = y_train.tolist()
ctr = 0
for img in x_train:
    x_train_tran.append(rotate_img(img, 90))
    y_train_tran.append(y_train[ctr])
    x_train_tran.append(translate_img(img, x=15, y=15))
    y_train_tran.append(y_train[ctr])
    ctr = ctr + 1

# Add a channel dimension for the CNN and one-hot encode the labels
x_train_tran = np.array(x_train_tran).reshape(len(x_train_tran), 28, 28, 1)
y_train_tran = to_categorical(np.array(y_train_tran))
x_train_tran, y_train_tran = shuffle(x_train_tran, y_train_tran)

print('Shape:')
print('X_Train {}'.format(x_train_tran.shape))
print('Y_Train {}'.format(y_train_tran.shape))
Output:
Shape:
X_Train (180000, 28, 28, 1)
Y_Train (180000, 10)

Here are some sample images from the augmented dataset:

Augmented Dataset with Transformed Images
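A grid like the one above can be produced straight from the array (a minimal sketch, assuming matplotlib and NumPy are available):

import matplotlib.pyplot as plt
import numpy as np

fig, axes = plt.subplots(1, 5, figsize=(12, 3))
for idx, ax in enumerate(axes):
    # Drop the channel dimension for display
    ax.imshow(x_train_tran[idx].reshape(28, 28), cmap='gray')
    # Labels are one-hot encoded at this point, so recover the class index
    ax.set_title(np.argmax(y_train_tran[idx]))
    ax.axis('off')
plt.show()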

Time to create the CNN model.

I played with quite a few models, and training each one takes a lot of time. This is the simplest effective model I found.

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

cnn_model = Sequential()
cnn_model.add(Conv2D(filters=128, kernel_size=(5,5), activation='relu',input_shape=(28,28,1)))
cnn_model.add(MaxPooling2D(pool_size=(3,3), strides=(3,3)))
cnn_model.add(Conv2D(filters=64, kernel_size=(2,2), activation='relu'))
cnn_model.add(MaxPooling2D(pool_size=(2,2)))
cnn_model.add(Conv2D(filters=32, kernel_size=(2,2), activation='relu'))
cnn_model.add(MaxPooling2D(pool_size=(2,2), strides=(2,2)))
cnn_model.add(Flatten())
cnn_model.add(Dense(32, activation='relu'))
cnn_model.add(Dense(to_categorical(y_train).shape[1], activation='softmax'))

The model, smartly named cnn_model as if it were not apparent, is the most standard flavor of CNN. It is a sequential model. Believe me, sequential models are simpler and easier to play with than functional models.
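For comparison, here is a rough sketch of the same architecture written with the functional API (assuming Input and Model are imported from keras); it does the same thing, just with more plumbing:

from keras.layers import Input, Conv2D, MaxPooling2D, Flatten, Dense
from keras.models import Model

inputs = Input(shape=(28, 28, 1))
x = Conv2D(128, (5, 5), activation='relu')(inputs)
x = MaxPooling2D(pool_size=(3, 3), strides=(3, 3))(x)
x = Conv2D(64, (2, 2), activation='relu')(x)
x = MaxPooling2D(pool_size=(2, 2))(x)
x = Conv2D(32, (2, 2), activation='relu')(x)
x = MaxPooling2D(pool_size=(2, 2), strides=(2, 2))(x)
x = Flatten()(x)
x = Dense(32, activation='relu')(x)
outputs = Dense(10, activation='softmax')(x)
functional_model = Model(inputs=inputs, outputs=outputs)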

>> cnn_model.summary()

Output:

Model: "sequential_1"
_________________________________________________________________
Layer (type)                 Output Shape              Param #
=================================================================
conv2d_3 (Conv2D)            (None, 24, 24, 128)       3328
_________________________________________________________________
max_pooling2d_3 (MaxPooling2 (None, 8, 8, 128)         0
_________________________________________________________________
conv2d_4 (Conv2D)            (None, 7, 7, 64)          32832
_________________________________________________________________
max_pooling2d_4 (MaxPooling2 (None, 3, 3, 64)          0
_________________________________________________________________
conv2d_5 (Conv2D)            (None, 2, 2, 32)          8224
_________________________________________________________________
max_pooling2d_5 (MaxPooling2 (None, 1, 1, 32)          0
_________________________________________________________________
flatten_1 (Flatten)          (None, 32)                0
_________________________________________________________________
dense_2 (Dense)              (None, 32)                1056
_________________________________________________________________
dense_3 (Dense)              (None, 10)                330
=================================================================
Total params: 45,770
Trainable params: 45,770
Non-trainable params: 0
_________________________________________________________________

The model has 3 convolutional (Conv2D) layers, each followed by a max pooling layer. Each conv layer has a different kernel size and a different number of filters, which it uses to create feature maps. Max pooling reduces the size of each feature map, making it easier to work with. There are various pooling options (max, min and average); I used max pooling because I want to keep the pixels with the maximum value.
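To make the pooling idea concrete, here is a tiny NumPy sketch of 2x2 max pooling with stride 2 on a 4x4 feature map (illustrative only, not how Keras implements it internally):

import numpy as np

feature_map = np.array([[1, 3, 2, 0],
                        [4, 6, 1, 2],
                        [7, 0, 5, 8],
                        [2, 1, 3, 4]])

# Keep the maximum value of each non-overlapping 2x2 block
pooled = feature_map.reshape(2, 2, 2, 2).max(axis=(1, 3))
print(pooled)
# [[6 2]
#  [7 8]]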

Each conv layer uses filters to extract the important details from the image. A filter is a matrix (generally much smaller than the image matrix) that is superimposed on the 28x28 image matrix and slid across it. At each position, the filter and the underlying patch are multiplied element-wise and the products are summed, producing a resultant matrix called a feature map that captures a particular kind of feature in the image. For example, the filters below, used by the 2nd conv layer, helped detect curve-like patterns in the image, as shown in the feature maps below.

Filter matrices of 2nd hidden convolutional layer
Feature Maps of 2nd hidden convolutional layer.
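To make that sliding operation concrete, here is a tiny NumPy sketch of a 2x2 filter slid over a 4x4 "image" (toy numbers, not the actual learned filters shown above):

import numpy as np

image = np.array([[1, 2, 0, 1],
                  [3, 1, 1, 0],
                  [0, 2, 4, 1],
                  [1, 0, 2, 3]])
kernel = np.array([[1, 0],
                   [0, -1]])

# Slide the kernel over the image and sum the element-wise products at each position
out_h = image.shape[0] - kernel.shape[0] + 1
out_w = image.shape[1] - kernel.shape[1] + 1
feature_map = np.zeros((out_h, out_w))
for i in range(out_h):
    for j in range(out_w):
        patch = image[i:i + 2, j:j + 2]
        feature_map[i, j] = np.sum(patch * kernel)

print(feature_map)
# [[ 0.  1.  0.]
#  [ 1. -3.  0.]
#  [ 0.  0.  1.]]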

Running the model.

from keras.callbacks import EarlyStopping

es = EarlyStopping(monitor='loss', min_delta=1e-4, patience=15, verbose=2, mode='auto')

cnn_model.compile(optimizer='adam', loss='categorical_crossentropy', metrics=['accuracy'])
cnn_model.fit(x_train_tran, y_train_tran, batch_size=32, epochs=150, callbacks=[es])

Classifying test data.

# Reshape the test set to match the model input (x_test_tran is not defined above)
x_test_tran = x_test.reshape(len(x_test), 28, 28, 1)
preds = cnn_model.predict(x_test_tran)
pred_list = []
for i in preds:
    pred_list.append(np.argmax(i))

from sklearn.metrics import accuracy_score
print('{}%'.format(accuracy_score(y_pred=pred_list,y_true=y_test) * 100))
Output:
91.1%
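For a quick per-class breakdown before any plotting, scikit-learn's classification_report can be handy (not used in the original post):

from sklearn.metrics import classification_report

# Precision, recall and F1 score for each of the ten classes
print(classification_report(y_true=y_test, y_pred=pred_list))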

Visualizing Results

Let's find the number of observations classified into each class versus how many truly belong to that class. This will help us see how the model performs for each class. If the model is biased towards one class, it will classify more observations as that class.

class_map = {0:'T-shirt/top',
1:'Trouser',
2:'Pullover',
3:'Dress',
4:'Coat',
5:'Sandal',
6:'Shirt',
7:'Sneaker',
8:'Bag',
9:'Ankle boot'}
# incorrect_df holds the misclassified test samples with their true and predicted
# class names; it is not built in the original post, but a plausible construction is:
incorrect_df = pd.DataFrame({'true': [class_map[t] for t in y_test],
                             'pred': [class_map[p] for p in pred_list]})
incorrect_df = incorrect_df[incorrect_df['true'] != incorrect_df['pred']]
true_class_ct = {}
for i in range(10):
    true_class_ct[class_map[i]] = sum(1 for x in incorrect_df['true'] if x == class_map[i])
pred_class_ct = {}
for i in range(10):
    pred_class_ct[class_map[i]] = sum(1 for x in incorrect_df['pred'] if x == class_map[i])
pred_class_ct
op = {}
def percentage_error(trueval, changedval, i):
    # Percentage difference between the predicted count and the true count for class i
    op[class_map[i]] = ((changedval - trueval) / trueval) * 100

for ct in range(10):
    tv = true_class_ct[class_map[ct]]
    cv = pred_class_ct[class_map[ct]]
    percentage_error(tv, cv, ct)

fig, ax = plt.subplots()
width = 0.4
x = np.arange(10)
rects1 = ax.bar(x - width/2, list(true_class_ct.values()), width, label='True')
rects2 = ax.bar(x + width/2, list(pred_class_ct.values()), width, label='CNN Predicted')
ax.set_ylabel('Count')
ax.set_title('Classification Error in each class')
ax.set_xticks(x)
ax.set_xticklabels(list(op.keys()))
ax.legend()
plt.xticks(rotation=45)

We can also view the percentage of classification error in each class:

sns.barplot(x=list(op.keys()), y=list(op.values()))
plt.ylabel("Percentage")
plt.title('Percentage error for each class')
plt.xticks(rotation=45)
plt.xlabel('Classes')

Sneakers seem to be a problem: the model classified 60% more samples as sneakers than there were true sneaker samples. Percentage by itself, however, is not enough. Considering both bar charts, we can see that our model is overfitting a few categories. There are a few ways to avoid that, with a brief sketch of the first two after the list:

  1. Regularization (L1/L2)
  2. Adding Dropout Layers
  3. Adding more data by data augmentation (which we did).
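As a rough sketch of the first two options (not the model actually trained above), dropout layers and an L2 weight penalty could be added like this:

from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense, Dropout
from keras.regularizers import l2

regularized_model = Sequential()
regularized_model.add(Conv2D(128, (5, 5), activation='relu',
                             kernel_regularizer=l2(1e-4),   # L2 penalty on the conv weights
                             input_shape=(28, 28, 1)))
regularized_model.add(MaxPooling2D(pool_size=(3, 3), strides=(3, 3)))
regularized_model.add(Dropout(0.25))   # randomly drop 25% of activations during training
regularized_model.add(Conv2D(64, (2, 2), activation='relu', kernel_regularizer=l2(1e-4)))
regularized_model.add(MaxPooling2D(pool_size=(2, 2)))
regularized_model.add(Flatten())
regularized_model.add(Dense(32, activation='relu'))
regularized_model.add(Dropout(0.5))
regularized_model.add(Dense(10, activation='softmax'))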

Model fitting, in my opinion, is an iterative process. You begin with a model that you think has the best hyperparameters for the data. You then assess how well it performs on the training data using metrics like accuracy and loss. If the model is not doing great, you tweak the hyperparameters and fit it again until you reach a desirable model. Hypothetically, a desirable model is one with close to 100% accuracy that neither overfits nor underfits the training set and is not biased.
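One lightweight way to support that loop is to hold out part of the training data as a validation set while fitting (a sketch using Keras's validation_split; the original run above trains on the full set):

history = cnn_model.fit(x_train_tran, y_train_tran,
                        batch_size=32,
                        epochs=150,
                        validation_split=0.1,   # hold out 10% of the training data
                        callbacks=[es])

# Compare training and validation accuracy per epoch to spot over/under-fitting
# (keys may be 'acc'/'val_acc' on older Keras versions)
print(history.history['accuracy'][-1], history.history['val_accuracy'][-1])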

Best of luck on your AI journey!
