What is a Convolutional Neural Network (CNN)? — with Keras

Caner Dabakoglu
9 min read · Dec 12, 2018


CONVOLUTIONAL NEURAL NETWORK

What is a Convolutional Neural Network (CNN or ConvNet)?

A Convolutional Neural Network is a deep learning algorithm used for recognizing images. It clusters images by similarity and performs object recognition within scenes. A CNN uses distinctive features of an image (e.g. a cat's tail and ears, an airplane's wings and engines) to identify the object in the image. This process is actually very similar to what our brain does to identify objects.
Traditional neural networks are not ideal for image processing, which is why we use CNNs. A CNN is not too different from an ANN, though: in the end a CNN still uses an Artificial Neural Network, but before that it applies some extra layers to gather information and extract features from the image.

Before we start, you may want to check these out:

Logistic Regression and Implementing ML Algorithms: https://medium.com/@cdabakoglu/heart-disease-logistic-regression-machine-learning-d0ebf08e55c0
Artificial Neural Network- Implementing with Keras:
https://medium.com/@cdabakoglu/artificial-neural-network-with-keras-d858f82f90c5

What are these layers or processes? How does a CNN work?

  • Convolution
  • Padding
  • Pooling
  • Flattening
  • Full Connection

Convolutional Layer (Convolutional Operation)

This is the main operation in a CNN. It uses a feature detector, or filter, which detects edges or specific shapes. The filter is placed at the top left of the image and multiplied element-wise with the values at the same indices. All the results are then summed and written to the output matrix. The filter then slides to the right, and the whole process is repeated again and again. Usually the filter slides one pixel at a time, but this can change depending on your model; the size of this sliding step is called the 'stride'. A bigger stride means a smaller output. Sometimes the stride is increased to decrease the output size and computation time.
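
To make the operation concrete, here is a minimal NumPy sketch of a single-channel convolution (strictly speaking the cross-correlation that CNN libraries actually compute) with stride 1 and no padding; the input values and the 2x2 filter are made up purely for illustration.

import numpy as np

def conv2d(image, kernel, stride=1):
    # Slide the kernel over the image, multiply element-wise and sum
    kh, kw = kernel.shape
    out_h = (image.shape[0] - kh) // stride + 1
    out_w = (image.shape[1] - kw) // stride + 1
    output = np.zeros((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            patch = image[i*stride:i*stride+kh, j*stride:j*stride+kw]
            output[i, j] = np.sum(patch * kernel)
    return output

image = np.array([[1, 0, 1, 2],
                  [2, 1, 0, 1],
                  [1, 1, 2, 0],
                  [0, 2, 1, 1]])
edge_filter = np.array([[1, 0],
                        [0, -1]])       # a toy 2x2 "feature detector"
print(conv2d(image, edge_filter))       # a 4x4 input and a 2x2 filter give a 3x3 output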

Let's try to understand the process mentioned above with a visualization.

After the convolutional operations we use an activation function to break up linearity. We want to increase non-linearity, because otherwise the algorithm would treat the image as if it were a linear function. In other algorithms we usually use the sigmoid and tanh functions as activation functions, but in Convolutional Neural Networks we use ReLU, because the ReLU function is more time-efficient.
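
As a tiny illustration, ReLU simply replaces every negative value in a feature map with zero, which makes it very cheap to compute; the numbers below are arbitrary.

import numpy as np

feature_map = np.array([[-2.0, 1.5],
                        [ 0.5, -0.3]])
print(np.maximum(0, feature_map))   # ReLU: max(0, x) element-wise -> [[0. 1.5] [0.5 0.]]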

Padding

We have to keep as much information as we can in the early stages of a CNN. But the convolutional operations mentioned above decrease the size of the image, which is why we apply padding to preserve the input size.
For example, if our input size is 36x36x3 but after a convolutional operation the output has size 32x32x3, we can fix it by adding some padding as shown below.
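
Here is a small sketch of that size arithmetic, assuming a 5x5 filter (which is what would shrink a 36x36 input to 32x32 with no padding): padding the input with 2 rows/columns of zeros on each side keeps the output at 36x36.

import numpy as np

input_size, kernel_size = 36, 5
print(input_size - kernel_size + 1)           # 32 -> output size without padding ('valid')
pad = (kernel_size - 1) // 2                  # 2  -> zeros added on each side for 'same' padding
image = np.zeros((input_size, input_size))
padded = np.pad(image, pad, mode='constant')  # zero-pad the borders
print(padded.shape[0] - kernel_size + 1)      # 36 -> the input size is preserved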

Pooling

This layer is used to reduce the number of parameters and the amount of computation. It also helps detect features that are invariant to scale or orientation changes, and it helps prevent overfitting. There are several pooling operations, such as average pooling and max pooling, but max pooling is used most often. Let's say we have a 2x2 filter and a 4x4 input (image); with a stride of 2, the results of max pooling and average pooling will look like this:
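
The example above can be written out in a few lines of NumPy; the 4x4 input values are made up for illustration, and the window is 2x2 with stride 2.

import numpy as np

image = np.array([[1, 3, 2, 4],
                  [5, 6, 1, 2],
                  [7, 2, 8, 1],
                  [3, 0, 4, 5]])

def pool2d(image, size=2, stride=2, mode="max"):
    # Take the max (or mean) of each non-overlapping size x size window
    out = np.zeros((image.shape[0] // stride, image.shape[1] // stride))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            patch = image[i*stride:i*stride+size, j*stride:j*stride+size]
            out[i, j] = patch.max() if mode == "max" else patch.mean()
    return out

print(pool2d(image, mode="max"))   # [[6. 4.] [7. 8.]]
print(pool2d(image, mode="avg"))   # [[3.75 2.25] [3.   4.5 ]]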

Flattening

Basically, flattening takes the matrix that came out of the convolutional and pooling operations and turns it into a one-dimensional array. This is important because the input of the fully-connected layer (or, let's say, the Artificial Neural Network) consists of a one-dimensional array.
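
For instance, flattening a small 3x3 feature map just lines its values up into a single vector of length 9 (in the Keras model below this happens for all feature maps of the last pooling layer at once):

import numpy as np

feature_map = np.arange(9).reshape(3, 3)    # a 3x3 matrix coming out of pooling
flat = feature_map.reshape(-1)              # one-dimensional array of length 9
print(feature_map.shape, "->", flat.shape)  # (3, 3) -> (9,)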

Full Connection

This layer takes the one-dimensional array we obtained above as input and carries out the learning process, just like a regular Artificial Neural Network.
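
As a minimal sketch, such a fully-connected head could look like the following in Keras (the layer sizes here are arbitrary examples; the actual model we build below uses its own):

from keras.models import Sequential
from keras.layers import Flatten, Dense

head = Sequential()
head.add(Flatten(input_shape=(7, 7, 64)))   # turn the last feature maps into one vector
head.add(Dense(128, activation='relu'))     # fully-connected hidden layer
head.add(Dense(10, activation='softmax'))   # one output probability per class
head.summary()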

Convolutional Neural Network with Keras

Dropout

Dropout is a regularization technique for reducing overfitting. It is called “dropout” because during training it randomly drops out visible or hidden units in the neural network.
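
A small NumPy sketch of what dropout does to a layer's activations during training: each unit is kept with probability 1 - rate, and the surviving activations are scaled up so their expected sum stays the same (the 'inverted dropout' scheme used by Keras); the activation values are made up.

import numpy as np

rate = 0.25                                           # fraction of units to drop
activations = np.array([0.8, 0.1, 0.5, 0.9, 0.3, 0.7])
mask = np.random.rand(activations.shape[0]) >= rate   # keep each unit with probability 0.75
print(activations * mask / (1 - rate))                # dropped units are zero, the rest are scaled up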

Let's look at the dataset we'll use.

About Dataset

Now we'll try this algorithm on a dataset that contains images of 10 different classes of clothing. The dataset consists of a training set of 60,000 examples and a test set of 10,000 examples. Each example is a 28x28 grayscale image, associated with a label from one of 10 classes.

  • Each image is 28 pixels in height and 28 pixels in width, for a total of 784 pixels.
  • Each pixel has a single pixel-value associated with it, indicating the lightness or darkness of that pixel, with higher numbers meaning darker. This pixel-value is an integer between 0 and 255.
  • The training and test data sets have 785 columns. The first column contains the class label and represents the class of clothing. The rest of the columns contain the pixel-values of the associated image.

Each training and test example is assigned to one of the following labels:

  • 0 T-shirt/top
  • 1 Trouser
  • 2 Pullover
  • 3 Dress
  • 4 Coat
  • 5 Sandal
  • 6 Shirt
  • 7 Sneaker
  • 8 Bag
  • 9 Ankle boot

Import Libraries and Read Data

import numpy as np
import pandas as pd
import seaborn as sns
import matplotlib.pyplot as plt
# Reading Train Data
dfTrain = pd.read_csv("dataframes/fashion-mnist/fashion-mnist_train.csv")
print("Shape of Train Data: " + str(dfTrain.shape))

Shape of Train Data: (60000, 785)

# First 5 rows of train data
dfTrain.head()
# Reading Test Data
dfTest = pd.read_csv("dataframes/fashion-mnist/fashion-mnist_test.csv")
print("Shape of Test Data: " + str(dfTest.shape))

Shape of Test Data: (10000, 785)

# First 5 rows of test data
dfTest.head()
Y_train = dfTrain.label
X_train = dfTrain.drop(["label"], axis=1)
X_test = dfTest.drop(["label"], axis=1)
Y_test = dfTest.label
plt.figure(figsize=(16,5))
sns.countplot(Y_train, palette="twilight_shifted_r")
plt.title("Number of Classes")
plt.show()

Example Images from Dataset

plt.figure(figsize=(20,5))
for i in range(10):
    plt.subplot(2,5,i+1)
    img = dfTrain[dfTrain.label==i].iloc[0,1:].values
    img = img.reshape((28,28))
    plt.imshow(img, cmap='gray')
    plt.title("Class: " + str(i))
    plt.axis('off')

plt.show()

Normalization

We'll use normalization to reduce the effect of differences in illumination. It also helps the CNN work faster.

# Normalization
X_train = X_train / 255.0
X_test = X_test / 255.0

Reshape

Our images are 28x28, but to be used with Keras they have to be 3D arrays. That's why we reshape them as 28x28x1; we use 1 channel because our images are grayscale (a grayscale image has only one channel, while an RGB image has three channels).

# Reshape
X_train = X_train.values.reshape(-1, 28, 28, 1)
X_test = X_test.values.reshape(-1, 28, 28, 1)
print("X_train Shape: ", X_train.shape)
print("X_test Shape: ", X_test.shape)

X_train Shape: (60000, 28, 28, 1)
X_test Shape: (10000, 28, 28, 1)

Label Encoding

We turn our class labels into one-hot encoded vectors.

# Label Encoding
from keras.utils.np_utils import to_categorical
Y_train = to_categorical(Y_train, num_classes=10)

Train-Test Split

We'll split our training data: 30% will be used as validation data and 70% as training data.

from sklearn.model_selection import train_test_split
x_train, x_val, y_train, y_val = train_test_split(X_train, Y_train, test_size=0.3, random_state=42)
print("x_train shape", x_train.shape)
print("x_val shape", x_val.shape)
print("y_train shape", y_train.shape)
print("y_val shape", y_val.shape)

x_train shape (42000, 28, 28, 1)
x_val shape (18000, 28, 28, 1)
y_train shape (42000, 10)
y_val shape (18000, 10)

Implementing Convolutional Neural Network Algorithm with Keras

Create Model

from sklearn.metrics import confusion_matrix
from keras.utils.np_utils import to_categorical
from keras.models import Sequential
from keras.layers import Dense, Dropout, Flatten, Conv2D, MaxPool2D
from keras.optimizers import RMSprop, Adam
from keras.preprocessing.image import ImageDataGenerator
from keras.callbacks import ReduceLROnPlateau
import itertools
model = Sequential()
model.add(Conv2D(filters=32, kernel_size=(3,3), padding='same', activation='relu', input_shape=(28,28,1)))
model.add(MaxPool2D(pool_size=(2,2)))
model.add(Dropout(0.25))
model.add(Conv2D(filters=64, kernel_size=(3,3), padding='same', activation='relu'))
model.add(MaxPool2D(pool_size=(2,2)))
model.add(Dropout(0.25))
model.add(Conv2D(filters=128, kernel_size=(3,3), padding='same', activation='relu'))
model.add(MaxPool2D(pool_size=(2,2)))
model.add(Dropout(0.25))
model.add(Flatten())
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.25))
model.add(Dense(256, activation='relu'))
model.add(Dropout(0.1))
model.add(Dense(10, activation='softmax'))

Optimizer

We'll use the Adam optimizer. Adam is an optimization algorithm that can be used instead of the classical stochastic gradient descent procedure to update network weights iteratively based on the training data. It differs from classical stochastic gradient descent: SGD uses a single learning rate (alpha) for all weight updates, and that learning rate does not change during training, whereas the Adam optimizer adapts the learning rate dynamically as training progresses.

optimizer = Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=None, decay=0.0, amsgrad=False)

Compile Model

Since we have 10 classes, we'll use categorical crossentropy as the loss function.

model.compile(optimizer = optimizer, loss = 'categorical_crossentropy', metrics=['accuracy'])

Epoch and Batch Size

An epoch is one pass of the algorithm over the entire data set. If one epoch is too big for the computer to process at once, we divide it into smaller parts, and each of these parts is called a batch. For example, with 42,000 training samples and a batch size of 300, one epoch corresponds to 140 batches (steps).

epochs = 50
batchSize = 300

Data Augmentation

By using “data augmentation” we can create new training samples with slightly different orientations, shifts, and zoom levels. This helps prevent overfitting.

# Data Augmentation
datagen = ImageDataGenerator(
    featurewise_center=False,            # set input mean to 0 over the dataset
    samplewise_center=False,             # set each sample mean to 0
    featurewise_std_normalization=False, # divide inputs by std of the dataset
    samplewise_std_normalization=False,  # divide each input by its std
    zca_whitening=False,                 # apply ZCA whitening
    rotation_range=0.1,                  # randomly rotate images in the range
    zoom_range=0.1,                      # randomly zoom image
    width_shift_range=0.1,               # randomly shift images horizontally
    height_shift_range=0.1,              # randomly shift images vertically
    horizontal_flip=False,               # randomly flip images horizontally
    vertical_flip=False)                 # randomly flip images vertically
datagen.fit(x_train)

Fit The Model

cnn = model.fit_generator(datagen.flow(x_train, y_train, batch_size=batchSize), epochs=epochs, validation_data=(x_val, y_val), steps_per_epoch=x_train.shape[0] // batchSize)

Evaluate The Model

print("Accuracy after fitting: {:.2f}%".format(cnn.history['acc'][-1]*100))

Accuracy after fitting: 89.92%

For better accuracy you can increase the number of epochs, change the parameters of the layers, or add additional layers to the model. For now I won't do that, because the fitting process takes a lot of time.

plt.figure(figsize=(18,6))
plt.subplot(1,2,1)
plt.plot(cnn.history['loss'], color="blue", label = "Loss")
plt.plot(cnn.history['val_loss'], color="orange", label = "Validation Loss")
plt.ylabel("Loss")
plt.xlabel("Number of Epochs")
plt.legend()
plt.subplot(1,2,2)
plt.plot(cnn.history['acc'], color="green", label = "Accuracy")
plt.plot(cnn.history['val_acc'], color="red", label = "Validation Accuracy")
plt.ylabel("Accuracy")
plt.xlabel("Number of Epochs")
plt.legend()
plt.show()

Let's find out the score using the test data we imported earlier.

Y_test = to_categorical(Y_test, num_classes=10)  # One-Hot Encoding
score = model.evaluate(X_test, Y_test)
print("Test Loss: {:.4f}".format(score[0]))
print("Test Accuracy: {:.2f}%".format(score[1]*100))

10000/10000 [==============================] — 30s 3ms/step
Test Loss: 0.2069
Test Accuracy: 92.18%

Y_pred = model.predict(X_test)
Y_pred_classes = np.argmax(Y_pred, axis = 1)
Y_true = np.argmax(Y_test, axis = 1)
confusionMatrix = confusion_matrix(Y_true, Y_pred_classes)
f,ax=plt.subplots(figsize=(10,10))
sns.heatmap(confusionMatrix, annot=True, linewidths=0.1, cmap = "gist_yarg_r", linecolor="black", fmt='.0f', ax=ax)
plt.xlabel("Predicted Label")
plt.ylabel("True Label")
plt.title("Confusion Matrix")
plt.show()
for i in range(len(confusionMatrix)):
    print("Class:", str(i))
    print("Number of Wrong Prediction:", str(sum(confusionMatrix[i]) - confusionMatrix[i][i]), "out of 1000")
    print("Percentage of True Prediction: {:.2f}%".format(confusionMatrix[i][i] / 10))
    print("<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<")

Class: 0
Number of Wrong Prediction: 122 out of 1000
Percentage of True Prediction: 87.80%
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Class: 1
Number of Wrong Prediction: 11 out of 1000
Percentage of True Prediction: 98.90%
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Class: 2
Number of Wrong Prediction: 185 out of 1000
Percentage of True Prediction: 81.50%
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Class: 3
Number of Wrong Prediction: 47 out of 1000
Percentage of True Prediction: 95.30%
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Class: 4
Number of Wrong Prediction: 115 out of 1000
Percentage of True Prediction: 88.50%
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Class: 5
Number of Wrong Prediction: 16 out of 1000
Percentage of True Prediction: 98.40%
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Class: 6
Number of Wrong Prediction: 198 out of 1000
Percentage of True Prediction: 80.20%
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Class: 7
Number of Wrong Prediction: 29 out of 1000
Percentage of True Prediction: 97.10%
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Class: 8
Number of Wrong Prediction: 13 out of 1000
Percentage of True Prediction: 98.70%
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<
Class: 9
Number of Wrong Prediction: 46 out of 1000
Percentage of True Prediction: 95.40%
<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<<

As you can see above, we should focus especially on Class 2 (Pullover) and Class 6 (Shirt) to improve our score.

For the dataset we used: https://www.kaggle.com/zalando-research/fashionmnist

Thanks for your time!

LinkedIn: https://www.linkedin.com/in/canerdabakoglu/

GitHub: https://github.com/cdabakoglu

Kaggle: https://www.kaggle.com/cdabakoglu
