[Deep Learning Lab] Episode-3: fer2013

Furkan Kınlı
8 min read · Apr 6, 2018


Let the “Deep Learning Lab” begin!

This is the third episode of the “Deep Learning Lab” story series, which contains my individual deep learning experiments on different cases.

In this third episode, I would like to work on the fer2013 dataset, which was published at the International Conference on Machine Learning (ICML) 5 years ago, to recognize facial expressions.

Example images from fer2013 dataset

I can almost hear you asking what this fer2013 is. fer2013 is an open-source dataset that was first created for an ongoing project by Pierre-Luc Carrier and Aaron Courville, then shared publicly for a Kaggle competition shortly before ICML 2013. The dataset consists of 35,887 grayscale, 48x48 face images, each labeled with one of 7 emotions.

Emotion labels in the dataset (you can verify the counts with the sketch below):
0: Angry (4,953 images)
1: Disgust (547 images)
2: Fear (5,121 images)
3: Happy (8,989 images)
4: Sad (6,077 images)
5: Surprise (4,002 images)
6: Neutral (6,198 images)
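
Here is a quick sketch to verify them (assuming fer2013.csv is in your working directory):

import pandas as pd

# Count how many images carry each of the 7 emotion labels (0-6)
data = pd.read_csv('fer2013.csv')
print(data['emotion'].value_counts().sort_index())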

During the competition, 28,709 images were shared with the participants as the training set and 3,589 images as the public test set; the remaining 3,589 images were kept as the private test set to determine the winner of the competition. The dataset was made accessible to everyone after the competition ended.
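
By the way, the published CSV preserves this split in its “Usage” column, so you can reproduce the original partition instead of splitting randomly; a small sketch:

# Recover the competition's original partition from the "Usage" column
train_df = data[data['Usage'] == 'Training']        # 28,709 images
public_df = data[data['Usage'] == 'PublicTest']     # 3,589 images
private_df = data[data['Usage'] == 'PrivateTest']   # 3,589 images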

Let me give credit to the real heroes:

Challenges in Representation Learning: A report on three machine learning contests, Ian Goodfellow et al., 2013

Université de Montréal, Technical Report

LET’S GET BACK TO 2013…

LET’S GOOOOO!

In the demo part of this story, I used JetBrains PyCharm and OpenCV to capture live frames from the web camera, detect the faces in them, and recognize the emotions on those faces.

First and foremost: Importing the libraries

import sys, os
import pandas as pd
import numpy as np
import cv2
from sklearn.model_selection import train_test_split
from keras.models import Sequential
from keras.layers import Dense, Dropout, Activation, Flatten
from keras.layers import Conv2D, MaxPooling2D, BatchNormalization
from keras.losses import categorical_crossentropy
from keras.optimizers import Adam
from keras.regularizers import l2
from keras.callbacks import ReduceLROnPlateau, TensorBoard, EarlyStopping, ModelCheckpoint
from keras.models import load_model

Once you have created a new folder called “Emotion Recognition” in your Google Drive, you need to upload the “fer2013.csv” file to this folder. After that, we define the file paths on the virtual machine with the following code snippet.

BASEPATH = 'drive/Emotion Recognition'
sys.path.insert(0, BASEPATH)
os.chdir(BASEPATH)
MODELPATH = './models/model.h5'
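
Two practical notes (my additions, not part of the original setup): if you run this on Google Colab, you need to mount Drive first and adjust BASEPATH to the mount point, and the “models” folder must exist before ModelCheckpoint can save into it.

# On Google Colab, mount Drive first and point BASEPATH at the mount:
# from google.colab import drive
# drive.mount('/content/drive')

# Make sure the checkpoint folder exists before training starts
os.makedirs(os.path.dirname(MODELPATH), exist_ok=True)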

Initializing the parameters.

We will feed the convolutional neural network with batches of 64 images each for 100 epochs; eventually, the model will output, for every 48x48 face image, the probabilities of the 7 different emotions (num_labels).

num_features = 64
num_labels = 7
batch_size = 64
epochs = 100
width, height = 48, 48

Let’s read our data, with the help of “pandas”, from the CSV file we just uploaded to Google Drive.

data = pd.read_csv('./fer2013.csv')

Let’s see what it looks like.

data.tail()
The last 5 rows of fer2013 dataset

As you noticed at first glance, each row of the CSV file stores an image as a string of pixel values, so a little preprocessing is required. (Source for preprocessing)

  1. Converting the relevant column element into a list for each row
  2. Splitting the string by space character as a list
  3. Numpy ❤
  4. Normalizing the image
  5. Resizing the image
  6. Expanding the dimension of channel for each image
  7. Converting the labels to a categorical matrix
pixels = data['pixels'].tolist() # 1

faces = []
for pixel_sequence in pixels:
    face = [int(pixel) for pixel in pixel_sequence.split(' ')] # 2
    face = np.asarray(face).reshape(width, height) # 3

    # There is an issue with normalizing the images. Keep steps 4 and 5 commented out until I find a solution.
    # face = face / 255.0 # 4
    # face = cv2.resize(face.astype('uint8'), (width, height)) # 5
    faces.append(face.astype('float32'))

faces = np.asarray(faces)
faces = np.expand_dims(faces, -1) # 6

emotions = pd.get_dummies(data['emotion']).as_matrix() # 7
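
A quick sanity check (my own addition) before splitting, to confirm the arrays have the shapes the network expects:

print(faces.shape)     # expected: (35887, 48, 48, 1)
print(emotions.shape)  # expected: (35887, 7)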

We are now ready to split our data into training, validation and test sets -well, I am sure of that-.

X_train, X_test, y_train, y_test = train_test_split(faces, emotions, test_size=0.1, random_state=42)
X_train, X_val, y_train, y_val = train_test_split(X_train, y_train, test_size=0.1, random_state=41)
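
With test_size=0.1 applied twice, this should leave roughly 29,068 training, 3,230 validation and 3,589 test images; the test set size conveniently matches the competition’s 3,589-image test sets:

print(len(X_train), len(X_val), len(X_test))  # roughly 29068, 3230, 3589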

What will the architecture of the model look like?

[2 x CONV (3x3)] — MAXP (2x2) — DROPOUT (0.5)
[2 x CONV (3x3)] — MAXP (2x2) — DROPOUT (0.5)
[2 x CONV (3x3)] — MAXP (2x2) — DROPOUT (0.5)
[2 x CONV (3x3)] — MAXP (2x2) — DROPOUT (0.5)
Dense (512) — DROPOUT (0.4)
Dense (256) — DROPOUT (0.4)
Dense (128) — DROPOUT (0.5)

  • In the first convolutional layer, L2 regularization (0.01) has been added.
  • A batch normalization layer has been added after every convolutional layer except the first one.
  • MAXP (2x2) and DROPOUT (0.5) layers have been added after each convolutional block.
  • “ReLU” has been picked as the activation function for all convolutional layers.
model = Sequential()

model.add(Conv2D(num_features, kernel_size=(3, 3), activation='relu', input_shape=(width, height, 1), data_format='channels_last', kernel_regularizer=l2(0.01)))
model.add(Conv2D(num_features, kernel_size=(3, 3), activation='relu', padding='same'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Dropout(0.5))

model.add(Conv2D(2*num_features, kernel_size=(3, 3), activation='relu', padding='same'))
model.add(BatchNormalization())
model.add(Conv2D(2*num_features, kernel_size=(3, 3), activation='relu', padding='same'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Dropout(0.5))

model.add(Conv2D(2*2*num_features, kernel_size=(3, 3), activation='relu', padding='same'))
model.add(BatchNormalization())
model.add(Conv2D(2*2*num_features, kernel_size=(3, 3), activation='relu', padding='same'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Dropout(0.5))

model.add(Conv2D(2*2*2*num_features, kernel_size=(3, 3), activation='relu', padding='same'))
model.add(BatchNormalization())
model.add(Conv2D(2*2*2*num_features, kernel_size=(3, 3), activation='relu', padding='same'))
model.add(BatchNormalization())
model.add(MaxPooling2D(pool_size=(2, 2), strides=(2, 2)))
model.add(Dropout(0.5))

model.add(Flatten())

model.add(Dense(2*2*2*num_features, activation='relu'))
model.add(Dropout(0.4))
model.add(Dense(2*2*num_features, activation='relu'))
model.add(Dropout(0.4))
model.add(Dense(2*num_features, activation='relu'))
model.add(Dropout(0.5))

model.add(Dense(num_labels, activation='softmax'))

Let’s see the total trainable / non-trainable parameters.

model.summary()
The last part of model summary

We are now ready to compile our model. Categorical crossentropy has been picked as the loss function because we have more than 2 labels and have already prepared the labels as a categorical matrix -I confess, again, I copied this from the previous episodes-.

model.compile(loss=categorical_crossentropy,
              optimizer=Adam(lr=0.001, beta_1=0.9, beta_2=0.999, epsilon=1e-7),
              metrics=['accuracy'])

Let’s add some more features to our model.

First, we help the loss function escape “plateaus” by reducing the learning rate of the optimizer by a certain value (factor) whenever the validation loss shows no improvement for a certain number of epochs (patience).

lr_reducer = ReduceLROnPlateau(monitor='val_loss', factor=0.9, patience=3, verbose=1)

We record everything that happens during training as logs in the “logs” folder, so that we can better interpret the results of our model and visually analyze the changes in the loss and accuracy during training.

For more information on TensorBoard: GO

tensorboard = TensorBoard(log_dir='./logs')

Even if we prevent the loss function from hitting plateaus, the validation loss can get stuck in a certain range while the training loss keeps decreasing (in other words, while the model continues to learn something). If we keep training the model past this point, the only thing it can do is memorize (over-fit) the training data -I could say there is no chance of escaping that local minimum without a miracle-. This is something we do not want at all.

We stop training the model if there is no improvement in the validation loss for a certain number of epochs (patience).

early_stopper = EarlyStopping(monitor='val_loss', min_delta=0, patience=8, verbose=1, mode='auto')

Finally, we save our model during training whenever it achieves a better validation loss than the best seen so far. Thus, we will have the best possible model at the end of training.

checkpointer = ModelCheckpoint(MODELPATH, monitor='val_loss', verbose=1, save_best_only=True)

We can start training our model. GO GO GO!!!

model.fit(np.array(X_train), np.array(y_train),
          batch_size=batch_size,
          epochs=epochs,
          verbose=1,
          validation_data=(np.array(X_val), np.array(y_val)),  # validate on the held-out validation set, not the test set
          shuffle=True,
          callbacks=[lr_reducer, tensorboard, early_stopper, checkpointer])
Head epochs of training
Tail epochs of training

Before measuring the performance of our model on the test set, let’s look at the performances of the winners’ models in the original Kaggle competition back in 2013.

RBM (Yichuan Tang) — 71.162%
UNSUPERVISED (Yingbo Zhou & Chetan Ramaiah) — 69.267%
MAXIM MILAKOV (Maxim Milakov) — 68.821%
RADU+MARIUS+CRISTI (Radu Ionescu & Marius Popescu & Cristian Grozea) — 67.484%

We, again, all hold our breath, AND…

scores = model.evaluate(np.array(X_test), np.array(y_test), batch_size=batch_size)
print("Loss: " + str(scores[0]))
print("Accuracy: " + str(scores[1]))
Loss & Accuracy

We are very close to the performances of the winners of the competition, but we cannot surpass them.
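
To see where the model actually struggles, a per-class confusion matrix helps (my own addition, assuming scikit-learn is available); I would expect the under-represented “Disgust” class to suffer the most:

from sklearn.metrics import confusion_matrix

# Compare predicted labels against the true labels on the test set
y_pred = np.argmax(model.predict(np.array(X_test)), axis=1)
y_true = np.argmax(np.array(y_test), axis=1)
print(confusion_matrix(y_true, y_pred))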

At this point, the most significant problem that I could not solve is that the model starts to memorize the images after a certain number of epochs during training. I have tried different optimizers, different numbers of epochs and batch sizes, different learning rates, and deeper / shallower / less dense model architectures, but the results never improved.

I would be very pleased to hear your ideas for solving this problem.
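
One idea worth trying (which I have not used in the training above) is data augmentation; a minimal sketch with Keras’ ImageDataGenerator:

from keras.preprocessing.image import ImageDataGenerator

# Show the model slightly shifted / flipped / zoomed variants of the
# training images each epoch, so it cannot simply memorize them
datagen = ImageDataGenerator(width_shift_range=0.1,
                             height_shift_range=0.1,
                             horizontal_flip=True,
                             zoom_range=0.1)

# Use in place of the model.fit call above
model.fit_generator(datagen.flow(np.array(X_train), np.array(y_train), batch_size=batch_size),
                    steps_per_epoch=len(X_train) // batch_size,
                    epochs=epochs,
                    validation_data=(np.array(X_val), np.array(y_val)),
                    callbacks=[lr_reducer, tensorboard, early_stopper, checkpointer])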

Here is the demo that uses our trained model to predict the facial expressions of the faces detected by the Haar cascade face detection algorithm:

emotion_dict = {0: "Angry", 1: "Disgust", 2: "Fear", 3: "Happy", 4: "Sad", 5: "Surprise", 6: "Neutral"}

model = load_model(MODELPATH)

cap = cv2.VideoCapture(0)

# Load the Haar cascade once, before the capture loop
face_cascade = cv2.CascadeClassifier('haarcascade_frontalface_default.xml')

while True:
    ret, frame = cap.read()

    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.3, 5)

    for (x, y, w, h) in faces:
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 1)
        roi_gray = gray[y:y + h, x:x + w]
        cropped_img = np.expand_dims(np.expand_dims(cv2.resize(roi_gray, (48, 48)), -1), 0)
        # Note: this normalization should match whatever preprocessing was used in training
        cv2.normalize(cropped_img, cropped_img, alpha=0, beta=1, norm_type=cv2.NORM_L2, dtype=cv2.CV_32F)
        prediction = model.predict(cropped_img)
        cv2.putText(frame, emotion_dict[int(np.argmax(prediction))], (x, y), cv2.FONT_HERSHEY_SIMPLEX, 0.8, (0, 255, 0), 1, cv2.LINE_AA)

    cv2.imshow('frame', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
cv2.destroyAllWindows()

If you try the demo, you may of course notice that the model does not work very well in some cases. However, if I asked you to estimate an error rate for this model after trying the demo, you would definitely -and maybe interestingly- name a much smaller error rate than 35% (our model’s).

Examples of me while trying the demo

P.S.: I could not act disgusted or… I might not want to put my disgusted face here. -Kappa-

Well, the third episode of the “Deep Learning Lab” series, fer2013, ends here. Thank you for taking the time to read it. For comments and suggestions, please e-mail me. You can also contact me via LinkedIn. Thank you.

fk.
