Facial Expression Classification Using Deep Learning & AWS

full stack and deep learning in AWS

Photo by Photos Hobby on Unsplash

Introduction

In this project, I trained a model to classify facial expressions. I used the AWS SageMaker image classification algorithm, which is a supervised learning algorithm: it takes an image as input and outputs one or more labels assigned to that image. It uses a convolutional neural network (with a ResNet architecture) that can be trained from scratch or with transfer learning. After the model was created from training, I made it externally available by creating a REST API.

This project was done for learning and practice after finishing a 4-week, 8-session course, “deep learning & full stack in AWS”, presented by AICamp.

S3, SageMaker, Lambda, and API Gateway are the AWS services used in this project. Learn more at these links:

Project Task and Data Set

The data set used is from Kaggle and prepared by Pierre-Luc Carrier and Aaron Courville. Here is the link:

I downloaded the “train.csv” file and used it as my data set, of which about 20% was randomly used for validation and about 80% for training. The explanation from the link:

The data consists of 48x48 pixel grayscale images of faces.

The faces have been automatically registered so that the face is centered and occupies about the same amount of space in each image.

“train.csv” file contains two columns, “emotion” and “pixels”.

The “emotion” column contains a numeric code ranging from 0 to 6, inclusive, for the emotion that is present in the image.

The “pixels” column contains a string surrounded in quotes for each image. The contents of this string are space-separated pixel values in row major order.

The task is to categorize the emotion shown in the facial expression into one of seven categories (0=Angry, 1=Disgust, 2=Fear, 3=Happy, 4=Sad, 5=Surprise, 6=Neutral). The distribution of the data set is:

Angry: 3995, Disgust: 436, Fear: 4097, Happy: 7215, Sad: 4965

Surprise: 4830, Neutral: 3171

Data Preparation

First, let’s look at the data set using pandas (a Python library for data analysis):

You can observe the type and shape of the dataFrame variable in the picture. iterrows() iterates over the DataFrame rows as (index, Series) pairs. Let’s get the first row’s data:

With the read_csv() function you can import tabular data from CSV files into a pandas DataFrame.
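The notebook code appears as screenshots in the original post; a minimal sketch of this step, assuming the downloaded file is named “train.csv” and the variable is called dataFrame as in the text, could look like this:

```python
import pandas as pd

# Import the tabular data from the Kaggle CSV into a pandas DataFrame.
dataFrame = pd.read_csv("train.csv")

print(type(dataFrame))   # <class 'pandas.core.frame.DataFrame'>
print(dataFrame.shape)   # (number of rows, 2) -> the "emotion" and "pixels" columns

# iterrows() yields (index, Series) pairs; print the first row's data.
for index, row in dataFrame.iterrows():
    print(index, row["emotion"], row["pixels"][:40], "...")
    break
```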

I used transfer learning for training, so I set the “use pretrained model” option to 1 when configuring the hyperparameters of the AWS image classification algorithm. In this transfer learning approach, a network is initialized with weights (in this case, trained on ImageNet), which can later be fine-tuned for an image classification task on a different dataset.

ImageNet is a large dataset that has more than 11 million images with about 11,000 categories. Once a network is trained with ImageNet data, it can then be used to generalize with other datasets as well, by simple re-adjustment or fine-tuning.

The pretrained model only accepts 3x224x224 images as input (number of channels=3, width=height=224) or in other words RGB 224x224 images. I transformed the images into three-channel grayscale images.

In my file system, I created a folder named “FacialExpressions” with seven subfolders named “angry”, “disgust”, “fear”, “happy”, “sad”, “surprise” and “neutral”.

I imported the Image module from PIL (Python Imaging Library). I iterated over each row of the dataFrame variable, converted each pixel string into a 48x48 grayscale image, and then saved it into its category subfolder. The code might seem difficult at first, but it isn’t.
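The original code is a screenshot; here is a minimal sketch of the loop described above, assuming the folder layout from the previous paragraph and PNG file names chosen for illustration:

```python
import os
import numpy as np
from PIL import Image

# Map the numeric emotion codes to the subfolder names created above.
labels = {0: "angry", 1: "disgust", 2: "fear", 3: "happy",
          4: "sad", 5: "surprise", 6: "neutral"}

for index, row in dataFrame.iterrows():
    # Each "pixels" string holds 48*48 space-separated values in row-major order.
    pixels = np.array(row["pixels"].split(), dtype=np.uint8).reshape(48, 48)
    img = Image.fromarray(pixels, mode="L")   # 48x48 grayscale image
    folder = os.path.join("FacialExpressions", labels[row["emotion"]])
    os.makedirs(folder, exist_ok=True)
    img.save(os.path.join(folder, f"{index}.png"))
```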

Now we have our data set as 48x48 grayscale images, but as mentioned before we should convert them to 3x224x224 images. I didn’t convert them directly because this way I also keep them in their original format and can use them in other situations.

I saved the new images in the folder ‘FacialExpressions_224BY224_3Channel’, again with seven subfolders named after our seven labels.

Inside the second for loop, the img variable is a 48 by 48 matrix. I then add padding of 88 zeros to its left, right, top and bottom sides. img_224 is a 224 by 224 matrix with img’s values in its middle. Then we create a 3x224x224 matrix by repeating the img_224 matrix along the depth (third) axis, so that three identical matrices are stacked in front of each other; img_224_3 is this three-dimensional array. Finally, we create an image from img_224_3 and save it in its corresponding subfolder. The final image is an RGB 224x224 image, but the RGB values in each pixel are identical due to the way we created img_224_3, so the color does not change.
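As a sketch of that transformation for a single image (file names are illustrative, and the target subfolders are assumed to exist):

```python
import numpy as np
from PIL import Image

img = np.array(Image.open("FacialExpressions/angry/0.png"))   # 48x48 matrix

# Pad 88 zeros on every side: 88 + 48 + 88 = 224.
img_224 = np.pad(img, pad_width=88, mode="constant", constant_values=0)

# Stack three identical copies along the depth axis -> shape (224, 224, 3).
img_224_3 = np.repeat(img_224[:, :, np.newaxis], 3, axis=2)

# The result is an RGB 224x224 image whose three channels are identical.
Image.fromarray(img_224_3.astype(np.uint8), mode="RGB").save(
    "FacialExpressions_224BY224_3Channel/angry/0.png")
```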

Meta Data Creation

AWS SageMaker uses .lst files to look up the paths of the images and, this way, knows which label is associated with each image file. Here is the explanation from AWS:

“A .lst file is a tab-separated file with three columns that contains a list of image files. The first column specifies the image index, the second column specifies the class label index for the image, and the third column specifies the relative path of the image file. The image index in the first column must be unique across all the images. The set of class label indices are numbered successively, and the numbering should start with 0. For example, 0 for the cat class, 1 for the dog class, and so on for additional classes.”

You can learn more about this topic and input data at this link:

20% of the data set is randomly chosen for validation. We store the items in the train and validation lists. The items in each list are ordered so that all the angry images come first, then all the disgust images, and so on. Left like this, the training accuracy graph would go up and down: the algorithm processes all the angry images first and adjusts its weights, so accuracy goes up, but then it reaches the disgust images and accuracy drops, and so on. To avoid this problem, we shuffle the train and validation lists before writing the items into f_train and f_validation.
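A minimal sketch of generating the two .lst files, assuming the folder layout above (the 20% split and the f_train/f_validation handles follow the text; the output file names are illustrative):

```python
import os
import random

class_ids = {"angry": 0, "disgust": 1, "fear": 2, "happy": 3,
             "sad": 4, "surprise": 5, "neutral": 6}

root = "FacialExpressions_224BY224_3Channel"
train, validation = [], []
index = 0

for name, class_id in class_ids.items():
    for file_name in os.listdir(os.path.join(root, name)):
        # .lst row: image index, class label index, relative image path (tab-separated).
        row = f"{index}\t{class_id}\t{name}/{file_name}\n"
        # Roughly 20% of the items go to validation, the rest to training.
        (validation if random.random() < 0.2 else train).append(row)
        index += 1

# Shuffle so the classes are interleaved instead of grouped together.
random.shuffle(train)
random.shuffle(validation)

with open("train.lst", "w") as f_train, open("validation.lst", "w") as f_validation:
    f_train.writelines(train)
    f_validation.writelines(validation)
```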

We can open .lst files in notepad. Here are some lines:

(screenshots: lines of the validation .lst and train .lst files)

Saving to S3

I uploaded data set images and .lst files into my S3 bucket and created a folder for the model to be saved when it is created out of training job.

(screenshots: inside the S3 bucket, showing the uploaded files)

Since the .lst files and the image folders are in the same directory, the relative paths we used make sense: in the training job configuration we specify the parent folder holding our data set, so when creating the two .lst files we only needed the paths of the images relative to those files.

Training Job and Hyperparameters

Among the built-in algorithms, image classification was chosen. A GPU instance (ml.p2.xlarge) with 40 GB of additional storage per instance was used.

Learn more about hyperparameters in the AWS documentation.

I set the following hyperparameters and left the rest untouched; a sketch of launching an equivalent job with the SageMaker Python SDK follows the list.

· epochs: 30

· image_shape: 3,224,224

· learning_rate: 0.0001

· mini_batch_size: 32

· num_classes: 7

· num_layers: 101

· num_training_samples: 22739

· optimizer: sgd (stochastic gradient descent)

· use_pretrained_model: 1 (which means true)
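The job itself was configured in the SageMaker console; purely as a rough equivalent, launching it with the SageMaker Python SDK could look like the following sketch (the role ARN, bucket name, and S3 prefixes are hypothetical):

```python
import sagemaker
from sagemaker import image_uris
from sagemaker.inputs import TrainingInput

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/MySageMakerRole"   # hypothetical role

# Built-in image classification algorithm container for the current region.
container = image_uris.retrieve("image-classification", session.boto_region_name)

estimator = sagemaker.estimator.Estimator(
    container,
    role,
    instance_count=1,
    instance_type="ml.p2.xlarge",
    volume_size=40,                          # 40 GB additional storage
    output_path="s3://my-bucket/output/",    # hypothetical output location
    sagemaker_session=session,
)

estimator.set_hyperparameters(
    epochs=30, image_shape="3,224,224", learning_rate=0.0001,
    mini_batch_size=32, num_classes=7, num_layers=101,
    num_training_samples=22739, optimizer="sgd", use_pretrained_model=1,
)

# Four channels: the image prefix for train and validation, plus the two .lst files.
prefix = "s3://my-bucket/FacialExpressions_224BY224_3Channel/"
estimator.fit({
    "train": TrainingInput(prefix, content_type="application/x-image"),
    "validation": TrainingInput(prefix, content_type="application/x-image"),
    "train_lst": TrainingInput("s3://my-bucket/train.lst", content_type="application/x-image"),
    "validation_lst": TrainingInput("s3://my-bucket/validation.lst", content_type="application/x-image"),
})
```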

Due to the large size of the data set and setting epochs to 30, I had to stop the training job to avoid more charges, so it stopped after the 17th epoch.

Regarding choosing the best number of layers, you can start with some number and increase it until the training accuracy stops improving; in any case, the number should preferably be one of the values suggested in the documentation.

Regarding choosing the best learning_rate, you can use the lr_scheduler_factor and lr_scheduler_step hyperparameters to lower the learning rate at set epochs, so a single job runs with several values; then check the training accuracy graph to see which one had the best result and choose that for the final training job.

If you don’t know about epochs and mini_batch_size, check the following link:

Besides learning_rate and mini_batch_size, the other hyperparameters for SGD are:

momentum (left as default 0.9), weight_decay (left as default 0.0001)

Metrics

4 metrics were provided for this kind of training job:

· train:accuracy

· validation:accuracy

· train:accuracy:epoch

· validation:accuracy:epoch

Train and Validation accuracy reached about 0.96 and 0.55 respectively:

In ‘view logs’ you can check the logs, including the accuracy of the batches within each epoch and the “train-accuracy” and “validation-accuracy” of each epoch.

Since I stopped the training job, I checked for the 17th epoch (epoch [16]). “train-accuracy” was 0.96 and “validation-accuracy” was 0.55.

Since the training job was stopped, the option for creating a model was not active, so I created another job.

Some Notes About Accuracy

· Since I had to stop the training job, the validation accuracy is very low, but I think if we try setting the learning rate to 0.00001 or the number of layers to 152, we will get better validation accuracy.

· For your experiments set the number of epochs low like 5 and after you figured out the best hyperparameters then change it to some fixed value for your final training job.

· I think an important reason for the low validation accuracy is that we transformed our one-channel grayscale images into three-channel grayscale images. Though this worked, it is not optimal, since we are extrapolating the model to a new feature space (ImageNet contains no grayscale images).

· Another important note is that we enlarged the images by adding zero padding, which affects performance and the optimization process, since the ImageNet pretrained model was trained on 224x224 images (without padding).

Second Training Job

After making the following changes, I launched another training job:

· Creating another .lst file addressing 10602 samples

· Creating a new folder for the output model

· Reducing the size of the training set to 10602 (about half of the original size, via the first step)

· For happy, the size was reduced to 0.66 of the original size

· The size of disgust was not changed, since it was already small

· Increasing the number of layers to 152

· Reducing the number of epochs to 5

The epoch index starts from 0. I checked the logs, and the train accuracy of the 5th epoch was 0.61. The train accuracy of the 5th epoch of the previous training job was 0.67, so is this decrease in the second job due to increasing the number of layers or to decreasing the number of samples?

Third Training Job

This time I set the number of layers to 101, the number of training samples to 10602 and the number of epochs to 15.

Accuracy of training was 0.96 and validation was 0.51.

With 30 epochs I think validation accuracy could get to at least 0.60.

The train and validation accuracy for the 15th epoch of the first job were 0.95 and 0.54, so I concluded that decreasing the number of training samples to 10602 did not affect the accuracies much. This helps answer the question raised after the second job: the decrease there was most probably due to increasing the number of layers.

In the training job information section, there is a create model option. After creating a model, I can access it at SageMaker -> Inference -> Models.

Creating A SageMaker Endpoint

To access and use the model, which is saved in S3 in the output-2 folder, we should create a SageMaker endpoint that is accessible inside AWS. You can learn more about creating endpoints at:
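I created the endpoint through the console; purely as a sketch, the same thing with boto3 could look like this (the model and config names are hypothetical; the endpoint name is the one referenced later in the Lambda function):

```python
import boto3

sm = boto3.client("sagemaker")

# An endpoint configuration pointing at the model created from the training job.
sm.create_endpoint_config(
    EndpointConfigName="facial-expressions-config",   # hypothetical name
    ProductionVariants=[{
        "VariantName": "AllTraffic",
        "ModelName": "facial-expressions-model",      # hypothetical model name
        "InstanceType": "ml.m4.xlarge",                # assumed instance type
        "InitialInstanceCount": 1,
    }],
)

sm.create_endpoint(
    EndpointName="sepehr-facial-expressions-recognition",
    EndpointConfigName="facial-expressions-config",
)
```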

Lambda

Learn about AWS Lambda at

We create a Lambda function, which is where we code our business logic. The language I used is Python. When Lambda receives a POST request (the image of a face), it sends it to the endpoint to be processed by the model, and the endpoint sends back a result as an array (list) of 7 probabilities; the value at index 0 is for label 0, and so on. In Lambda, our code chooses the index of the maximum value in the list, uses a dictionary to find the emotion for that index, and finally the emotion is sent to API Gateway and from there to the client.

I created an environment variable with key ‘ENDPOINT_NAME’ and value ‘sepehr-facial-expressions-recognition’, which is the name of my SageMaker endpoint.

In the above code, we get our image in base64 format, decode it, convert it into an array of bytes with bytearray(), and then send it to our endpoint to get a response.

The prediction is an array (list) of size 7, where the value at index 0 is the probability for label 0 (‘angry’), and so on.
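The Lambda code is shown as a screenshot in the original post; a minimal sketch of a handler that does what is described above (including the change described in the Response section below) might look like this:

```python
import base64
import json
import os

import boto3

# The endpoint name comes from the Lambda environment variable set above.
ENDPOINT_NAME = os.environ["ENDPOINT_NAME"]
runtime = boto3.client("sagemaker-runtime")

# Labels in the same order as the indices of the returned probability list.
emotions = {0: "angry", 1: "disgust", 2: "fear", 3: "happy",
            4: "sad", 5: "surprise", 6: "neutral"}

def lambda_handler(event, context):
    # The request body carries the face image encoded as base64.
    image = base64.b64decode(event["body"])

    response = runtime.invoke_endpoint(
        EndpointName=ENDPOINT_NAME,
        ContentType="application/x-image",
        Body=bytearray(image),
    )

    # The endpoint returns a JSON list of 7 probabilities, one per label.
    probabilities = json.loads(response["Body"].read())
    predicted = emotions[probabilities.index(max(probabilities))]

    return {"statusCode": 200, "body": predicted}
```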

Creating Test Event

I converted the following angry image into a 3x224x224 image for testing.

angry
We can convert it to base64 format like this:

In “configure test events” I created a test event with content in JSON format like { “body” : ”…” }, where the string is the base64 encoding of the input image.
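As a small sketch of building that test event (the test image file name is hypothetical):

```python
import base64
import json

# Read the prepared 3x224x224 test image and base64-encode it.
with open("angry_224.png", "rb") as f:
    encoded = base64.b64encode(f.read()).decode("utf-8")

test_event = {"body": encoded}
print(json.dumps(test_event)[:80], "...")   # paste the full JSON into the test event
```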

Response

But we want a string as the result for the body, so I made the following changes to the code: find the index of the maximum value (probability) in the list and return the emotion for that index.

REST API

To be able to connect to Lambda, we should create an API. We use the Amazon API Gateway service to create a REST API. Here is the link to learn more:

This picture from AWS documentation is useful for better understanding:

Also, if you want to learn about the term API gateway in general, check this useful video:

Sending Post Request with Postman

Here is the response after we send a POST request to the URL of our REST API:
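The same request can also be sent from Python instead of Postman; a sketch, with a hypothetical API Gateway URL:

```python
import base64
import requests

with open("angry_224.png", "rb") as f:
    body = base64.b64encode(f.read()).decode("utf-8")

# POST the base64 string as the request body (the URL is a placeholder).
response = requests.post(
    "https://abc123.execute-api.us-east-1.amazonaws.com/prod/classify",
    data=body,
)
print(response.text)   # e.g. "angry"
```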

My Thoughts

· Angry and surprised faces can be confusing for the algorithm, since anger can sometimes be expressed with raised eyebrows.

· Distinguishing sadness from disgust can be tricky.

· Categorizing neutral vs. sad is very difficult for the algorithm, since sad faces can have features similar to neutral faces; body posture is especially useful for reducing this problem.

· After each training job check the metrics and logs to see graphs and how accuracy changes from one epoch to another.

· Shuffling the data set is very important, based on my experiments. Since SageMaker uses the .lst files to find images, I shuffled their lines by shuffling the train and validation lists before writing them to the files.

· I should note that the images in the data set contain only the face.

· Humans consider other factors besides facial expression when categorizing emotions (weather, news, situation, posture).

· In this project, body posture was not considered, even though it is very important.

· Better results could be achieved if our original images were RGB 224x224 and we did not need to transform them.

· It would be systematically better to experiment with the hyperparameters like this:

  1. 1000 training samples for each label (some labels might need more samples)
  2. 500 validation samples for each label
  3. Fix the number of epochs to a number in the range [5,10] (I prefer 6)
  4. Now try learning rate of 0.0001 and 0.00001 with 50, 101 and 152 layers
  5. After finding the suitable learning rate and number of layers, do the final training job with 30 epochs or more and maybe larger data set.

My LinkedIn URL

www.linkedin.com/in/sepehr-vafaei-839ab3b0
