Face Recognition using PyTorch on Amazon SageMaker (Jupyter notebook code included)

Vaibhav Malpani
Vaibhav Malpani’s Blog
6 min read · May 11, 2020

Scientists around the world have been working on face recognition with machine learning for more than a decade. As humans, we find it easy to remember a person’s face and name, but that is hard for a machine. With the growth of neural networks (NNs) and the many libraries that make building them easy, this problem has become more approachable. This blog will teach you everything you need to know to build your own face recognition ML model using the PyTorch library.

Applications of Face Recognition:

  • Attendance systems in schools and colleges.
  • Tracking in-time and out-time of office employees.
  • Security video monitoring to detect unidentified persons on your premises.

Objective:

To build our own face recognition model using a CNN. We will use the PyTorch library to help us build the CNN, train the model on Amazon SageMaker, and save the trained model to S3.

Pre-requisites:

  • Basic Python programming knowledge
  • Understanding of how CNNs work (Read here)
  • An AWS account

Getting Data:

To start off, we will need a face dataset to train our model for face recognition. I have shared the dataset below for use in this tutorial. You can also add your own images to the dataset, then train and test to see how the model performs.

I have uploaded images of a few celebrities to train our model. There are only a few images per person, and the model still managed over 90% accuracy for me. To get the best results out of a CNN model, just give it more data. :)

The GitHub link also contains the Jupyter notebook for running the complete code. The notebook assumes that you have already installed torch and torchvision. For more details on how to install them locally on your machine → click here
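
If you are running locally rather than on SageMaker, a typical install (shown here as a notebook cell; check the official instructions linked above for the command matching your OS and CUDA setup) looks like this:

!pip install torch torchvision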

GitHub for Dataset and Jupyter Notebook

Please feel free to comment down below if you get stuck somewhere in the code. I will be more than happy to help you.

Setting up SageMaker Instance:

Training CNNs requires a lot of computing power, so it is best to use a machine with a GPU to get results quickly. For this blog, we are going to use an Amazon SageMaker notebook instance, specifically the “ml.p2.xlarge” instance type, which has an NVIDIA Tesla K80 GPU. To use it, you will need to increase your quota limit for GPU instances. Read more about how to do that here.

I would recommend using an instance with a GPU to get results quickly.

Once the status of your SageMaker instance shows “InService”, click “Open Jupyter” to get started with your Jupyter notebook instance.

Click the upload button in the top right corner and upload your dataset and the Jupyter notebook that I have shared at the end of this blog. (You will have to upload the dataset in zip format, as we cannot upload a folder to a Jupyter notebook. But don’t worry, the notebook code takes care of unzipping the data for you, as sketched below.)
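
For reference, the unzip step in the notebook boils down to something like this; “face_data.zip” is a placeholder for whatever you named your upload:

import zipfile

# Extract the uploaded dataset archive into the current directory
with zipfile.ZipFile('face_data.zip', 'r') as zf:
    zf.extractall('.')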

Building Models:

I will explain a few snippets from the Jupyter notebook provided earlier. You can go through the notebook to understand the rest of it, which is quite straightforward.

from torchvision import models

# Load AlexNet pre-trained on ImageNet
model = models.alexnet(pretrained=True)

You can see that we are using a pre-trained model (AlexNet). torchvision provides many pre-trained models that have been trained on millions of images. By using one of these, we only need to edit the top layer of the model architecture, changing the out_features of its final layer.
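
Printing the model shows the full architecture that we are about to modify:

print(model)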

AlexNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace=True)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace=True)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace=True)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace=True)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(6, 6))
  (classifier): Sequential(
    (0): Dropout(p=0.5, inplace=False)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
    (2): ReLU(inplace=True)
    (3): Dropout(p=0.5, inplace=False)
    (4): Linear(in_features=4096, out_features=4096, bias=True)
    (5): ReLU(inplace=True)
    (6): Linear(in_features=4096, out_features=1000, bias=True)
  )
)

As you can see in the above output, the final layer’s “out_features” is 1000, but in our dataset we want to classify into 6 classes (ben_afflek, brad_pitt, elton_john, jerry_seinfeld, madonna, mindy_kaling).

So to edit the final layer, we use the code below.

class_names = train_data.classes

# Replace the final classifier layer (4096 -> 1000) with one that outputs
# a score for each class in our dataset (4096 -> 6)
num_ftrs = model.classifier[6].in_features
model.classifier[6] = nn.Linear(num_ftrs, len(class_names))
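
For context, train_data above is a torchvision dataset whose class names come from the dataset’s folder structure. A minimal sketch, assuming one folder per person after unzipping (the path and transform here are placeholders, not the notebook’s exact values):

from torchvision import datasets, transforms

# Each subfolder name (e.g. 'brad_pitt') becomes a class label
train_transform = transforms.Compose([transforms.Resize(256),
                                      transforms.CenterCrop(224),
                                      transforms.ToTensor()])
train_data = datasets.ImageFolder('./face_data/train', transform=train_transform)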

So after this, our model architecture looks like this:

AlexNet(
  (features): Sequential(
    (0): Conv2d(3, 64, kernel_size=(11, 11), stride=(4, 4), padding=(2, 2))
    (1): ReLU(inplace=True)
    (2): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (3): Conv2d(64, 192, kernel_size=(5, 5), stride=(1, 1), padding=(2, 2))
    (4): ReLU(inplace=True)
    (5): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
    (6): Conv2d(192, 384, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (7): ReLU(inplace=True)
    (8): Conv2d(384, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (9): ReLU(inplace=True)
    (10): Conv2d(256, 256, kernel_size=(3, 3), stride=(1, 1), padding=(1, 1))
    (11): ReLU(inplace=True)
    (12): MaxPool2d(kernel_size=3, stride=2, padding=0, dilation=1, ceil_mode=False)
  )
  (avgpool): AdaptiveAvgPool2d(output_size=(6, 6))
  (classifier): Sequential(
    (0): Dropout(p=0.5, inplace=False)
    (1): Linear(in_features=9216, out_features=4096, bias=True)
    (2): ReLU(inplace=True)
    (3): Dropout(p=0.5, inplace=False)
    (4): Linear(in_features=4096, out_features=4096, bias=True)
    (5): ReLU(inplace=True)
    (6): Linear(in_features=4096, out_features=6, bias=True)
  )
)

After the model architecture is set up the way we need, we have to decide on a loss function and an optimizer. In my experience, the loss function and optimizer given below work well for face recognition.

import torch
import torch.nn as nn

criterion = nn.CrossEntropyLoss()
optimizer = torch.optim.SGD(model.parameters(), lr=0.001, momentum=0.9)

Now that we have our model, loss function, and optimizer, we can pass them to our “train_model” function. It also takes num_epochs as one of its parameters; to get the best output, keep it at 500.

The train_model function prints the accuracy of the model during the training and testing phases, and at the end of training it returns the model with the best accuracy from the testing phase.
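
The full implementation is in the notebook; condensed, its logic looks roughly like the sketch below. This is a sketch, not the notebook’s exact code, and it assumes dataloaders, dataset_sizes, and device are defined earlier in the notebook:

import copy
import torch

def train_model(model, criterion, optimizer, num_epochs=500):
    best_acc = 0.0
    best_weights = copy.deepcopy(model.state_dict())
    for epoch in range(num_epochs):
        for phase in ['train', 'test']:
            if phase == 'train':
                model.train()
            else:
                model.eval()
            running_corrects = 0
            for inputs, labels in dataloaders[phase]:
                inputs, labels = inputs.to(device), labels.to(device)
                optimizer.zero_grad()
                # Only track gradients during the training phase
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    loss = criterion(outputs, labels)
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()
                running_corrects += (outputs.argmax(1) == labels).sum().item()
            acc = running_corrects / dataset_sizes[phase]
            print('epoch {} {} accuracy: {:.4f}'.format(epoch, phase, acc))
            # Remember the weights from the best-performing test epoch
            if phase == 'test' and acc > best_acc:
                best_acc = acc
                best_weights = copy.deepcopy(model.state_dict())
    model.load_state_dict(best_weights)
    return model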

Once you get the best model after training, you can use the code given below to test it. This snippet takes a folder containing just images as input and prints the name of the person in each image. Of course, it only recognizes the faces on which it was trained earlier.

import glob
from os import listdir

import torch
from PIL import Image
from torchvision import transforms

with torch.no_grad():
    base_path = "./face_test/"
    array = ['ben_afflek', 'brad_pitt', 'elton_john', 'jerry_seinfeld', 'madonna', 'mindy_kaling']
    # mean and std are defined earlier in the notebook
    test_transform = transforms.Compose([transforms.Resize(256),
                                         transforms.CenterCrop(224),
                                         transforms.ToTensor(),
                                         transforms.Normalize(mean=mean, std=std)])
    onlyfiles = [f for f in listdir(base_path)]
    print(onlyfiles)
    for files in onlyfiles:
        images = glob.glob(base_path + files)
        for image in images:
            img = Image.open(image)
            trans1 = transforms.ToTensor()
            # Preprocess the image and add a batch dimension
            transformed_image = test_transform(img).float()
            transformed_image = transformed_image.unsqueeze(0)
            transformed_image = transformed_image.to(device)  # device is defined earlier in the notebook
            outputs = model(transformed_image)
            _, predicted = torch.max(outputs.data, 1)
            predicted_index_value = predicted.cpu().numpy()[0]
            print(image + " : " + array[predicted_index_value])
            # imshow is a display helper defined earlier in the notebook
            imshow(trans1(img), array[predicted_index_value])
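
Since the backbone here is pretrained on ImageNet, the mean and std used above are typically the standard ImageNet normalization statistics. The notebook defines them earlier; these exact values are my assumption:

# Standard ImageNet channel statistics, commonly used with torchvision's pretrained models
mean = [0.485, 0.456, 0.406]
std = [0.229, 0.224, 0.225]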

You can keep iterating on training and testing until you get a model that you feel comfortable with.

Since you have trained this on a SageMaker notebook instance, you need to save the model to S3 so that you can reuse it later.

The code given below saves the model on the SageMaker instance and then uploads it to S3.

import boto3
import torch

# Save the trained model to the instance's local disk
file_name = './model_final'
torch.save(model, file_name)

# Upload the saved model file to S3
s3 = boto3.resource('s3')
bucket = 'my-bucket-name'  # Replace with your S3 bucket name (underscores are not valid in bucket names)
s3.meta.client.upload_file(file_name, bucket, 'final_pytorch_model')
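
When you want the model back on a fresh instance later, you can pull it down from S3 first. A minimal sketch mirroring the upload step above:

import boto3

# Download the model file from S3 to local disk
s3 = boto3.resource('s3')
s3.meta.client.download_file('my-bucket-name', 'final_pytorch_model', './saved_model')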

To load the saved model back, just run all the necessary torch imports and use the code below. (The complete code is included in the Jupyter notebook.)

model = torch.load('./saved_model')
model.eval()  # set the model to evaluation mode for inference

What did we learn?

  • How to build a CNN model using the PyTorch library
  • How to spin up a SageMaker notebook instance to build the CNN model quickly without the need to buy expensive GPUs
  • How to save the trained model to S3

In the next blog, we will learn how to deploy any Machine Learning model on AWS to get the best scalability and reliability.

If you liked this post, please clap for it; follow me if you want to read more such posts!

Twitter: https://twitter.com/IVaibhavMalpani
LinkedIn: https://www.linkedin.com/in/ivaibhavmalpani/
