Face Key Point Detection Using Deep Learning

Jerry John
Published in Analytics Vidhya · Jul 16, 2020

1. Introduction

This article describes a deep learning method for finding facial key points. In this work we detect 15 main points, which include:

Eyes: left eye center, right eye center, left eye inner corner, left eye outer corner, right eye inner corner, right eye outer corner.

Eyebrows: left eyebrow inner, left eyebrow outer, right eyebrow inner, right eyebrow outer.

Nose: nose tip.

Mouth: mouth left corner, mouth right corner, mouth center top, mouth center bottom.

Facial key points can be used in many applications, such as biometrics / face recognition, analyzing facial expressions, tracking faces in images and video, and detecting facial signs for medical diagnosis.

2. Datasets

We are using the Facial Keypoints Detection dataset from Kaggle. There are 7,049 images in the training data. The dataset contains the x and y coordinates of the key points (30 fields), and the last field (Image) consists of pixels as space-separated integers (0–255). The images are 96 x 96 pixels.

Let's load the dataset.

First, pip install the Kaggle API so we can download the data directly. (I am using Google Colab for this project.)

# Install Kaggle API
!pip install -q kaggle

Now log in to your Kaggle account and download the kaggle.json file, which contains your Kaggle username and API key. (Use this step only if you want to download the dataset directly into your Colab notebook; otherwise, download the data manually and use it.)

# fill in xxxxx, see your kaggle.json
import os
os.environ['KAGGLE_USERNAME'] = "xxxxx"
os.environ['KAGGLE_KEY'] = "xxxxx"
# download data from Kaggle
!kaggle competitions download -c facial-keypoints-detection -p data

After this step we have all the dataset files. Two of them are in .zip format, so we first need to unzip them.

# Unzip training and test datasets to data directory
!unzip data/training.zip -d data
!unzip data/test.zip -d data

Import the necessary packages and then view the training dataset.

import keras
import tensorflow as tf
import numpy as np
import pandas as pd
import matplotlib.pyplot as plt
import matplotlib.image as mpimg
from pathlib import Path
from tqdm import tqdm
%matplotlib inline

data_dir = Path('./data')
train_csv = pd.read_csv(data_dir / 'training.csv')
test_csv = pd.read_csv(data_dir / 'test.csv')
Id_table_path_csv = pd.read_csv(data_dir / 'IdLookupTable.csv')
# View train data
train_csv.T
(Image: preview of the training dataset)

This dataset is effectively a combination of two separate datasets: the first contains 7,000+ samples where only 8 features (4 key points) are annotated, and the second contains 2,000+ samples, drawn from the same images, with all 30 features (15 key points). Because of this, there are many NaN values in the dataset. There are several ways to deal with this problem; the one we use is to build two separate models, one per dataset, and combine their outputs when predicting the key points.

(Image: all column names in the training dataset)
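Before splitting, it is worth confirming where the missing values are. As a quick check (a minimal sketch using the train_csv DataFrame loaded above):

# Count the missing values in each column; most keypoint columns
# beyond the 8 "core" fields are empty for the majority of rows.
print(train_csv.isnull().sum())
# Number of rows where all 30 keypoint fields are present.
print(len(train_csv.dropna()))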

3. Data Preprocessing

Now split the dataset into two: one with 4 key points (8 values) and the other with 15 key points (30 values). The left_eye_center, right_eye_center, nose_tip and mouth_center_bottom_lip features are available for almost all images; the rest are only available for about 2,140 images.

feature_8 = ['left_eye_center_x', 'left_eye_center_y',
             'right_eye_center_x', 'right_eye_center_y',
             'nose_tip_x', 'nose_tip_y',
             'mouth_center_bottom_lip_x', 'mouth_center_bottom_lip_y',
             'Image']

# Create 2 different datasets.
train_8_csv = train_csv[feature_8].dropna().reset_index()
train_30_csv = train_csv.dropna().reset_index()

Now check the content of both of the datasets.

# 7000 samples, 8 features.
train_8_csv.info()

# 2140 samples, 30 features.
train_30_csv.info()

The data contains the image pixels as strings: each Image entry is one long string of 9,216 (96 * 96) space-separated values. First we convert each string to a 96 x 96 array, then stack all the arrays into a single 4D numpy array of shape (N, 96, 96, 1). The function below does exactly that.

def str_to_array(pd_series):
    data_size = len(pd_series)
    X = np.zeros(shape=(data_size, 96, 96, 1), dtype=np.float32)
    for i in tqdm(range(data_size)):
        img_str = pd_series[i]
        img_list = img_str.split(' ')
        img_array = np.array(img_list, dtype=np.float32)
        img_array = img_array.reshape(96, 96, 1)
        X[i] = img_array
    return X

Now run the function above over the image data for both datasets, and extract the corresponding labels as numpy arrays so we can process them further.

X_train_30 = str_to_array(train_30_csv['Image'])
labels_30 = train_30_csv.drop(['index','Image'], axis=1)
y_train_30 = labels_30.to_numpy(dtype=np.float32)
X_train_8 = str_to_array(train_8_csv['Image'])
labels_8 = train_8_csv.drop(['index','Image'], axis=1)
y_train_8 = labels_8.to_numpy(dtype=np.float32)
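As a quick sanity check (a small sketch; the exact row counts depend on how many rows were dropped above), we can confirm the shapes of the arrays we just built:

# Expect roughly (2140, 96, 96, 1) and (2140, 30) for the 30-feature set,
# and roughly (7000, 96, 96, 1) and (7000, 8) for the 8-feature set.
print(X_train_30.shape, y_train_30.shape)
print(X_train_8.shape, y_train_8.shape)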

4. Displaying training images with their key points

Now let's see what the training images look like. Here we write a helper function to plot an image with its key points, and use it to display 24 training images.

def plot_face_pts(img, pts):
    plt.imshow(img[:, :, 0], cmap='gray')
    for i in range(1, 31, 2):
        plt.plot(pts[i-1], pts[i], 'b.')

fig = plt.figure(figsize=(10, 7))
fig.subplots_adjust(left=0, right=1, bottom=0, top=1, hspace=0.05, wspace=0.05)
for i in range(24):
    ax = fig.add_subplot(6, 4, i + 1, xticks=[], yticks=[])
    plot_face_pts(X_train_30[i], y_train_30[i])
plt.show()
(Image: training images with their key points)

5. Creating the Neural Network (Deep Learning Part)

def create_model(output_n=30):
    model = keras.models.Sequential([
        keras.layers.InputLayer(input_shape=[96, 96, 1]),
        keras.layers.Conv2D(filters=32, kernel_size=[5, 5], padding='same', use_bias=False),
        keras.layers.LeakyReLU(alpha=.1),
        keras.layers.BatchNormalization(),
        keras.layers.Conv2D(filters=32, kernel_size=[5, 5], padding='same', use_bias=False),
        keras.layers.LeakyReLU(alpha=.1),
        keras.layers.BatchNormalization(),
        keras.layers.MaxPool2D(pool_size=[2, 2]),
        keras.layers.Conv2D(filters=64, kernel_size=[3, 3], padding='same', use_bias=False),
        keras.layers.LeakyReLU(alpha=.1),
        keras.layers.BatchNormalization(),
        keras.layers.Conv2D(filters=64, kernel_size=[3, 3], padding='same', use_bias=False),
        keras.layers.LeakyReLU(alpha=.1),
        keras.layers.BatchNormalization(),
        keras.layers.MaxPool2D(pool_size=[2, 2]),
        keras.layers.Conv2D(filters=128, kernel_size=[3, 3], padding='same', use_bias=False),
        keras.layers.LeakyReLU(alpha=.1),
        keras.layers.BatchNormalization(),
        keras.layers.Conv2D(filters=128, kernel_size=[3, 3], padding='same', use_bias=False),
        keras.layers.LeakyReLU(alpha=.1),
        keras.layers.BatchNormalization(),
        keras.layers.MaxPool2D(pool_size=[2, 2]),
        keras.layers.Conv2D(filters=256, kernel_size=[3, 3], padding='same', use_bias=False),
        keras.layers.LeakyReLU(alpha=.1),
        keras.layers.BatchNormalization(),
        keras.layers.Conv2D(filters=256, kernel_size=[3, 3], padding='same', use_bias=False),
        keras.layers.LeakyReLU(alpha=.1),
        keras.layers.BatchNormalization(),
        keras.layers.MaxPool2D(pool_size=[2, 2]),
        keras.layers.Conv2D(filters=512, kernel_size=[3, 3], padding='same', use_bias=False),
        keras.layers.LeakyReLU(alpha=.1),
        keras.layers.BatchNormalization(),
        keras.layers.Conv2D(filters=512, kernel_size=[3, 3], padding='same', use_bias=False),
        keras.layers.LeakyReLU(alpha=.1),
        keras.layers.BatchNormalization(),
        keras.layers.Flatten(),
        keras.layers.Dense(units=512, activation='relu'),
        keras.layers.Dropout(.1),
        keras.layers.Dense(units=output_n),
    ])
    model.compile(optimizer='adam', loss='mean_squared_error', metrics=['mae'])
    return model

Create one model for each of the two datasets, prepare the training callbacks, and train both models, keeping each model's history so we can plot it afterwards.

model_30 = create_model(output_n=30)
model_8 = create_model(output_n=8)

# Prepare callbacks
LR_callback = keras.callbacks.ReduceLROnPlateau(monitor='val_loss', patience=4, verbose=10, factor=.4, min_lr=.00001)
EarlyStop_callback = keras.callbacks.EarlyStopping(patience=15, restore_best_weights=True)

# Train the model with 30 features.
history_30 = model_30.fit(X_train_30, y_train_30, validation_split=.1, batch_size=64, epochs=100, callbacks=[LR_callback, EarlyStop_callback])

# Train the model with 8 features.
history_8 = model_8.fit(X_train_8, y_train_8, validation_split=.1, batch_size=64, epochs=100, callbacks=[LR_callback, EarlyStop_callback])

Now check the training and validation loss (and MAE) of both models.

# Model trained on 30 features
fig, ax = plt.subplots(2, 1)
ax[0].plot(history_30.history['loss'], color='b', label="Training loss")
ax[0].plot(history_30.history['val_loss'], color='r', label="Validation loss")
legend = ax[0].legend(loc='best', shadow=True)
ax[1].plot(history_30.history['mae'], color='b', label="Training mae")
ax[1].plot(history_30.history['val_mae'], color='r', label="Validation mae")
legend = ax[1].legend(loc='best', shadow=True)

# Model trained on 8 features
fig, ax = plt.subplots(2, 1)
ax[0].plot(history_8.history['loss'], color='b', label="Training loss")
ax[0].plot(history_8.history['val_loss'], color='r', label="Validation loss")
legend = ax[0].legend(loc='best', shadow=True)
ax[1].plot(history_8.history['mae'], color='b', label="Training mae")
ax[1].plot(history_8.history['val_mae'], color='r', label="Validation mae")
legend = ax[1].legend(loc='best', shadow=True)

6. Testing Our Models

In the test dataset, as in the training dataset, the image data is stored as strings, so we need to apply the same preprocessing that was done for the training data. We then predict the key points for the test images and check how accurate our models are.

# Wrap test images into a 4D array.
X_test = str_to_array(test_csv['Image'])

# Predict points for each image using the 2 different models.
y_hat_30 = model_30.predict(X_test)
y_hat_8 = model_8.predict(X_test)

Since the model with 4 key points (which produces y_hat_8) was trained on more data than the other, its predictions should be more accurate for those points. So we replace the corresponding 8 values in y_hat_30 with the values from y_hat_8, which should improve the overall accuracy.

feature_8_ind = [0, 1, 2, 3, 20, 21, 28, 29]

# Merge the predictions from y_hat_30 and y_hat_8.
for i in range(8):
    print('Copy "{}" feature column from y_hat_8 to y_hat_30'.format(feature_8[i]))
    y_hat_30[:, feature_8_ind[i]] = y_hat_8[:, i]

After merging the predictions of both models, we have the final values. Let us plot them on some test images and see how accurately the models work.

fig = plt.figure(figsize=(10, 7))
fig.subplots_adjust(left=0, right=1, bottom=0, top=1, hspace=0.05, wspace=0.05)
for i, f in enumerate(range(10, 16)):
    ax = fig.add_subplot(2, 3, i + 1, xticks=[], yticks=[])
    plot_face_pts(X_test[f], y_hat_30[f])
plt.show()
(Image: model predictions on test images)
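Since we loaded IdLookupTable.csv earlier, we can also turn the merged predictions into a Kaggle submission. This is only a minimal sketch, assuming the lookup table has RowId, ImageId and FeatureName columns (as in this competition's file) and that labels_30.columns gives the feature order the model was trained on:

# Map each requested (image, feature) pair to its prediction in y_hat_30.
feature_names = list(labels_30.columns)
row_ids = Id_table_path_csv['RowId']
image_ids = Id_table_path_csv['ImageId'] - 1  # ImageId is 1-based
feature_cols = Id_table_path_csv['FeatureName'].map(feature_names.index)

# Look up the predictions and clip them to the 96 x 96 image bounds.
locations = y_hat_30[image_ids, feature_cols].clip(0, 96)

submission = pd.DataFrame({'RowId': row_ids, 'Location': locations})
submission.to_csv('submission.csv', index=False)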

7. Conclusion

As a small-scale version, our model has done a decent job here. In the future we can make it much better by:

a. Adding more key points.

b. Identifying key points on live data (a rough sketch of this idea is shown below).
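As a starting point for (b), here is a minimal, untested sketch of live prediction with OpenCV. It assumes a webcam, the trained model_30 from above, and that each frame contains a roughly centered face; a real pipeline would first run a face detector and crop to the detected face before predicting:

import cv2

cap = cv2.VideoCapture(0)  # default webcam
while True:
    ret, frame = cap.read()
    if not ret:
        break
    # Convert to grayscale and resize to the 96x96 input the model expects.
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    img = cv2.resize(gray, (96, 96)).astype(np.float32)
    pts = model_30.predict(img.reshape(1, 96, 96, 1))[0]
    # Scale the predicted coordinates back to the original frame size.
    h, w = gray.shape
    for i in range(0, 30, 2):
        x, y = int(pts[i] * w / 96), int(pts[i+1] * h / 96)
        cv2.circle(frame, (x, y), 2, (255, 0, 0), -1)
    cv2.imshow('Facial key points', frame)
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break
cap.release()
cv2.destroyAllWindows()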

As no one is perfect, if anyone finds any errors or has suggestions, please feel free to comment below.

Email Id : jerryjohn1995@gmail.com
