Face Verification using MTCNN, FaceNet and Siamese Network with Triplet Loss

Parth kevadiya
7 min read · May 28, 2024

Facial verification is a critical application of computer vision, widely used in security systems, user authentication, and more. In this blog post, we’ll build a facial verification system using MTCNN for face detection, FaceNet for embedding generation, and a Siamese network trained with triplet loss. Let’s dive into each step in detail.

Results

[Figure: verification results at threshold = 0.5]

Siamese Network for Facial Verification

A Siamese network compares the similarity between images using triplets: an anchor image of a specific person, a positive image of the same person, and a negative image of a different person. It learns to minimize the distance between the anchor and positive images while maximizing the distance between the anchor and negative images. This approach ensures the system can accurately differentiate between images of the same person and images of different people.

Dataset Details

We want to create three folders: Anchor, Positive, and Negative.

  • Anchor (A): An image of a person.
  • Positive (P): Another image of the same person as in the anchor.
  • Negative (N): An image of a different person.

I organized the dataset like this:

dataset/
    person1/
        anchor/
        positive/
        negative/
    person2/
        anchor/
        positive/
        negative/
    person3/
        anchor/
        positive/
        negative/
    ...

For selecting negative images, you can use the LFW (Labeled Faces in the Wild) dataset, which provides a diverse collection of images of different people.

The goal is to create triplets (A, P, N) such that:

  • A and P are images of the same person.
  • A and N are images of different people.

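Loss Function Formula:

$$L(a, p, n) = \max\{d(a, p) - d(a, n) + \text{margin},\ 0\}$$

Here a, p, and n are the embeddings of the anchor, positive, and negative images, d is the distance between two embeddings, and margin is the minimum gap we want between the two distances.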
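
To make the formula concrete, here is a small worked example in plain Python. The 2-D embeddings are made up for illustration, and the sketch assumes d is the squared Euclidean distance and margin is 0.2, matching the values used later in this post.

```python
def squared_distance(u, v):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((ui - vi) ** 2 for ui, vi in zip(u, v))


def triplet_loss_single(anchor, positive, negative, margin=0.2):
    """max(d(a, p) - d(a, n) + margin, 0) for a single triplet."""
    pos_dist = squared_distance(anchor, positive)
    neg_dist = squared_distance(anchor, negative)
    return max(pos_dist - neg_dist + margin, 0.0)


# Hypothetical 2-D embeddings, chosen only to keep the arithmetic easy to follow
anchor = [0.0, 0.0]
positive = [0.1, 0.0]        # d(a, p) is about 0.01
far_negative = [1.0, 0.0]    # d(a, n) = 1.0
near_negative = [0.3, 0.0]   # d(a, n) is about 0.09

# Negative is far away: 0.01 - 1.0 + 0.2 is below zero, so the loss is clipped to 0
easy = triplet_loss_single(anchor, positive, far_negative)

# Negative is too close: 0.01 - 0.09 + 0.2 is about 0.12, so the triplet is penalized
hard = triplet_loss_single(anchor, positive, near_negative)

print(easy, hard)
```

A triplet contributes nothing once the negative is at least margin farther from the anchor than the positive, so training effort goes to the triplets that are still too close.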

Step 1: Setup and Pre-requisites

Before we start, ensure you have the necessary libraries installed: opencv-python (imported as cv2), mtcnn, tqdm, keras-facenet, tensorflow, and scikit-learn.

import os
import cv2
import random
from mtcnn import MTCNN
from tqdm import tqdm
from keras_facenet import FaceNet
from tensorflow.keras import layers, Model
from tensorflow import keras
import tensorflow as tf
from sklearn.metrics import f1_score, precision_score, recall_score

Step 2: Face Detection with MTCNN

2.1: Load the MTCNN Model

First, we’ll load the MTCNN model for face detection.

def load_face_detection_model():
    return MTCNN()

2.2: Detect and Crop Faces

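def detect_and_crop_face(image_path, detector):
    image = cv2.imread(image_path)
    if image is None:
        # cv2.imread returns None when the file is missing or unreadable
        return None, None

    # OpenCV loads images as BGR, while MTCNN expects RGB input
    rgb_image = cv2.cvtColor(image, cv2.COLOR_BGR2RGB)
    faces = detector.detect_faces(rgb_image)

    if faces:
        # Use the first detected face
        face = faces[0]
        (x, y, w, h) = face['box']
        # Compute the far corner from the raw box, then clamp everything to the
        # image bounds (MTCNN can return slightly negative coordinates)
        (endX, endY) = (min(image.shape[1], x + w), min(image.shape[0], y + h))
        (x, y) = (max(0, x), max(0, y))
        # Crop from the original BGR image so cv2.imwrite saves correct colors
        face = image[y:endY, x:endX]
        return face, (x, y, endX, endY)

    return None, None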
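
The box clamping is easy to get subtly wrong, so here is a minimal standalone sketch of that step. The helper name `clamp_box` is hypothetical and not part of MTCNN. It computes the far corner from the raw coordinates before clamping, so a box that starts outside the image is not shifted.

```python
def clamp_box(x, y, w, h, img_w, img_h):
    """Convert an (x, y, w, h) box to (x1, y1, x2, y2), clipped to the image."""
    # Far corner first, using the raw (possibly negative) origin
    x2 = min(img_w, x + w)
    y2 = min(img_h, y + h)
    # Then clamp the origin to the image
    x1 = max(0, x)
    y1 = max(0, y)
    return x1, y1, x2, y2


# A box that starts 5 pixels left of the image and extends past the bottom edge
print(clamp_box(-5, 10, 50, 60, img_w=100, img_h=64))  # (0, 10, 45, 64)
```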

Step 3: Preprocess and Create Dataset

3.1: Preprocess Images

Let's preprocess the images by detecting and cropping faces, then saving them in an organized directory structure.

def create_dataset(data_dir, output_dir):
    face_detection_model = load_face_detection_model()

    for student_folder in os.listdir(data_dir):
        student_dir = os.path.join(data_dir, student_folder)
        student_output_dir = os.path.join(output_dir, "preprocess", student_folder)
        os.makedirs(student_output_dir, exist_ok=True)

        # Create the anchor/positive/negative output folders for this person
        for folder_name in ['anchor', 'positive', 'negative']:
            folder_path = os.path.join(student_output_dir, folder_name)
            os.makedirs(folder_path, exist_ok=True)

        # Detect, crop, and save the face from every image in each subfolder
        for subfolder_name in ['anchor', 'positive', 'negative']:
            subfolder_dir = os.path.join(student_dir, subfolder_name)

            for idx, image_file in enumerate(os.listdir(subfolder_dir), start=1):
                image_path = os.path.join(subfolder_dir, image_file)
                face, bbox = detect_and_crop_face(image_path, face_detection_model)

                if face is not None:
                    output_file = f"{student_folder}_{subfolder_name}_{idx}.jpg"
                    cv2.imwrite(os.path.join(student_output_dir, subfolder_name, output_file), face)
                else:
                    print(f"No face detected in {image_path}. Skipping...")


data_dir = "data"  # path to your dataset folder
output_dir = "face_preprocess"  # output folder for the cropped faces, organized into anchor, positive, and negative as shown above
create_dataset(data_dir, output_dir)

3.2: Create Triplets

We’ll create triplets of anchor, positive, and negative images for training the Siamese network. (This code assumes the folder layout of my dataset, shown above.)

def create_triplets(data_dir):
    anchor_paths = []
    positive_paths = []
    negative_paths = []

    people = os.listdir(data_dir)

    for person in people:
        anchor_dir = os.path.join(data_dir, person, 'anchor')
        positive_dir = os.path.join(data_dir, person, 'positive')
        negative_dir = os.path.join(data_dir, person, 'negative')

        # Skip people for whom any of the three folders is empty
        if not (os.listdir(anchor_dir) and os.listdir(positive_dir) and os.listdir(negative_dir)):
            continue

        anchor_images = os.listdir(anchor_dir)
        positive_images = os.listdir(positive_dir)
        negative_images = os.listdir(negative_dir)

        # Pair every anchor with one random positive and one random negative
        for anchor in anchor_images:
            anchor_path = os.path.join(anchor_dir, anchor)
            positive = random.choice(positive_images)
            positive_path = os.path.join(positive_dir, positive)
            negative = random.choice(negative_images)
            negative_path = os.path.join(negative_dir, negative)

            anchor_paths.append(anchor_path)
            positive_paths.append(positive_path)
            negative_paths.append(negative_path)

    return anchor_paths, positive_paths, negative_paths


data_dir = "face_preprocess/preprocess"  # path to the preprocess folder created by the code above
anchor_paths, positive_paths, negative_paths = create_triplets(data_dir)
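
Because the positives and negatives are picked at random, each run produces different triplets. If you want repeatable experiments, one option is to sample with a seeded generator. The sketch below is a hypothetical variant (`sample_triplets` is not part of the code above) that runs against a throwaway folder of empty placeholder files, just to show the sampling logic.

```python
import os
import random
import tempfile


def sample_triplets(data_dir, seed=0):
    """Pair every anchor image with one random positive and one random negative."""
    rng = random.Random(seed)  # local generator, so the global random state is untouched
    triplets = []
    for person in sorted(os.listdir(data_dir)):
        folders = {name: os.path.join(data_dir, person, name)
                   for name in ("anchor", "positive", "negative")}
        # Sort the listings so the result does not depend on filesystem order
        files = {name: sorted(os.listdir(path)) for name, path in folders.items()}
        if not all(files.values()):
            continue  # skip people with an empty folder
        for anchor in files["anchor"]:
            triplets.append((
                os.path.join(folders["anchor"], anchor),
                os.path.join(folders["positive"], rng.choice(files["positive"])),
                os.path.join(folders["negative"], rng.choice(files["negative"])),
            ))
    return triplets


# Build a tiny throwaway dataset: 2 people, 3 placeholder files per folder
with tempfile.TemporaryDirectory() as root:
    for person in ("person1", "person2"):
        for folder in ("anchor", "positive", "negative"):
            os.makedirs(os.path.join(root, person, folder))
            for i in range(3):
                open(os.path.join(root, person, folder, f"{i}.jpg"), "w").close()
    first = sample_triplets(root, seed=42)
    second = sample_triplets(root, seed=42)

# One triplet per anchor image, and the same seed gives the same triplets
print(len(first), first == second)  # 6 True
```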

3.3: Create TensorFlow Datasets

Convert the lists of paths into TensorFlow datasets and preprocess the images.

def preprocess(file_path):
    byte_img = tf.io.read_file(file_path)
    # Force 3 channels so grayscale JPEGs still match the (160, 160, 3) model input
    img = tf.io.decode_jpeg(byte_img, channels=3)
    img = tf.image.resize(img, (160, 160))
    img = img / 255.0
    return img


anchor_ds = tf.data.Dataset.from_tensor_slices(anchor_paths)
positive_ds = tf.data.Dataset.from_tensor_slices(positive_paths)
negative_ds = tf.data.Dataset.from_tensor_slices(negative_paths)
triplets = tf.data.Dataset.zip((anchor_ds, positive_ds, negative_ds))
preprocessed_triplets = triplets.map(lambda anchor, positive, negative: (preprocess(anchor), preprocess(positive), preprocess(negative)))

# 70% train, 15% validation, and the remainder for test
num_triplets = len(preprocessed_triplets)
train_size = int(0.7 * num_triplets)
val_size = int(0.15 * num_triplets)
test_size = num_triplets - train_size - val_size

# reshuffle_each_iteration=False keeps the shuffled order fixed across passes;
# without it, take/skip would pick different triplets on every iteration and
# the train, validation, and test splits would overlap
shuffled_triplets = preprocessed_triplets.shuffle(buffer_size=num_triplets, reshuffle_each_iteration=False)
train_dataset = shuffled_triplets.take(train_size)
val_test_dataset = shuffled_triplets.skip(train_size)
val_dataset = val_test_dataset.take(val_size)
test_dataset = val_test_dataset.skip(val_size)

train_dataset = train_dataset.cache().batch(16).prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
val_dataset = val_dataset.cache().batch(16).prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
test_dataset = test_dataset.cache().batch(16).prefetch(buffer_size=tf.data.experimental.AUTOTUNE)
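
The split arithmetic can be checked on its own. This is a small sketch with a hypothetical helper, `split_sizes`, that mirrors the 70/15/15 computation above. Since the train and validation sizes are rounded down, any leftover triplets end up in the test set.

```python
def split_sizes(n, train_frac=0.7, val_frac=0.15):
    """Return (train, val, test) sizes; the test split absorbs rounding leftovers."""
    train = int(train_frac * n)
    val = int(val_frac * n)
    test = n - train - val
    return train, val, test


print(split_sizes(100))  # (70, 15, 15)

# The three sizes always add up to the dataset size
for n in (1, 7, 33, 250):
    assert sum(split_sizes(n)) == n
```

With very small datasets the validation split can come out empty, so it is worth printing the sizes before training.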

Step 4: Defining the Embedding Model

4.1: Initialize FaceNet

We load the pretrained FaceNet model, freeze its weights, and add a small trainable head of dense layers that outputs a 128-dimensional embedding.

def get_embedding_model(input_shape=(160, 160, 3)):
    # Pretrained FaceNet backbone, frozen so only the new head is trained
    facenet = FaceNet()
    base_model = facenet.model
    base_model.trainable = False

    inputs = layers.Input(shape=input_shape)
    embeddings = base_model(inputs)

    # Trainable head that maps FaceNet embeddings to a 128-dimensional space
    x = layers.Dense(units=1024, activation="relu")(embeddings)
    x = layers.Dropout(0.2)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dense(units=512, activation="relu")(x)
    x = layers.Dropout(0.2)(x)
    x = layers.BatchNormalization()(x)
    x = layers.Dense(units=256, activation="relu")(x)
    x = layers.Dropout(0.2)(x)
    outputs = layers.Dense(units=128)(x)

    embedding_model = Model(inputs=inputs, outputs=outputs, name="embedding_model")
    return embedding_model


embedding_model = get_embedding_model()
embedding_model.summary()

Step 5: Triplet Loss Function

Define the triplet loss function for training the Siamese network.

def triplet_loss(y_pred, margin=0.2):
    """
    Triplet loss function.

    Arguments:
    y_pred -- list containing three parts:
        anchor: the embedding for the anchor image
        positive: the embedding for the positive image
        negative: the embedding for the negative image
    margin -- margin value, controls the relative distance between positive and negative pairs

    Returns:
    loss -- real number, value of the loss
    """

    anchor, positive, negative = y_pred[0], y_pred[1], y_pred[2]

    # Squared Euclidean distance between the anchor and the positive
    pos_dist = tf.reduce_sum(tf.square(anchor - positive), axis=-1)

    # Squared Euclidean distance between the anchor and the negative
    neg_dist = tf.reduce_sum(tf.square(anchor - negative), axis=-1)

    # Compute the triplet loss for each triplet in the batch
    basic_loss = pos_dist - neg_dist + margin
    loss = tf.maximum(basic_loss, 0.0)

    # Return the mean loss over the batch
    return tf.reduce_mean(loss)

Step 6: Building the Siamese Network

Now build the Siamese network using the embedding model.

def get_siamese_network(embedding_model, input_shape=(160, 160, 3)):
    # Input is accessed through the imported layers module
    anchor_input = layers.Input(name="anchor", shape=input_shape)
    positive_input = layers.Input(name="positive", shape=input_shape)
    negative_input = layers.Input(name="negative", shape=input_shape)

    # The same embedding model (shared weights) processes all three inputs
    anchor_embedding = embedding_model(anchor_input)
    positive_embedding = embedding_model(positive_input)
    negative_embedding = embedding_model(negative_input)

    siamese_network = Model(
        inputs=[anchor_input, positive_input, negative_input],
        outputs=[anchor_embedding, positive_embedding, negative_embedding]
    )

    return siamese_network


siamese_network = get_siamese_network(embedding_model)
siamese_network.summary()
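
The key property of a Siamese network is that one set of weights processes every input. As a toy illustration, the sketch below uses a fixed linear map in plain Python as a stand-in for the embedding model. The weights and inputs are invented for the example.

```python
def make_embedder(weights):
    """Return a function that maps an input vector to an embedding via a fixed linear map."""
    def embed(x):
        return [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return embed


# Hypothetical 2x3 weight matrix standing in for the trained embedding model
embed = make_embedder([[1.0, 0.0, -1.0],
                       [0.5, 0.5, 0.0]])

anchor = [1.0, 2.0, 3.0]
positive = [1.0, 2.0, 3.0]  # identical to the anchor, for illustration
negative = [3.0, 0.0, 1.0]

# The same function (same weights) is applied to all three inputs
anchor_emb = embed(anchor)
positive_emb = embed(positive)
negative_emb = embed(negative)

print(anchor_emb, positive_emb, negative_emb)  # [-2.0, 1.5] [-2.0, 1.5] [2.0, 1.5]
```

Because the weights are shared, identical inputs always produce identical embeddings, which is what makes distances between embeddings comparable.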

Step 7: Training the Siamese Network

7.1: Define Training Step

We define a function to perform a single training step for the Siamese network.

opt = tf.keras.optimizers.Adam()
checkpoint_dir = './training_triplet_checkpoints'
checkpoint_prefix = os.path.join(checkpoint_dir, 'ckpt')
checkpoint = tf.train.Checkpoint(opt=opt, siamese_model=siamese_network)


def siamese_train_step(siamese_network, optimizer, data, margin=0.2):
    """
    Perform a single training step for the Siamese network.

    Arguments:
    siamese_network -- the Siamese network model
    optimizer -- the optimizer for training
    data -- tuple containing anchor, positive, and negative images
    margin -- margin value for triplet loss

    Returns:
    loss -- computed loss value
    """
    anchor, positive, negative = data

    with tf.GradientTape() as tape:
        # training=True so dropout and batch normalization run in training mode;
        # in a custom loop the model otherwise defaults to inference behavior
        anchor_embedding, positive_embedding, negative_embedding = siamese_network(
            [anchor, positive, negative], training=True
        )
        loss = triplet_loss([anchor_embedding, positive_embedding, negative_embedding], margin=margin)

    # Only the trainable head receives gradients; the FaceNet backbone is frozen
    gradients = tape.gradient(loss, siamese_network.trainable_variables)
    optimizer.apply_gradients(zip(gradients, siamese_network.trainable_variables))

    return loss

7.2: Train the Network

Next, we train the Siamese network using the training step defined above.

def train_siamese_network(siamese_network, optimizer, train_dataset, num_epochs, margin=0.2):
    """
    Train the Siamese network model.

    Arguments:
    siamese_network -- the Siamese network model
    optimizer -- the optimizer for training
    train_dataset -- the training dataset
    num_epochs -- number of epochs for training
    margin -- margin value for triplet loss
    """
    for epoch in range(num_epochs):
        epoch_loss = 0.0
        epoch_accuracy = tf.keras.metrics.BinaryAccuracy()

        with tqdm(total=len(train_dataset), desc=f'Epoch {epoch+1}/{num_epochs}', unit='batch') as pbar:
            for data in train_dataset:
                loss = siamese_train_step(siamese_network, optimizer, data, margin)
                epoch_loss += float(loss)

                # A triplet counts as correct when the positive is closer
                # to the anchor than the negative
                anchor_embedding, positive_embedding, negative_embedding = siamese_network(list(data))
                pos_dist = tf.norm(anchor_embedding - positive_embedding, axis=-1)
                neg_dist = tf.norm(anchor_embedding - negative_embedding, axis=-1)
                accuracy = tf.cast(pos_dist < neg_dist, tf.float32)

                epoch_accuracy.update_state(tf.ones_like(accuracy), accuracy)

                pbar.update(1)
                # Show the batch mean instead of the per-sample array
                pbar.set_postfix({'loss': float(loss), 'accuracy': float(tf.reduce_mean(accuracy))})

        epoch_loss /= len(train_dataset)
        acc_value = epoch_accuracy.result().numpy()
        print(f'Epoch {epoch+1}/{num_epochs}, Loss: {epoch_loss:.4f}, Accuracy: {acc_value:.4f}')

        # Save a checkpoint at the end of every epoch
        checkpoint.save(file_prefix=checkpoint_prefix)
        epoch_accuracy.reset_state()


num_epochs = 10
margin = 0.2

train_siamese_network(siamese_network, opt, train_dataset, num_epochs, margin)
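
The accuracy reported during training is a ranking accuracy: the share of triplets in which the positive ends up closer to the anchor than the negative. Here is a minimal plain-Python sketch of that metric on made-up 2-D embeddings (`triplet_accuracy` is a hypothetical helper, not part of the training code).

```python
import math


def euclidean(u, v):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def triplet_accuracy(anchors, positives, negatives):
    """Fraction of triplets whose positive is closer to the anchor than the negative."""
    correct = sum(
        euclidean(a, p) < euclidean(a, n)
        for a, p, n in zip(anchors, positives, negatives)
    )
    return correct / len(anchors)


# Hypothetical embeddings for three triplets; the third one is ranked incorrectly
anchors = [[0.0, 0.0], [1.0, 1.0], [0.0, 0.0]]
positives = [[0.1, 0.0], [1.0, 1.2], [0.9, 0.0]]
negatives = [[1.0, 0.0], [0.0, 0.0], [0.2, 0.0]]

acc = triplet_accuracy(anchors, positives, negatives)
print(round(acc, 4))  # 0.6667
```

Note that this metric ignores the margin and any decision threshold, so a high value here does not by itself guarantee good verification accuracy at a fixed threshold.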

Step 8: Evaluating the Model

8.1: Calculate Metrics

We define functions to calculate performance metrics like F1-score, precision, and recall.

def calculate_metrics(y_true, y_pred):
    y_true = tf.keras.backend.flatten(y_true)
    y_pred = tf.keras.backend.flatten(y_pred)
    f1 = f1_score(y_true.numpy(), y_pred.numpy())
    precision = precision_score(y_true.numpy(), y_pred.numpy())
    recall = recall_score(y_true.numpy(), y_pred.numpy())
    return f1, precision, recall

8.2: Evaluate on Datasets

We evaluate the Siamese network on the validation and test datasets.

def evaluate_siamese_network(siamese_network, test_dataset, threshold=0.5):
    y_true = []
    y_pred = []

    for data in test_dataset:
        anchor_embedding, positive_embedding, negative_embedding = siamese_network(list(data))
        pos_dist = tf.norm(anchor_embedding - positive_embedding, axis=-1)
        neg_dist = tf.norm(anchor_embedding - negative_embedding, axis=-1)

        # Each triplet yields two labeled pairs: anchor-positive (same person, 1)
        # and anchor-negative (different people, 0)
        batch_size = int(pos_dist.shape[0])
        y_true.extend([1] * batch_size + [0] * batch_size)

        # Predict "same person" when the embedding distance is below the threshold
        # (assumed here to be the 0.5 quoted in the Results section)
        y_pred.extend((pos_dist < threshold).numpy().astype(int).tolist())
        y_pred.extend((neg_dist < threshold).numpy().astype(int).tolist())

    f1, precision, recall = calculate_metrics(tf.constant(y_true), tf.constant(y_pred))
    print(f'F1-score: {f1:.4f}, Precision: {precision:.4f}, Recall: {recall:.4f}')


print("Evaluating with validation dataset:")
evaluate_siamese_network(siamese_network, val_dataset)
print("\nEvaluating with test dataset:")
evaluate_siamese_network(siamese_network, test_dataset)
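
For verification, what matters in the end is a yes/no decision per image pair at a fixed distance threshold. The sketch below computes precision, recall, and F1 by hand for a handful of invented distances. It assumes a pair is accepted as "same person" when its distance is below 0.5, the threshold quoted in the Results section. The helper name and the numbers are hypothetical.

```python
def verification_metrics(distances, labels, threshold=0.5):
    """Precision, recall, and F1 for 'same person' decisions (distance < threshold)."""
    preds = [1 if d < threshold else 0 for d in distances]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1


# Hypothetical distances: three anchor-positive pairs (label 1)
# followed by three anchor-negative pairs (label 0)
distances = [0.2, 0.4, 0.7, 0.9, 0.3, 1.1]
labels = [1, 1, 1, 0, 0, 0]

# At threshold 0.5: 2 true positives, 1 false positive (0.3), 1 false negative (0.7)
precision, recall, f1 = verification_metrics(distances, labels)
print(round(precision, 4), round(recall, 4), round(f1, 4))  # 0.6667 0.6667 0.6667
```

Raising the threshold accepts more pairs, which tends to increase recall at the cost of precision, so the threshold is worth tuning on the validation set.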

Step 9: Saving the Model

Finally, we save the trained Siamese network model.

siamese_network.save('siamese_network_triplet_loss.keras')

Conclusion:

In this post, we built a facial verification system using MTCNN for face detection, FaceNet for embeddings, and a Siamese network trained with triplet loss. The system verifies identity by comparing the distance between face embeddings. We also found that converting images to grayscale can improve performance, making the system more robust across different lighting conditions.
