Advanced AI: Face recognition using Siamese networks

Manish Thapliyal

What are Siamese networks?

A Siamese network is a special type of neural network, and it is one of the simplest and most popular one-shot learning algorithms.

One-shot learning is a technique where we learn from only one training example per class.

So, a Siamese network is predominantly used in applications where we don’t have many data points for each class.

Why use Siamese networks?

For instance, let’s say we want to build a face recognition model for our organization, and about 500 people work in our organization. If we want to build our face recognition model using a Convolutional Neural Network (CNN) from scratch, then we need many images of all of these 500 people to train the network and attain good accuracy. But in practice, we will not have many images for each of these 500 people, so it is not feasible to build a model using a CNN or any other deep learning algorithm unless we have sufficient data points. In these kinds of scenarios, we can resort to a sophisticated one-shot learning algorithm such as a Siamese network, which can learn from fewer data points.

How do Siamese networks work?

Siamese networks basically consist of two symmetrical neural networks, both sharing the same weights and architecture, joined together at the end by an energy function, E. The objective of a Siamese network is to learn whether two input values are similar or dissimilar. Let’s say we have two images, X1 and X2, and we want to learn whether the two images are similar or dissimilar.

Siamese networks are not only used for face recognition; they are used extensively in applications where we don’t have many data points and in tasks where we need to learn the similarity between two inputs. Their applications include signature verification, similar question retrieval, object tracking, and more. We will study Siamese networks in detail in the next section.

Architecture of Siamese networks

A Siamese network consists of two identical networks, both sharing the same weights and architecture. Let’s say we have two inputs, X1 and X2. We feed input X1 to Network A, which outputs fw(X1), and input X2 to Network B, which outputs fw(X2). Both networks have the same weights, w, and they generate embeddings for our inputs, X1 and X2. Then, we feed these embeddings to the energy function, E, which gives us the similarity between the two inputs.

It can be expressed as follows, with the distance between the two embeddings serving as the energy:

E(X1, X2) = ||fw(X1) - fw(X2)||

The input to a Siamese network should be in pairs, (X1, X2), along with their binary label, Y ∈ {0, 1}, stating whether the input pair is a genuine pair (same) or an impostor pair (different). For instance, with sentence pairs as input, the pair (“How old are you?”, “What is your age?”) would be a genuine pair labeled 1, while (“How old are you?”, “Where do you live?”) would be an impostor pair labeled 0.

In a nutshell, a Siamese network learns by finding the similarity between two input values using identical subnetworks. It is one of the most commonly used few-shot learning algorithms for tasks that involve computing the similarity between two entities, and it is a powerful and robust solution to the low-data problem.

Face recognition using Siamese networks

We will get to know Siamese networks by building a face recognition model. The objective of our network is to understand whether two faces are similar or dissimilar. We use the AT&T Database of Faces, which can be downloaded from here: https://www.cl.cam.ac.uk/research/dtg/attarchive/facedatabase.html.

Once you have downloaded and extracted the archive, you will see 40 folders, named s1 through s40.

Each of these folders contains 10 different images of a single person, taken from various angles. For instance, if you open folder s1, you will see 10 different images of one person.

Similarly, folder s13 contains 10 images of another person.

So, we will take two images at random from the same folder and mark them as a genuine pair, and we will take one image from each of two different folders and mark them as an impostor pair.

First, we will import the required libraries:

import re
import numpy as np
from PIL import Image

from sklearn.model_selection import train_test_split
from keras import backend as K
from keras.layers import Activation
from keras.layers import Input, Lambda, Dense, Dropout, Conv2D, MaxPooling2D, Flatten
from keras.models import Sequential, Model
from keras.optimizers import RMSprop

Now, we define a function for reading our input image. The read_image function takes an image’s filename as input and returns the image as a NumPy array:

def read_image(filename, byteorder='>'):

    # first, we read the image as a raw file into the buffer
    with open(filename, 'rb') as f:
        buffer = f.read()

    # using a regex, we extract the header, width, height, and maxval of the PGM image
    header, width, height, maxval = re.search(
        rb"(^P5\s(?:\s*#.*[\r\n])*"
        rb"(\d+)\s(?:\s*#.*[\r\n])*"
        rb"(\d+)\s(?:\s*#.*[\r\n])*"
        rb"(\d+)\s(?:\s*#.*[\r\n]\s)*)", buffer).groups()

    # then, we convert the image to a numpy array using np.frombuffer,
    # which interprets the buffer as a one-dimensional array
    return np.frombuffer(buffer,
                         dtype='u1' if int(maxval) < 256 else byteorder + 'u2',
                         count=int(width) * int(height),
                         offset=len(header)
                         ).reshape((int(height), int(width)))

As an example, let’s open one image and check its shape:

Image.open("data/orl_faces/s1/1.pgm")
img = read_image('data/orl_faces/s1/1.pgm')
img.shape
(112, 92)
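
Later, we will shrink each image by keeping every second pixel along each axis. As a quick illustrative check (an addition, not part of the original walkthrough), striding by 2 halves each dimension, which is exactly where the (56, 46) pair shape used below comes from:

img[::2, ::2].shape

(56, 46)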

Next, we define a function, get_data, for generating our data. It builds genuine pairs by sampling two different images from the same folder and impostor pairs by sampling one image from each of two different folders. Finally, it concatenates both x_genuine_pair and x_impostor_pair to X and y_genuine and y_impostor to Y:

size = 2
total_sample_size = 10000


def get_data(size, total_sample_size):
    # read one image to infer the dimensions
    image = read_image('data/orl_faces/s1/1.pgm')
    # reduce the size
    image = image[::size, ::size]
    # get the new size
    dim1 = image.shape[0]
    dim2 = image.shape[1]

    count = 0

    # initialize the numpy array with the shape [total_sample_size, no_of_pairs, 1, dim1, dim2]
    x_genuine_pair = np.zeros([total_sample_size, 2, 1, dim1, dim2])  # 2 is for pairs
    y_genuine = np.zeros([total_sample_size, 1])

    for i in range(40):
        for j in range(int(total_sample_size / 40)):
            ind1 = 0
            ind2 = 0

            # pick two different images from the same directory (genuine pair)
            while ind1 == ind2:
                ind1 = np.random.randint(10)
                ind2 = np.random.randint(10)

            # read the two images
            img1 = read_image('data/orl_faces/s' + str(i + 1) + '/' + str(ind1 + 1) + '.pgm')
            img2 = read_image('data/orl_faces/s' + str(i + 1) + '/' + str(ind2 + 1) + '.pgm')

            # reduce the size
            img1 = img1[::size, ::size]
            img2 = img2[::size, ::size]

            # store the images in the initialized numpy array
            x_genuine_pair[count, 0, 0, :, :] = img1
            x_genuine_pair[count, 1, 0, :, :] = img2

            # as we are drawing images from the same directory, we assign the label 1 (genuine pair)
            y_genuine[count] = 1
            count += 1

    count = 0
    x_impostor_pair = np.zeros([total_sample_size, 2, 1, dim1, dim2])
    y_impostor = np.zeros([total_sample_size, 1])

    for i in range(int(total_sample_size / 10)):
        for j in range(10):

            # pick two different directories (impostor pair)
            while True:
                ind1 = np.random.randint(40)
                ind2 = np.random.randint(40)
                if ind1 != ind2:
                    break

            img1 = read_image('data/orl_faces/s' + str(ind1 + 1) + '/' + str(j + 1) + '.pgm')
            img2 = read_image('data/orl_faces/s' + str(ind2 + 1) + '/' + str(j + 1) + '.pgm')

            img1 = img1[::size, ::size]
            img2 = img2[::size, ::size]

            x_impostor_pair[count, 0, 0, :, :] = img1
            x_impostor_pair[count, 1, 0, :, :] = img2

            # as we are drawing images from different directories, we assign the label 0 (impostor pair)
            y_impostor[count] = 0
            count += 1

    # now, concatenate the genuine pairs and impostor pairs to get the whole data
    X = np.concatenate([x_genuine_pair, x_impostor_pair], axis=0) / 255
    Y = np.concatenate([y_genuine, y_impostor], axis=0)

    return X, Y

Now, we generate our data and check our data size. As you can see, we have 20,000 data points and, out of these, 10,000 are genuine pairs and 10,000 are impostor pairs:

X, Y = get_data(size, total_sample_size)

X.shape
(20000, 2, 1, 56, 46)

Y.shape
(20000, 1)
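
Because the genuine pairs are stacked before the impostor pairs, the labels are exactly balanced. As an optional sanity check (an addition, not in the original walkthrough):

Y.mean()

0.5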

Next, we split our data for training and testing with 75% training and 25% testing proportions:

x_train, x_test, y_train, y_test = train_test_split(X, Y, test_size=.25)

Now that we have successfully generated our data, we build our Siamese network. First, we define the base network, which is basically a convolutional network used for feature extraction. We build two convolutional layers with ReLU activations and max pooling, followed by a flatten layer and two dense layers:

def build_base_network(input_shape):

    seq = Sequential()

    nb_filter = [6, 12]
    kernel_size = 3

    # convolutional layer 1
    # data_format='channels_first' matches our (1, height, width) image layout
    seq.add(Conv2D(nb_filter[0], (kernel_size, kernel_size), input_shape=input_shape,
                   padding='valid', data_format='channels_first'))
    seq.add(Activation('relu'))
    seq.add(MaxPooling2D(pool_size=(2, 2), data_format='channels_first'))
    seq.add(Dropout(.25))

    # convolutional layer 2
    seq.add(Conv2D(nb_filter[1], (kernel_size, kernel_size),
                   padding='valid', data_format='channels_first'))
    seq.add(Activation('relu'))
    seq.add(MaxPooling2D(pool_size=(2, 2), data_format='channels_first'))
    seq.add(Dropout(.25))

    # flatten and map to a 50-dimensional embedding
    seq.add(Flatten())
    seq.add(Dense(128, activation='relu'))
    seq.add(Dropout(0.1))
    seq.add(Dense(50, activation='relu'))
    return seq
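
As an optional check (an addition, not in the original article), we can instantiate the base network and print its summary; the final Dense layer confirms that each image is mapped to a 50-dimensional embedding:

build_base_network((1, 56, 46)).summary()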

Next, we feed the image pair to the base network, which will return the embeddings, that is, feature vectors:

input_dim = x_train.shape[2:]
img_a = Input(shape=input_dim)
img_b = Input(shape=input_dim)

base_network = build_base_network(input_dim)
feat_vecs_a = base_network(img_a)
feat_vecs_b = base_network(img_b)

feat_vecs_a and feat_vecs_b are the feature vectors of our image pair. Because the same base_network instance is applied to both inputs, the two branches share a single set of weights, which is what makes the network Siamese. Next, we feed these feature vectors to the energy function to compute the distance between them, and we use the Euclidean distance as our energy function:

def euclidean_distance(vects):
    x, y = vects
    # the small K.epsilon() floor avoids an undefined gradient from sqrt at exactly zero
    return K.sqrt(K.maximum(K.sum(K.square(x - y), axis=1, keepdims=True), K.epsilon()))


def eucl_dist_output_shape(shapes):
    shape1, shape2 = shapes
    return (shape1[0], 1)

distance = Lambda(euclidean_distance, output_shape=eucl_dist_output_shape)([feat_vecs_a, feat_vecs_b])
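
To make the formula concrete, here is a small illustrative check in plain NumPy (an addition, not part of the original article), applying the same Euclidean formula to two 3-dimensional vectors:

a = np.array([1.0, 2.0, 3.0])
b = np.array([1.0, 2.0, 5.0])
np.sqrt(np.sum((a - b) ** 2))

2.0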

Now, we set the number of epochs to 13, use RMSprop for optimization, and define our model:

epochs = 13
rms = RMSprop()

model = Model(inputs=[img_a, img_b], outputs=distance)

Next, we define our loss function as the contrastive_loss function and compile the model:

def contrastive_loss(y_true, y_pred):
    margin = 1
    # genuine pairs (y_true = 1) are penalized by the squared distance, while impostor
    # pairs (y_true = 0) are penalized only when the distance falls inside the margin
    return K.mean(y_true * K.square(y_pred) + (1 - y_true) * K.square(K.maximum(margin - y_pred, 0)))

model.compile(loss=contrastive_loss, optimizer=rms)
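
To get a feel for how this loss behaves, here is a scalar sketch of the same formula in plain Python (contrastive_loss_np is an illustrative addition, not part of the original code):

def contrastive_loss_np(y_true, d, margin=1.0):
    # scalar version of the contrastive loss defined above
    return y_true * d ** 2 + (1 - y_true) * max(margin - d, 0) ** 2

contrastive_loss_np(1, 0.2)  # 0.04 -> genuine pair at a small distance: low loss
contrastive_loss_np(0, 0.2)  # 0.64 -> impostor pair too close: high loss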

Now, we train our model:

img_1 = x_train[:, 0]
img_2 = x_train[:, 1]

model.fit([img_1, img_2], y_train, validation_split=.25, batch_size=128, verbose=2, epochs=epochs)

Now, we make predictions with test data:

pred = model.predict([x_test[:, 0], x_test[:, 1]])
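
Each prediction is a distance, so a smaller value means the two faces are more similar. As a hypothetical post-processing step (an addition, not in the original article), we can threshold the distances at 0.5 to turn them into same/different decisions:

is_same = pred.ravel() < 0.5  # True where the pair is predicted to be the same person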

Next, we define a function for computing accuracy. As written, it returns the mean true label over the pairs whose predicted distance falls below a threshold of 0.5:

def compute_accuracy(predictions, labels):
    # mean true label among the pairs predicted as similar (distance < 0.5)
    return labels[predictions.ravel() < 0.5].mean()
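
A stricter alternative, along the lines of the canonical Keras Siamese example, compares the thresholded prediction against every label; compute_accuracy_strict below is a minimal sketch of that idea (an addition, not part of the original code):

def compute_accuracy_strict(predictions, labels):
    # fraction of pairs where the thresholded distance agrees with the true label
    return np.mean((predictions.ravel() < 0.5) == labels.ravel())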

Now, we compute the accuracy of the model:

compute_accuracy(pred, y_test)

0.9779092702169625
