Do More With Less Data! — One-shot Learning with Siamese Neural Networks

Published in SFU Professional Computer Science, Feb 4, 2020

Authored by: Keerthi Ningegowda, Kenny Chakola, Varnnitha Venugopal, Ria Thomas & Vipin Das

This blog is written and maintained by students in the Professional Master’s Program in the School of Computing Science at Simon Fraser University as part of their course credit. To learn more about this unique program, please visit sfu.ca/computing/pmp.

Let’s start with a news article that might scare you a bit. Don’t worry! We will show you the good news later!

Source: cbc.ca

Not less than four months ago, a student at Emily Carr University of Art and Design lost $600 from his bank account through fraudulent cheques. The most surprising part was that the money was withdrawn through half a dozen cheques of $100 each, and none of them bore a matching signature. Of course, the bank staff are to blame for authorizing payments without checking the signatures carefully. But for a bank employee who handles hundreds of cheque clearances per day, checking every signature with that level of diligence is a tedious task.

What can be a solution to this problem?

Wouldn’t it be great if the 21st-century buzzwords ‘AI’ and ‘Deep Learning’ could solve this problem with minimal human intervention?

We have good news for you! One-shot Learning with Siamese Neural Networks can help solve this problem!

What are Siamese Neural Networks?

A Siamese neural network is an artificial neural network that takes two inputs and judges them by how similar they are. Before diving into the architecture, take a moment to think about what kind of network would be suitable for telling two images apart. What if we had two identical convolutional networks, sharing the same parameters, whose outputs we compare to measure how similar the two inputs are? (Oh, the irony!) Guess what? That is precisely how a Siamese network works!

The symmetric architecture is not limited to images: it can also accept numerical data and sequential data such as sentences or time signals. Because both branches share their weights, the same function maps each input to a feature vector, so the two inputs land in a common feature space where a distance (sometimes called an energy function) can compare them directly.
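To make weight sharing concrete, here is a minimal NumPy sketch, not the network from this post: a hypothetical one-layer `encode` function with a single weight matrix `W` (random here, standing in for trained weights) embeds both inputs, and the pair is scored by the Euclidean distance between the two embeddings.

```python
import numpy as np

rng = np.random.default_rng(0)

# One shared weight matrix: both "twins" use exactly these parameters.
# W is a hypothetical stand-in for a trained encoder (8-dim input -> 4-dim embedding).
W = rng.normal(size=(8, 4))

def encode(x: np.ndarray, weights: np.ndarray = W) -> np.ndarray:
    """Map an input vector to an embedding (linear layer followed by ReLU)."""
    return np.maximum(x @ weights, 0.0)

def siamese_distance(x1: np.ndarray, x2: np.ndarray) -> float:
    """Embed both inputs with the same encoder and return their Euclidean distance."""
    return float(np.linalg.norm(encode(x1) - encode(x2)))

a = rng.normal(size=8)
b = rng.normal(size=8)
print(siamese_distance(a, a))  # 0.0: identical inputs get identical embeddings
print(siamese_distance(a, b) == siamese_distance(b, a))  # True: the score is symmetric
```

Because both branches literally use the same `W`, identical inputs always get distance 0 and the score does not depend on which input goes into which branch.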

But this is a neural network! In general, a neural network needs a large training set to build a useful model. For signatures, that would mean many samples of each person’s signature, which is usually impractical to collect. How do we deal with that?

We overcome that hurdle with ‘One-shot Learning’: the network learns a general notion of similarity, so it needs only one (or a few) labelled examples of each class. This is a huge advantage over traditional machine learning methods, which demand hundreds or thousands of samples per class.
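At prediction time, one-shot learning boils down to comparing a query with a single stored reference per class and picking the closest one. A minimal sketch, assuming the embeddings were already produced by a trained twin network; the names `support` and `one_shot_classify` and the toy vectors are hypothetical:

```python
from typing import Dict

import numpy as np

def one_shot_classify(query: np.ndarray, support: Dict[str, np.ndarray]) -> str:
    """Return the label whose single reference embedding is closest to the query."""
    distances = {label: float(np.linalg.norm(query - ref)) for label, ref in support.items()}
    return min(distances, key=distances.get)

# One reference embedding per person (toy 3-D vectors standing in for network outputs).
support = {
    "alice": np.array([0.9, 0.1, 0.0]),
    "bob": np.array([0.0, 0.8, 0.3]),
}
print(one_shot_classify(np.array([0.85, 0.15, 0.05]), support))  # alice
```

Adding a new person only means storing one more reference embedding; the network itself does not need retraining.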

What does the workflow of this algorithm look like?

Let’s now dive deeper into the model. The figure below shows the working principle of a typical Siamese network:

How is the flowchart applicable to our signature verification problem?

Pre-processing: A typical labelled dataset used to train a Siamese neural network looks like this:

ANCHOR: an original (genuine) signature
POSITIVE: another genuine signature by the same person
NEGATIVE: a signature that does not match the anchor

Signature images can differ in size, so all input images are resized to a fixed size using bilinear interpolation. They are also converted to grayscale, so that each pixel holds a single brightness value on a scale from black to white.
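Both pre-processing steps fit in a few lines of NumPy. This is a sketch under our own assumptions: the 155x220 target size, the ITU-R 601-2 luma weights (the ones Pillow’s `convert("L")` uses) and the final scaling to [0, 1] are illustrative choices, and the resize uses the align-corners convention, so results differ slightly from Pillow or OpenCV.

```python
import numpy as np

def to_grayscale(rgb: np.ndarray) -> np.ndarray:
    """Convert an H x W x 3 RGB array to an H x W luminance array."""
    return rgb[..., :3] @ np.array([0.299, 0.587, 0.114])

def resize_bilinear(img: np.ndarray, out_h: int, out_w: int) -> np.ndarray:
    """Resize a 2-D array with bilinear interpolation (align-corners convention)."""
    in_h, in_w = img.shape
    # Sample positions, in input coordinates, for every output row and column.
    ys = np.linspace(0, in_h - 1, out_h)
    xs = np.linspace(0, in_w - 1, out_w)
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, in_h - 1)
    x1 = np.minimum(x0 + 1, in_w - 1)
    wy = (ys - y0)[:, None]  # vertical weights, shape (out_h, 1)
    wx = (xs - x0)[None, :]  # horizontal weights, shape (1, out_w)
    # Interpolate along x on the two neighbouring rows, then along y.
    top = img[np.ix_(y0, x0)] * (1 - wx) + img[np.ix_(y0, x1)] * wx
    bottom = img[np.ix_(y1, x0)] * (1 - wx) + img[np.ix_(y1, x1)] * wx
    return top * (1 - wy) + bottom * wy

def preprocess(rgb: np.ndarray, size=(155, 220)) -> np.ndarray:
    """Grayscale, resize to a fixed size and scale intensities to [0, 1]."""
    gray = to_grayscale(rgb.astype(float))
    return resize_bilinear(gray, *size) / 255.0
```

Every signature, whatever its original size, comes out as a 155x220 single-channel array ready for the network.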

Convolutional Network: Each of the twin convolutional networks consists of an 11x11 convolution layer, a 5x5 convolution layer and two 3x3 convolution layers, which produce multiple feature maps, each capturing a particular feature of the image. The activation function is ReLU. Because ReLU activations are unbounded, they are normalized with Local Response Normalization. This normalization boosts the response of a strongly excited neuron relative to its neighborhood and dampens a region of neurons whose activations are uniformly high. Three 2x2 max-pooling layers reduce the size of the feature maps without compromising the distinctive features in them. Dropout with rates 0.3 and 0.5 minimizes over-fitting by randomly dropping units in the hidden layers. The final layer is a fully connected layer that linearly combines all the features extracted from the input image.

Source: SigNet: Convolutional Siamese Network for Writer Independent Offline Signature Verification
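The layer list above can be sketched in PyTorch as one encoder module that both twins reuse. This is an approximation, not the SigNet reference implementation: the channel widths (32/64/96/64), paddings, LRN settings, the 256-unit hidden layer, where exactly the 0.5 dropout sits, the 155x220 input size and the 128-dimensional output are all our own illustrative assumptions, chosen to keep the model small.

```python
import torch
from torch import nn

class SignatureEncoder(nn.Module):
    """One branch of the twin network; both inputs go through this same module."""

    def __init__(self, input_size=(155, 220), embedding_dim=128):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=11),               # 11x11 convolution
            nn.ReLU(),
            nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),
            nn.MaxPool2d(2),                                 # 2x2 max pool 1
            nn.Conv2d(32, 64, kernel_size=5, padding=2),     # 5x5 convolution
            nn.ReLU(),
            nn.LocalResponseNorm(size=5, alpha=1e-4, beta=0.75, k=2.0),
            nn.MaxPool2d(2),                                 # 2x2 max pool 2
            nn.Dropout(0.3),
            nn.Conv2d(64, 96, kernel_size=3, padding=1),     # first 3x3 convolution
            nn.ReLU(),
            nn.Conv2d(96, 64, kernel_size=3, padding=1),     # second 3x3 convolution
            nn.ReLU(),
            nn.MaxPool2d(2),                                 # 2x2 max pool 3
            nn.Dropout(0.3),
        )
        # Infer the flattened feature size with a dummy forward pass.
        with torch.no_grad():
            n_flat = self.features(torch.zeros(1, 1, *input_size)).numel()
        self.head = nn.Sequential(
            nn.Flatten(),
            nn.Linear(n_flat, 256),
            nn.ReLU(),
            nn.Dropout(0.5),
            nn.Linear(256, embedding_dim),                   # final fully connected layer
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.features(x))

encoder = SignatureEncoder()
encoder.eval()  # disable dropout for inference
pair = torch.rand(2, 1, 155, 220)
emb1, emb2 = encoder(pair[:1]), encoder(pair[1:])  # same weights for both inputs
distance = (emb1 - emb2).norm(dim=1)                # Euclidean distance per pair
print(emb1.shape)  # torch.Size([1, 128])
```

Using one module for both inputs, rather than two copies, is what guarantees the weights stay shared during training.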

Loss Calculation —

A loss function guides how the network’s weights are adjusted during training. Two common choices are:

1. Pairwise Contrastive Loss: This loss uses the contrast between the two images of a pair to estimate the loss at each training point. Training points take the form (anchor, positive) or (anchor, negative). The loss is built on the Euclidean distance D between the feature vectors that the twin branches, with learned weights w1 and w2, compute for the two training images s1 and s2 (because the branches share their weights, w1 = w2).

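Euclidean distance: $$D(s_1, s_2) = \lVert f(s_1; w_1) - f(s_2; w_2) \rVert_2$$ where f(s; w) is the feature vector that a branch with weights w produces for image s.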

Contrastive loss: $$L(s_1, s_2, y) = (1 - y)\, D(s_1, s_2)^2 + y\, \max\big(0,\; \alpha - D(s_1, s_2)\big)^2$$

Here, y is a label from the dataset. If y = 0, s1 and s2 belong to the same class, so such a similar pair contributes D²(s1, s2) to the loss. If y = 1, s1 and s2 belong to different classes, so such a dissimilar pair contributes max(0, α − D)².

One possible complication during training is that the network may learn to encode all images in the same way, a trivial solution for which the loss on matching pairs is always zero. The hyper-parameter α, called the margin, prevents this: non-matching pairs keep contributing to the loss until their distance exceeds α, which forces a gap between positive (matching) and negative (non-matching) pairs. Its value is usually set in the range 1.0 to 2.0.

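Hence, for M training pairs, the total loss is the sum of the per-pair losses:

$$\mathcal{L} = \sum_{i=1}^{M} \Big[(1 - y_i)\, D\big(s_1^{(i)}, s_2^{(i)}\big)^2 + y_i \max\big(0,\; \alpha - D\big(s_1^{(i)}, s_2^{(i)}\big)\big)^2\Big]$$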
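The per-pair contrastive loss and its sum over M pairs translate directly into NumPy. A minimal sketch, assuming the embeddings have already been computed by the twin network; the function names are hypothetical and the margin default of 1.0 is simply the low end of the range mentioned above:

```python
import numpy as np

def euclidean_distance(e1: np.ndarray, e2: np.ndarray) -> np.ndarray:
    """Row-wise Euclidean distance D between two batches of embeddings."""
    return np.linalg.norm(e1 - e2, axis=1)

def contrastive_loss(e1: np.ndarray, e2: np.ndarray, y: np.ndarray, margin: float = 1.0) -> float:
    """Total contrastive loss over M pairs; y = 0 for same class, y = 1 for different."""
    d = euclidean_distance(e1, e2)
    similar_term = (1 - y) * d ** 2                          # pulls matching pairs together
    dissimilar_term = y * np.maximum(0.0, margin - d) ** 2   # pushes non-matching pairs past the margin
    return float(np.sum(similar_term + dissimilar_term))

e1 = np.array([[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]])
e2 = np.array([[0.0, 0.5], [0.0, 0.5], [3.0, 4.0]])  # distances 0.5, 0.5, 5.0
y = np.array([0, 1, 1])
print(contrastive_loss(e1, e2, y, margin=1.0))  # 0.5
```

The matching pair contributes 0.5² = 0.25, the non-matching pair inside the margin contributes (1 − 0.5)² = 0.25, and the non-matching pair at distance 5 contributes nothing.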

2. Triplet Loss: If you use triplet loss, the data must be pre-processed into (anchor, positive, negative) triplets. The loss for a single triplet is:

Triplet loss: $$L(s_1, s_2, s_3) = \max\big(0,\; D(s_1, s_2) - D(s_1, s_3) + \alpha\big)$$

where α is the margin, s1 is the anchor, s2 is the positive (matching) signature and s3 is the negative (non-matching) signature. The weights are updated by backpropagation with gradient descent.
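A matching NumPy sketch of the triplet loss, summed over a batch of triplets. We assume plain (unsquared) Euclidean distances D(s1, s2) and D(s1, s3); some formulations, such as FaceNet’s, use squared distances instead. Names and toy values are hypothetical.

```python
import numpy as np

def triplet_loss(anchor: np.ndarray, positive: np.ndarray, negative: np.ndarray,
                 margin: float = 1.0) -> float:
    """Sum over the batch of max(0, D(s1, s2) - D(s1, s3) + margin)."""
    d_pos = np.linalg.norm(anchor - positive, axis=1)  # anchor vs genuine signature
    d_neg = np.linalg.norm(anchor - negative, axis=1)  # anchor vs non-matching signature
    return float(np.sum(np.maximum(0.0, d_pos - d_neg + margin)))

anchor = np.array([[0.0, 0.0], [0.0, 0.0]])
positive = np.array([[0.0, 0.5], [0.0, 0.5]])
negative = np.array([[3.0, 4.0], [0.0, 1.0]])
print(triplet_loss(anchor, positive, negative))  # 0.5
```

The first triplet is already separated by more than the margin and costs nothing; the second negative is too close (0.5 − 1.0 + 1.0 = 0.5), so it alone produces the loss.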

Similarity Check: After training, we compute the Euclidean distance between the embeddings of the anchor signature and a test signature; this distance serves as a dissimilarity index. A low value means the signatures are similar, whereas a high value means they do not match. A threshold d turns this into a decision: the test signature is accepted as genuine if its distance falls below d. The threshold trades off wrongly accepting forgeries against wrongly rejecting genuine signatures, and it is usually tuned by trial and error.
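One simple, systematic form of that trial and error is to sweep candidate thresholds over the distances of a labelled validation set and keep the one with the highest accuracy. A sketch with made-up distances; the helper names and data are hypothetical, and accuracy is only one possible criterion (a bank might weight false acceptances more heavily):

```python
import numpy as np

def verify(distance: float, threshold: float) -> bool:
    """Accept the test signature as genuine if its distance to the anchor is below the threshold."""
    return distance < threshold

def pick_threshold(distances: np.ndarray, is_genuine: np.ndarray):
    """Try each observed distance (plus one above the maximum) and keep the most accurate threshold."""
    candidates = np.append(np.unique(distances), distances.max() + 1e-6)
    best_t, best_acc = None, -1.0
    for t in candidates:
        predictions = distances < t
        acc = float(np.mean(predictions == is_genuine))
        if acc > best_acc:
            best_t, best_acc = float(t), acc
    return best_t, best_acc

val_d = np.array([0.2, 0.35, 0.4, 0.9, 1.1, 1.3])  # hypothetical anchor-to-test distances
val_y = np.array([True, True, True, False, False, False])
t, acc = pick_threshold(val_d, val_y)
print(t, acc)  # 0.9 1.0
```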

What would be an ideal way to choose training points for this model?

We know that for positive cases, the anchor and positive images in our training set should be genuine signatures of the same person. But what should be done for negative cases? Should we provide negative signatures that are very different from the anchor? There are two approaches to choosing training samples:

Random Sampling — choose input samples randomly.
Hard Sampling — choose samples where the anchor and negative images look close enough but have slight dissimilarity.

A practical model needs the right mix of both random samples and hard samples. Do you wonder why we require hard samples?

Consider an example of an obvious positive pair (s1, s2):

Since the encodings of s1 and s2 will be nearly identical, D(s1, s2) ≈ 0, so the loss D²(s1, s2) of such obvious training pairs is close to 0.

Consider an example of an obvious negative pair (s1, s2):

Since the encodings of s1 and s2 will be very different, D(s1, s2) will be larger than the margin α, so max(0, α − D(s1, s2)) = 0. Such obvious positive and negative pairs contribute (almost) no loss and therefore (almost) no gradient, so gradient descent learns little from them.

Hence, choose s1 and s2 so that they are almost “similar” but not the same. Such pairs resemble a genuine signature compared with a fraudulent one, which is exactly the situation we want our model to capture.

Consider an example describing the above situation:

In this case, s1 and s2 are almost similar but not the same. Such pairs are called hard samples: because D(s1, s2) < α, they return a positive loss of (α − D(s1, s2))², which adds to the total loss and drives better learning.
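The difference between random and hard negatives shows up directly in the loss. In this sketch (hypothetical helper names and toy embeddings), `sample_negative` picks a negative for an anchor either at random or as the closest candidate, and `contrastive_negative_loss` is the y = 1 term of the contrastive loss. Only the hard negative, which lies inside the margin, yields a non-zero loss.

```python
import numpy as np

def contrastive_negative_loss(d, margin=1.0):
    """Loss of a dissimilar pair (y = 1): max(0, margin - D)^2."""
    return np.maximum(0.0, margin - d) ** 2

def sample_negative(anchor, candidates, rng, hard=False):
    """Pick a negative for the anchor: a random one, or the hardest (closest) one."""
    d = np.linalg.norm(candidates - anchor, axis=1)
    idx = int(np.argmin(d)) if hard else int(rng.integers(len(candidates)))
    return idx, float(d[idx])

rng = np.random.default_rng(0)
anchor = np.zeros(2)
negatives = np.array([[3.0, 4.0], [0.0, 2.0], [0.0, 0.5]])  # distances 5, 2, 0.5
_, d_hard = sample_negative(anchor, negatives, rng, hard=True)
print(d_hard, contrastive_negative_loss(d_hard))  # 0.5 0.25
```

Mixing both strategies, as recommended above, could be as simple as calling `sample_negative` with `hard=True` for some fraction of the pairs; that fraction is a tuning choice, not something taken from the source.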

So we have learnt how a Signature Verification system can be developed using this network. What other applications can benefit from this technique?

Deep Siamese networks have been used as career-guidance and recruitment tools: they reduce the semantic distance between matching resumes and job descriptions and push non-matching ones apart. If you wonder how such a model could learn a rare job role with few examples, one-shot learning can be applied here.
How about a robot that recognizes you by your voice without you introducing yourself? One-Shot Speaker Identification for a Service Robot using a CNN-based Generic Verifier is an exciting work that implements this idea using one-shot learning.
Multi-person tracking in a crowded environment is a significant challenge because of missed and spurious detections. ‘Learning by Tracking’ combines a Siamese CNN with gradient boosting to follow individuals through a video, using each pedestrian’s location to form trajectories across time. The twin network is first trained to learn the similarity of two equally sized image patches together with their contextual features (which record geometry and position).

Before we end, how about we simulate signature verification using PyTorch?

The code below will help you set up a Siamese Neural Network for your own learning.

Credits — D. Robin Reni
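As a further, independent illustration (a minimal sketch of our own, separate from the credited code), one contrastive-loss training step in PyTorch could look like this. The tiny fully connected encoder, the 32x32 input size, the learning rate and the random tensors standing in for signature pairs are all hypothetical; the loss is averaged over the batch, a common alternative to summing.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical small encoder; a real model would use the convolutional branch described earlier.
encoder = nn.Sequential(nn.Flatten(), nn.Linear(32 * 32, 64), nn.ReLU(), nn.Linear(64, 16))

def contrastive_loss(e1, e2, y, margin=1.0):
    """Mean contrastive loss; y = 0 for matching pairs, y = 1 for non-matching pairs."""
    d = torch.norm(e1 - e2, dim=1)
    return ((1 - y) * d.pow(2) + y * torch.clamp(margin - d, min=0).pow(2)).mean()

optimizer = torch.optim.SGD(encoder.parameters(), lr=0.01)

# Random stand-ins for a batch of 8 grayscale 32x32 signature pairs and their labels.
s1 = torch.rand(8, 1, 32, 32)
s2 = torch.rand(8, 1, 32, 32)
y = torch.randint(0, 2, (8,)).float()

optimizer.zero_grad()
loss = contrastive_loss(encoder(s1), encoder(s2), y)  # same encoder for both: shared weights
loss.backward()
optimizer.step()
print(f"loss: {loss.item():.4f}")
```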

Disclaimer — Signature images displayed in this blog as examples are not part of any real world datasets.

References

[1]. SigNet: Convolutional Siamese Network for Writer Independent Offline Signature Verification by Sounak Dey, Anjan Dutta, J. Ignacio Toledo, Suman K. Ghosh, Josep Lladós, Umapada Pal

[2]. Siamese Neural Networks for One-shot Image Recognition by Gregory Koch, Richard Zemel, Ruslan Salakhutdinov

[3]. https://hackernoon.com/one-shot-learning-with-siamese-networks-in-pytorch-8ddaab10340e

[4]. https://innovationincubator.com/siamese-neural-network-with-pytorch-code-example/

[5]. Marcus Liwicki, Michael Blumenstein, Elisa van den Heuvel, Charles E.H. Berger, Reinoud D. Stoel, Bryan Found, Xiaohong Chen, Muhammad Imran Malik. SigComp11: Signature Verification Competition for On- and Offline Skilled Forgeries, Proc. 11th Int. Conference on Document Analysis and Recognition, 2011