Complete guide to Similarity-based learning for Counterfeit detection, Part 1

(Introduction to Similarity Learning with Siamese Networks)

Yash Vardhan Singh
9 min read · Jun 7, 2022

Harsh Parmar, Khushboo Tak, Trapti Kalra, Yash Vardhan Singh
(IBM Consulting- AI and Advanced Analytics, CIC India)


This article is part of a series that provides an understanding and implementation of similarity learning for counterfeit detection:
1. Introduction to Siamese networks and contrastive loss
2. Training a Siamese network for counterfeit detection
3. Three types of inference-time implementation

In this article, we discuss how to solve the problem of counterfeit detection using similarity learning with a Siamese network. We briefly explain the need for counterfeit detection and similarity learning, and introduce Siamese networks with contrastive loss.

Counterfeit Detection

Today's world is an interconnected marketplace, and counterfeit products are widely available both offline and online. It is often difficult to tell visually whether a product is counterfeit or authentic. The ability to recognize, reject, and prevent forgery is of paramount importance to any organization. Using computer vision and deep learning techniques, we can detect counterfeits.

Counterfeit products are fakes or unauthorized replicas of real products, often produced with the intent to take advantage of the superior value of the imitated product. The word counterfeit frequently describes both forgeries of currency and documents and imitations of items such as clothing, handbags, shoes, pharmaceuticals, automobile parts, unapproved aircraft parts (which have caused many accidents), watches, electronics and electronic parts, software, works of art, toys, and movies.

Solving the problem of counterfeit detection is therefore very important. The challenge is that one will never have every counterfeit version of a product available for training a deep learning model to identify them.
To overcome this challenge, we use a similarity-based learning technique.

Before jumping to the Siamese network and other similarity-based concepts, let us understand why there is a need for similarity-based learning when binary classification could be used to separate the two classes: counterfeit and authentic.

Need for Similarity-based learning

Classification is a supervised learning technique used to identify the category of new observations on the basis of training data. To learn more, you can go through the link below:
https://www.javatpoint.com/classification-algorithm-in-machine-learning

The classification approach can find a class to which an image belongs.

In the above example, a classification algorithm is trained to predict three attributes: 'person', 'gender', and 'type of image'. For the first picture, of Kamala Harris, it predicts person, female, and face, and the same goes for the second picture.

Usually, classification tasks cannot easily answer questions like 'How similar are two images?' or 'Is it the same person?'. To answer whether it is the same person, we would end up training a separate class for every person.

Consider a problem: 'Recognize students by a face recognition system before they enter the exam hall.'

Let us try to solve this problem with the dataset defined below.
For easy understanding, we have defined a dataset representing a classroom with a rather unusual set of students: Joey (from F.R.I.E.N.D.S), Harry Potter, and Voldemort.
For a classification approach, you would need many training images for each class.

The classification approach will be able to detect the students whose images are present in the training set.

What if a new student enters?

When a new student comes in, you will need to re-train the model by adding images of that student as a new class.

Hence the limitation of the classification approach in this case is its lack of scalability (the model must be re-trained to learn new inputs) and the additional cost of data gathering and retraining.
In such cases, a similarity-based learning approach helps.

Similarity learning
Instead of learning how to classify an input, similarity learning learns how to measure the similarity of two inputs. A deep neural network is used to learn a representation of the input data.

How will Similarity-based learning help here?

Similarity-based learning produces a model that returns a similarity score. When a new student comes in, we match their image against the images already in the data; the similarity score will be low, so they won't be marked present. Only a single image of the new student then needs to be added, and it will be enough for future comparisons.
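To make this concrete, here is a minimal sketch of such an attendance check in Python. The embed() function, the student images, and the threshold are all illustrative placeholders, not the actual trained system:

```python
import numpy as np

# Stand-in for the trained similarity model: in the real system, embed()
# would run a face image through the trained network to get its embedding.
def embed(image):
    flat = np.asarray(image, dtype="float32").ravel()
    return flat / (np.linalg.norm(flat) + 1e-9)  # placeholder embedding

# Enroll each student with a single reference image (random stand-ins here).
rng = np.random.default_rng(0)
gallery = {name: embed(rng.random((64, 64)))
           for name in ["joey", "harry", "voldemort"]}

def mark_attendance(image, threshold=0.5):
    query = embed(image)
    # Find the enrolled student whose stored embedding is closest to the query.
    name, dist = min(
        ((n, np.linalg.norm(query - e)) for n, e in gallery.items()),
        key=lambda pair: pair[1],
    )
    # An unknown student's embedding will be far from every stored one,
    # so the distance exceeds the threshold and nobody is marked present.
    return name if dist < threshold else None
```

Note that enrolling a new student only requires adding one entry to the gallery; no retraining is involved.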

Hopefully, you now have the idea behind similarity learning.

We have used the Siamese network with contrastive loss to perform similarity-based learning to detect counterfeit products.

To demonstrate the end-to-end implementation of counterfeit detection using similarity-based learning, we will showcase a model used to distinguish authentic from fake fire-resistant fabric.


Siamese Network

A Siamese network, also known as a twin neural network, is an artificial neural network that contains two identical subnetworks. These subnetworks share weights and parameters.

In a Siamese network, two images are passed in parallel through the identical twin CNN subnetworks with shared weights. After passing through the same network, we obtain an embedding for each image.

Image courtesy: MLOps Roundup

An embedding is a relatively low-dimensional space into which you can translate high-dimensional vectors. In computer vision, an embedding captures the content of an image in a compact vector: we transform the image into an embedding and then decide what to do based on that embedded representation.
To learn more about embeddings and how Tesla uses them to solve the self-driving problem, visit: https://mlopsroundup.substack.com/p/issue-15-ai-for-self-driving-at-tesla?s=r

The diagram below shows how we pass a pair of images from a fabric dataset to a Siamese network.

Fabric dataset images: Courtesy Olivia Walton, IBM

Each image in the pair is passed through one of the twin subnetworks, which share weights and parameters, and each subnetwork returns an embedding.
The Euclidean distance between the two embeddings is then calculated and sent to the next layer.

The Euclidean distance is the prototypical example of distance in a metric space. In some applications in statistics and optimization, the square of the Euclidean distance is used instead of the distance itself.
To learn more about the distance between two vectors, see: http://mathonline.wikidot.com/the-distance-between-two-vectors
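As a quick worked example (plain NumPy, with made-up three-dimensional embeddings):

```python
import numpy as np

emb_a = np.array([0.2, 0.7, 0.1])
emb_b = np.array([0.3, 0.5, 0.4])

squared = np.sum((emb_a - emb_b) ** 2)  # squared Euclidean distance: 0.14
distance = np.sqrt(squared)             # Euclidean distance: ~0.374
```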

The Siamese network outputs a similarity score between the two images. Based on a threshold we define, we can label a pair as similar (authentic) or dissimilar (counterfeit).
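To make the architecture concrete, here is a minimal Keras sketch of a twin network with shared weights and a Euclidean-distance layer. The encoder, input shape, and layer sizes are illustrative choices, not the exact model used in this series:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

def build_encoder(input_shape=(105, 105, 3)):
    # Shared CNN subnetwork: both images pass through these same weights.
    inputs = layers.Input(shape=input_shape)
    x = layers.Conv2D(32, 3, activation="relu")(inputs)
    x = layers.MaxPooling2D()(x)
    x = layers.Conv2D(64, 3, activation="relu")(x)
    x = layers.GlobalAveragePooling2D()(x)
    embedding = layers.Dense(128)(x)  # the image's embedding vector
    return Model(inputs, embedding, name="encoder")

encoder = build_encoder()

img_a = layers.Input(shape=(105, 105, 3))
img_b = layers.Input(shape=(105, 105, 3))
emb_a = encoder(img_a)  # the same encoder instance is applied to both inputs,
emb_b = encoder(img_b)  # so the twin branches share all weights

# Euclidean distance between the two embeddings.
distance = layers.Lambda(
    lambda t: tf.sqrt(
        tf.reduce_sum(tf.square(t[0] - t[1]), axis=1, keepdims=True) + 1e-9
    )
)([emb_a, emb_b])

siamese = Model(inputs=[img_a, img_b], outputs=distance)
# At inference time, pairs whose distance falls below a chosen threshold
# are treated as similar (authentic), and the rest as dissimilar (counterfeit).
```

Because a single encoder instance is applied to both inputs, the twin branches share their weights by construction.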

The aim is to reduce the distance between embeddings of two similar images in the embedding space and increase the distance between embeddings of dissimilar images.

We will need a loss function that pulls similar images together and pushes dissimilar images apart.

A few types of loss functions to perform similarity learning are:
1. Contrastive loss
2. Triplet loss
3. Quadruplet loss, etc.
Other loss functions for similarity learning: https://lilianweng.github.io/posts/2021-05-31-contrastive/

We limit our scope to contrastive loss.

Contrastive loss

The contrastive loss function was introduced by Yann LeCun and his collaborators (Hadsell, Chopra, and LeCun). For an in-depth understanding, see: http://yann.lecun.com/exdb/publis/pdf/hadsell-chopra-lecun-06.pdf

Let’s get a high-level understanding of how the contrastive loss function is formed and why it works in the case of similarity learning.

Consider a pair of two authentic images:
If image A and image B both belong to the authentic set, then distance(embA, embB) should be small. In vector space we want to reduce the embedding distance between similar images, so this part of the loss can be represented as:

L_similar = ||F(A) − F(B)||²

Now consider a pair of one authentic and one counterfeit image:
If image A and image B belong to different classes, then distance(embA, embB) should be large. In vector space we want to increase the embedding distance between dissimilar images, so this part of the loss can be represented as:

L_dissimilar = max(0, m − ||F(A) − F(B)||)²

Here, ‘m’ is the margin, and F(A) and F(B) are embeddings of the images A and B.

Putting both losses together, we get the contrastive loss:

L(A, B, y) = y · ||F(A) − F(B)||² + (1 − y) · max(0, m − ||F(A) − F(B)||)²

where the label y equals 1 when A and B are both authentic images and 0 when A and B come from different sets.
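As a minimal sketch, this loss can be written in TensorFlow using the convention above (y = 1 for similar pairs, y = 0 for dissimilar pairs); the margin value is illustrative:

```python
import tensorflow as tf

def contrastive_loss(y_true, distance, margin=1.0):
    # y_true: 1 for similar (authentic, authentic) pairs, 0 for dissimilar pairs.
    # distance: Euclidean distance between the two embeddings.
    y_true = tf.cast(y_true, distance.dtype)
    similar = y_true * tf.square(distance)              # pull similar pairs together
    dissimilar = (1.0 - y_true) * tf.square(
        tf.maximum(margin - distance, 0.0))             # push dissimilar pairs past the margin
    return tf.reduce_mean(similar + dissimilar)
```

Since the Siamese model sketched earlier outputs the distance directly, a function like this can be passed as the loss when compiling that model.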

Hope this gives you an idea of the Siamese network and loss function used to perform similarity-based learning.

Similarity learning is used in various applications today: learning to rank in information retrieval, face verification and face identification, recommendation systems, and more.

In the next article, we will demonstrate the code implementation for counterfeit detection, with a step-by-step implementation using Keras.

Bonus

This section is for readers who are interested in learning more about the existing work on self-supervised learning and how contrastive approaches have developed. We have tried to list the papers in the order they were published.

1. SimCLR (Feb 2020)

This paper presents SimCLR: a simple framework for contrastive learning of visual representations. It outperforms previous methods for self-supervised and semi-supervised learning on the ImageNet dataset.

2. MOCO V2 (March 2020)

This paper verifies the effectiveness of two of SimCLR's design improvements by implementing them in the MoCo framework, with simple modifications to MoCo: namely, an MLP projection head and stronger data augmentation.

3. BYOL (June 2020)

It introduces Bootstrap Your Own Latent (BYOL), a new approach to self-supervised image representation learning. BYOL relies on two neural networks, referred to as online and target networks, that interact and learn from each other.

4. SCAN (July 2020)

It provides a method to learn to classify images without labels. This method obtains promising results on the ImageNet dataset and outperforms several semi-supervised learning methods in the low-data regime without the use of any ground-truth annotations.

5. SimSiam (Nov 2020)

This paper reports the surprising empirical result that simple Siamese networks can learn meaningful representations even using none of the following:
(i) negative sample pairs
(ii) large batches
(iii) a momentum encoder

6. Barlow Twins (June 2021)

A successful approach to SSL is to learn embeddings that are invariant to distortions of the input sample. However, a recurring issue with this approach is the existence of trivial constant solutions. Most current methods avoid such solutions by careful implementation details. This paper proposes a method called Barlow Twins that outperforms existing solutions.

7. VICReg (Jan 2022)

VICReg (Variance-Invariance-Covariance Regularization) is a method that explicitly avoids the collapse problem with two regularization terms applied to each embedding separately.

Thanks for being a patient reader. If this was helpful, hit the like button!
