Deep Metric Learning: Fundamentals

Jyotsana
4 min read · Mar 25, 2023


Part 1: Deep Metric Learning: Fundamentals
Part 2: Deep Metric Learning: Contrastive Approaches
Part 3: Deep Metric Learning: Supervised Approaches

Distance metric learning is a branch of machine learning that aims to learn distance functions from data, which enhances the performance of similarity-based algorithms. This series will help you get started and will introduce state-of-the-art approaches.

Table Of Contents

  1. Metric Space & Distance Function
  2. Need for Deep Metric Learning
  3. Classification vs. Metric Learning
  4. Problem Setting
  5. Applications
  6. References

Metric Space & Distance Function

In mathematics, a metric space is a set together with a notion of distance between its elements, usually called points. This distance is measured by a function called a metric or distance function. The function d must satisfy the following properties:

  1. Identity: the distance from a point to itself is zero, i.e. d(x,x)=0. Intuitively, it never costs anything to travel from a point to itself.
  2. Positivity: the distance between two distinct points is always positive, i.e. if x ≠ y, then d(x,y) > 0.
  3. Symmetry: the distance from x to y is always the same as the distance from y to x, i.e. d(x,y)=d(y,x).
  4. Triangle inequality: d(x,z) ≤ d(x,y) + d(y,z). This captures the intuitive idea that the straight-line path is the shortest path.
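As a quick sanity check, these four axioms can be verified numerically for the familiar Euclidean distance. A minimal sketch using NumPy, with hand-picked sample points:

```python
import numpy as np

def euclidean(x, y):
    """Euclidean distance, a canonical example of a metric."""
    return float(np.linalg.norm(np.asarray(x) - np.asarray(y)))

# A few sample points to exercise each axiom.
x, y, z = np.array([0.0, 0.0]), np.array([3.0, 4.0]), np.array([6.0, 8.0])

assert euclidean(x, x) == 0.0                                # identity
assert euclidean(x, y) > 0.0                                 # positivity
assert euclidean(x, y) == euclidean(y, x)                    # symmetry
assert euclidean(x, z) <= euclidean(x, y) + euclidean(y, z)  # triangle inequality

print(euclidean(x, y))  # 5.0
```

Any learned distance function should be checked against (or constructed to guarantee) these same properties.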

Need for Deep Metric Learning

Since algorithms such as k-means clustering, DBSCAN, decision trees, and kNN, which are representative algorithms of data mining and machine learning, operate on distance functions, determining a distance function suitable for the given data is critical to their accuracy.

However, no predefined distance function is suitable for all data. For this reason we need metric learning, which uses a machine learning algorithm to construct a distance function directly from the data.

In machine learning terms, a distance function suitable for the data is one under which the data is easy to distinguish by its target values. For data that is hard to classify with its existing features, learning a metric that separates the classes can make building a classification model much simpler, as shown in the figure below.

The purpose of the metric learning problem is to learn an embedding function that transforms the input data to be well distinguished according to each target value.

Purpose of Metric Learning
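To make the idea concrete, here is a minimal sketch of what a learned embedding can look like in the linear case: a map W applied before computing Euclidean distance, which is equivalent to a Mahalanobis metric with M = WᵀW. The matrix W below is a hypothetical, hand-picked example rather than a learned one:

```python
import numpy as np

# Hypothetical embedding matrix: emphasize the first feature, suppress the second
# (in practice W would be learned from labeled data, not hand-picked).
W = np.array([[2.0, 0.0],
              [0.0, 0.1]])

def learned_distance(x, y, W):
    """Euclidean distance in the embedded space: d(x, y) = ||Wx - Wy||."""
    return float(np.linalg.norm(W @ x - W @ y))

x = np.array([0.0, 0.0])
y = np.array([1.0, 1.0])

print(np.linalg.norm(x - y))       # plain Euclidean distance
print(learned_distance(x, y, W))   # distance under the (assumed) embedding
```

Deep metric learning replaces the linear map W with a neural network, so the embedding can be an arbitrary nonlinear transformation.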

Classification vs. Metric Learning

Let’s imagine that we are building a face recognition system. We can approach this problem in two ways:

  1. Classification problem: an N-way classification task, predicting from a fixed set of possible output classes.
  2. Verification problem: a matching operation, where you match the given sample to the closest sample in a reference set of N other samples.

The choice depends on whether the test data is closed set (the same classes as in training) or open set (classes unseen during training).
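A toy sketch of the verification setting: match a query embedding to the nearest entry in a reference gallery. The gallery, identities, and embedding values below are all assumptions for illustration:

```python
import numpy as np

# Hypothetical gallery: one pre-computed embedding per enrolled identity.
gallery = {
    "alice": np.array([0.9, 0.1]),
    "bob":   np.array([0.1, 0.9]),
}

def verify(query, gallery):
    """Return the enrolled identity whose embedding is closest to the query."""
    return min(gallery, key=lambda name: np.linalg.norm(query - gallery[name]))

print(verify(np.array([0.8, 0.2]), gallery))  # alice
```

Note that verification never predicts from a fixed class set: adding a new person only means adding a row to the gallery, with no retraining, which is why the open-set regime favors metric learning.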

In classification, softmax loss is typically used during training to encourage features to be separable, which causes the inter-class features to disperse. In verification, however, similarity between images is measured by Euclidean or cosine distance, which requires feature representations to be not only separable but also discriminative. For discriminative features, we want inter-class features to disperse with a sufficient margin, and intra-class features to be as compact as possible.

Separable Vs Discriminative Features
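The compact-intra-class, dispersed-inter-class property can be checked directly on embeddings. A minimal sketch with toy values (the two-point classes below are assumed for illustration):

```python
import numpy as np

# Toy embeddings: two classes of two samples each (assumed illustrative values).
class_a = np.array([[1.0, 0.0], [0.9, 0.1]])
class_b = np.array([[0.0, 1.0], [0.1, 0.9]])

def mean_pairwise(u, v):
    """Mean Euclidean distance over all pairs drawn from sets u and v."""
    return float(np.mean([np.linalg.norm(a - b) for a in u for b in v]))

intra = mean_pairwise(class_a, class_a)  # discriminative: should be small
inter = mean_pairwise(class_a, class_b)  # discriminative: should be large

print(intra < inter)  # True
```

The gap between these two quantities is exactly the margin that discriminative losses try to enforce.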

Problem Setting

Metric learning problems fall into two main categories depending on the type of supervision available for the training data:

  • Supervised learning: the algorithm has access to a set of data points, each of them belonging to a class (label) as in a standard classification problem. The goal in this setting is to learn a distance metric that puts points with the same label close together while pushing away points with different labels.
  • Weakly supervised learning: the algorithm has access to a set of data points with supervision only at the tuple level (typically pairs, triplets, or quadruplets of data points). A classic example of such weaker supervision is a set of positive and negative pairs: in this case, the goal is to learn a distance metric that puts positive pairs close together and negative pairs far away.
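A classic instance of tuple-level supervision is the triplet loss, which asks the anchor to be closer to a positive (same class) than to a negative (different class) by some margin. A minimal sketch, with the margin and sample points assumed:

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=1.0):
    """Hinge loss: penalize unless d(anchor, positive) + margin <= d(anchor, negative)."""
    d_pos = np.linalg.norm(anchor - positive)
    d_neg = np.linalg.norm(anchor - negative)
    return float(max(d_pos - d_neg + margin, 0.0))

a = np.array([0.0, 0.0])
p = np.array([0.1, 0.0])  # same class: already close
n = np.array([2.0, 0.0])  # different class: already far

print(triplet_loss(a, p, n))  # 0.0 — the margin constraint is already satisfied
```

Pair-based (contrastive) and quadruplet losses follow the same pattern; the next part of this series covers these contrastive approaches in detail.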

Applications

  1. Computer vision problems such as face recognition, person re-identification, and image retrieval
  2. Information retrieval, anomaly detection, and self-supervised representation learning for text, audio, and vision
  3. Multimodal learning, e.g. CLIP

References

  1. https://www.researchgate.net/publication/335314481_Deep_Metric_Learning_A_Survey
  2. Deep Metric Learning: a (Long) Survey
  3. http://contrib.scikit-learn.org/metric-learn/introduction.html

I hope you found my exploration of Deep Metric Learning informative and easy to understand. Please feel free to reach out to me with any questions or concerns regarding the concepts covered here. Your feedback is valuable to me and greatly appreciated. Even a simple clap 👏🏼 would be a wonderful show of support 😇. You can connect with me on LinkedIn. In the next installment, we will delve into the specifics of contrastive approaches.


Jyotsana

Senior Data Scientist | Computer Vision, Recommendation System, NLP problems | Ecommerce