Semantic Search Similarity Metrics

Jun Xie
Aug 26, 2022


There are three common similarity metrics used in semantic search to measure the similarity/distance between vectors: squared L2 (l2), inner product (ip), and cosine similarity (cosine).

L2 is commonly known as the Euclidean distance. Assume two n-dimensional vectors x = (x_1, …, x_n) and y = (y_1, …, y_n).

Squared L2 distance: d_l2(x, y) = Σᵢ (xᵢ − yᵢ)²
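As a minimal sketch, the squared L2 distance above can be computed directly with numpy (the function name `squared_l2` is just for illustration):

```python
import numpy as np

def squared_l2(x, y):
    """Squared Euclidean (L2) distance: sum of squared coordinate differences."""
    d = x - y
    return float(np.dot(d, d))

x = np.array([1.0, 2.0, 3.0])
y = np.array([2.0, 0.0, 3.0])
# (1-2)^2 + (2-0)^2 + (3-3)^2 = 1 + 4 + 0 = 5
print(squared_l2(x, y))  # → 5.0
```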

In semantic search, we normally rank by distance, whereas the inner product ⟨x, y⟩ = Σᵢ xᵢyᵢ measures similarity. We turn it into a distance by a subtraction: d_ip(x, y) = 1 − ⟨x, y⟩.
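A sketch of the subtraction idea, assuming the common "1 minus inner product" convention (the function name `ip_distance` is hypothetical):

```python
import numpy as np

def ip_distance(x, y):
    """Inner-product distance: 1 minus the inner product (similarity-to-distance)."""
    return float(1.0 - np.dot(x, y))

# Identical unit vectors are maximally similar, so the distance is 0.
print(ip_distance(np.array([1.0, 0.0]), np.array([1.0, 0.0])))  # → 0.0
```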

We apply the same subtraction idea to cosine similarity: d_cos(x, y) = 1 − ⟨x, y⟩ / (‖x‖ ‖y‖).
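The cosine distance can be sketched the same way (again, `cosine_distance` is an illustrative name, not a specific library API):

```python
import numpy as np

def cosine_distance(x, y):
    """Cosine distance: 1 minus the cosine of the angle between x and y."""
    cos_sim = np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))
    return float(1.0 - cos_sim)

# Vectors pointing the same way have cosine similarity 1, so distance 0,
# regardless of their magnitudes.
print(cosine_distance(np.array([2.0, 0.0]), np.array([5.0, 0.0])))  # → 0.0
```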

In fact, if x and y are normalized, all three distances are equivalent. Normalized here means unit length, i.e. ⟨x, x⟩ = ⟨y, y⟩ = 1. Then:

  1. Cosine similarity and inner product are the same, since ‖x‖ = ‖y‖ = 1, and therefore d_cos = d_ip.
  2. For l2 and ip, expanding the square gives d_l2 = ⟨x, x⟩ + ⟨y, y⟩ − 2⟨x, y⟩ = 2 − 2⟨x, y⟩ = 2·d_ip. The two distances are proportional, which doesn’t change the search result.
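The two identities above can be checked numerically on random unit vectors (a self-contained sketch; the distance conventions d_ip = 1 − ⟨x, y⟩ and d_cos = 1 − cosine are assumptions carried over from the text):

```python
import numpy as np

rng = np.random.default_rng(0)

# Two random vectors, normalized to unit length.
x = rng.normal(size=8)
x /= np.linalg.norm(x)
y = rng.normal(size=8)
y /= np.linalg.norm(y)

ip_dist = 1.0 - np.dot(x, y)                                          # d_ip
cos_dist = 1.0 - np.dot(x, y) / (np.linalg.norm(x) * np.linalg.norm(y))  # d_cos
l2_sq = float(np.sum((x - y) ** 2))                                   # d_l2

# On unit vectors: cosine distance equals ip distance,
# and squared L2 is exactly twice the ip distance.
assert np.isclose(cos_dist, ip_dist)
assert np.isclose(l2_sq, 2.0 * ip_dist)
```

Since a factor of 2 does not change the ordering of nearest neighbors, all three metrics return the same ranking on normalized vectors.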

Based on the above, if your vectors are normalized, feel free to choose any of the three metrics. If they are not normalized, it is better to run experiments to figure out which one performs better in terms of recall.


Jun Xie

Founder and ex-Snap software engineer. I am interested in machine learning and databases. Feel free to drop me an email: xiejuncs@gmail.com