Machine Learning Fundamentals: Cosine Similarity and Cosine Distance

Sindhu Seelam
Published in Geek Culture
2 min read · May 25, 2021

Cosine similarity is a metric that measures the cosine of the angle between two vectors projected in a multi-dimensional space.

The smaller the angle between the two vectors, the more similar they are to each other.

If the angle between the two vectors is 90 degrees, the cosine similarity is 0. The two vectors are then perpendicular (orthogonal) to each other, which is usually read as the vectors being unrelated.

As the cosine similarity gets closer to 1, the angle between the two vectors A and B becomes smaller, and A and B are more similar to each other.

Source: pyimagesearch

Mathematically, the cosine similarity is the dot product of the two vectors divided by the product of their Euclidean norms (magnitudes):

cos(θ) = (a · b) / (‖a‖ ‖b‖)

where a and b are vectors in a multidimensional space.
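The definition above (dot product divided by the product of the Euclidean norms) can be sketched in a few lines of plain Python. This is only a minimal illustration: the function name `cosine_similarity` and the choice to raise an error for zero vectors are assumptions of this sketch, not something the article prescribes.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors a and b."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))      # dot product a . b
    norm_a = math.sqrt(sum(x * x for x in a))   # Euclidean norm of a
    norm_b = math.sqrt(sum(y * y for y in b))   # Euclidean norm of b
    if norm_a == 0 or norm_b == 0:
        # The angle is undefined for a zero vector; raising is one possible choice.
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

a = [1, 2, 3]
b = [4, 5, 6]
print(round(cosine_similarity(a, b), 4))  # 0.9746
```

Here a · b = 32, ‖a‖ = √14 and ‖b‖ = √77, so the similarity is 32 / √1078 ≈ 0.9746, meaning the two vectors point in nearly the same direction.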

Since 𝑐𝑜𝑠(𝜃) lies in the range [−1, 1]:

  • −1 indicates vectors pointing in exactly opposite directions, i.e. maximally dissimilar
  • 0 indicates orthogonal (unrelated) vectors
  • 1 indicates vectors pointing in exactly the same direction, i.e. maximally similar
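The three boundary cases can be checked numerically. The sketch below is only illustrative: the helper `angle_degrees` and the example vectors are hypothetical, and `math.hypot` is used here as a shortcut for the Euclidean norm.

```python
import math

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def angle_degrees(a, b):
    # Clamp to [-1, 1] to guard against floating-point round-off before acos.
    cos_theta = max(-1.0, min(1.0, cosine_similarity(a, b)))
    return math.degrees(math.acos(cos_theta))

examples = {
    "same direction": ([1, 2], [2, 4]),        # similarity close to 1, angle close to 0
    "orthogonal": ([1, 0], [0, 3]),            # similarity 0, angle 90
    "opposite direction": ([1, 2], [-1, -2]),  # similarity close to -1, angle close to 180
}
for label, (a, b) in examples.items():
    print(label, round(cosine_similarity(a, b), 6), round(angle_degrees(a, b), 2))
```

Note that [1, 2] and [2, 4] have different lengths but the same direction, so their similarity is still 1: cosine similarity ignores magnitude.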

Cosine Distance:

Cosine similarity is commonly used as a similarity measure between vectors. The corresponding cosine distance is defined as follows:

Cosine Distance = 1 − Cosine Similarity

The intuition behind this is that if two vectors point in exactly the same direction, the similarity is 1 (angle = 0, hence 𝑐𝑜𝑠(𝜃) = 1) and thus the distance is 0 (1 − 1 = 0).
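As a rough sketch of this definition, the function below (the name `cosine_distance` is an assumption of this example) subtracts the similarity from 1. Because the similarity lies in [−1, 1], the resulting distance lies in [0, 2].

```python
import math

def cosine_distance(a, b):
    """1 minus the cosine similarity of a and b (non-zero vectors assumed)."""
    dot = sum(x * y for x, y in zip(a, b))
    similarity = dot / (math.hypot(*a) * math.hypot(*b))
    return 1.0 - similarity

print(cosine_distance([3, 4], [6, 8]))    # 0.0: same direction
print(cosine_distance([1, 0], [0, 1]))    # 1.0: orthogonal
print(cosine_distance([1, 0], [-1, 0]))   # 2.0: opposite direction
```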

Applications of cosine similarity:

  1. Data mining, information retrieval, and text matching.
  2. Recommendation engines, to recommend similar products, movies, shows, or books.
  3. In information retrieval, combining TF-IDF weighting with cosine similarity is a very common technique for quickly retrieving documents similar to a search query.
  4. Locality-sensitive hashing based on cosine similarity speeds up the matching of DNA sequence data.
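To make the TF-IDF retrieval use case more concrete, here is a small sketch that ranks a toy corpus against a query. Everything in it is illustrative: the documents, the whitespace tokenizer, the helper names `tf_idf_vectors` and `weigh`, and the smoothed IDF formula are assumptions of this example, and a real system would more likely rely on a library implementation.

```python
import math
from collections import Counter

def weigh(tokens, vocab, idf):
    """TF-IDF vector of one token list; terms outside vocab are ignored."""
    counts = Counter(tokens)
    return [counts[t] * idf[t] for t in vocab]

def tf_idf_vectors(tokenized_docs):
    """Return (vocabulary, idf weights, TF-IDF vectors) for tokenized documents."""
    vocab = sorted({term for doc in tokenized_docs for term in doc})
    n_docs = len(tokenized_docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in tokenized_docs for term in set(doc))
    # Smoothed IDF, one common variant (an assumption of this sketch).
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1 for t in vocab}
    vectors = [weigh(doc, vocab, idf) for doc in tokenized_docs]
    return vocab, idf, vectors

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.hypot(*a) * math.hypot(*b)
    return dot / norm if norm else 0.0  # treat an all-zero vector as "no match"

docs = [
    "the cat sat on the mat",
    "dogs and cats are popular pets",
    "stock markets fell sharply today",
]
tokenized = [d.split() for d in docs]
vocab, idf, doc_vectors = tf_idf_vectors(tokenized)

# Vectorize the query in the same vocabulary, then rank documents by similarity.
query_vector = weigh("cat on a mat".split(), vocab, idf)
scores = [cosine_similarity(query_vector, v) for v in doc_vectors]
ranked = sorted(range(len(docs)), key=lambda i: scores[i], reverse=True)
for i in ranked:
    print(round(scores[i], 3), docs[i])
```

The first document ranks highest because it shares the terms "cat", "on", and "mat" with the query. The other two share no terms with it (this naive tokenizer treats "cats" and "cat" as different words), so their similarity is 0.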

Sindhu Seelam
Geek Culture

Transitioning ML/AI Engineer. I’m passionate about learning & writing about my journey into the AI world. https://www.linkedin.com/in/sindhuseelam/