Search Like Light Speed — (2) LSH

Chris Kuo/Dr. Dataman
Dataman in AI
Published in
16 min readOct 21, 2023

--

In “Search like light speed — (1) HNSW,” we have learned Approximate Nearest Neighbors (ANN) and the graph-based Hierarchy Navigable Small World (HNSW) algorithm. In this post, we will learn the hashing-based Local-Sensitivity Hashing (LSH) algorithm. If you have not heard about hashing — do not worry. I will start with the basics to explain what hashing is and its advantages. I then explain why LSH works in high-dimensional spaces. This post covers the following topics:

  • What is hashing?
  • The advantages of hashing
  • Why is hashing useful for similarity search?
  • Hashing for similarity search is proven in low-dimensional spaces
  • Hashing for similarity search is challenging in high-dimensional spaces
  • Why does LSH work in high-dimensional spaces?
  • Code examples

After reading this post, you will gain a good understanding of hashing algorithms for similarity search, and you will perform coding on real data. Let’s start!

What is Hashing?

Hashing is the process of converting data into a fixed-length string of letters and numbers. Hashing is a kind of data labeling. It converts data into fixed-length strings called hash values. The data can be another kind of data such as…

--

--