Search Like Light Speed — (1) HNSW

Chris Kuo / Dr. Dataman · Published in Dataman in AI · 27 min read · Oct 20, 2023

Image source: https://movies.disney.com/lightyear

I love “Buzz Lightyear,” the space ranger in Toy Story, and I love his catchphrase “To infinity and beyond!” When I search for information, I also enjoy the speed of finding the right answer. Is it all about high-speed internet and sufficient bandwidth? Not quite! The algorithms behind near-instantaneous search results are just as important. The speed of information retrieval has long been a central subject in computer science, and with the high-dimensional embeddings that Large Language Models (LLMs) produce for text, images, and audio, it has become a priority topic in data science as well.

In this post, I will talk about:

  • Vector embeddings in NLP
  • Why K-Nearest Neighbors (KNN) cannot keep up with the speed
  • Approximate Nearest Neighbor (ANN) feels like light speed
  • The state-of-the-art algorithms for fast search
  • Understanding Hierarchical Navigable Small World Graphs (HNSW)
  • Code example: Embedding news articles
  • Code example: FAISS for the HNSW search (a quick preview follows this list)
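
To give you a feel for where we are headed, here is a minimal sketch of an HNSW search with FAISS. It assumes the `faiss-cpu` package is installed and uses random vectors as stand-ins for the news-article embeddings built later in the post; the parameter values (M = 32, efConstruction = 64, efSearch = 32) are illustrative choices, not the settings of the full example.

```python
import numpy as np
import faiss  # pip install faiss-cpu

d = 128                                               # embedding dimensionality
xb = np.random.random((10_000, d)).astype("float32")  # database vectors (stand-in for embeddings)
xq = np.random.random((5, d)).astype("float32")       # query vectors

index = faiss.IndexHNSWFlat(d, 32)   # 32 = M, neighbors kept per node in the graph
index.hnsw.efConstruction = 64       # search breadth while building the graph
index.add(xb)                        # build the HNSW graph over the database vectors

index.hnsw.efSearch = 32             # search breadth at query time
distances, ids = index.search(xq, 5) # 5 approximate nearest neighbors per query
print(ids)
```

The two `ef` knobs trade accuracy for speed: larger values explore more of the graph and return better neighbors at higher query cost, which is the core tension the rest of this post unpacks.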

This article and the series that follows explain the state-of-the-art algorithms that make Buzz Lightyear’s dream possible. You will gain a landscape understanding of the importance of this area and its applications, and you will get hands-on experience with the code examples.
