Search Like Light Speed — (1) HNSW
I love Buzz Lightyear, the space ranger in Toy Story, and I love his catchphrase: “To infinity and beyond!” When I search for information, I also enjoy the speed of finding the right answer. Is it all about high-speed internet and sufficient bandwidth? Not quite! In fact, the algorithms behind near-instantaneous search results are of paramount importance. The speed of information retrieval has long been a core subject in computer science, and with the high-dimensional embeddings that Large Language Models (LLMs) produce for text, image, and audio data, it has become a priority topic in data science as well.
In this post, I will talk about:
- Vector embeddings in NLP
- Why K-Nearest Neighbors (KNN) cannot keep up
- Why Approximate Nearest Neighbor (ANN) search feels like light speed
- The state-of-the-art algorithms for fast search
- Understanding Hierarchical Navigable Small World (HNSW) graphs
- Code example: Embedding news articles
- Code example: FAISS for the HNSW search
This article and the series that follows explain the state-of-the-art algorithms that make Buzz Lightyear’s dream possible. You will gain a broad understanding of why this area matters and where it is applied. You will have hands-on…