A Beginner’s Guide to Similarity Search & Vector Indexing (Part Two)

Kamal Dhungana
9 min read · Nov 1, 2023

Part 1, Part 2, and Part 3 of this series.

In the previous article, we introduced similarity search and explored two indexing techniques: Flat and IVF. IVF is an Approximate Nearest Neighbor (ANN) method, one of a family of algorithms designed for efficient similarity search. In this article, we look at another ANN approach, HNSW. The previous article is available here, and the accompanying Jupyter notebook and sample dataset can be found here.

Hierarchical Navigable Small World (HNSW): HNSW is a widely used indexing method in vector databases, known for its search performance and scalability. It combines two foundational ideas: the skip list and the navigable small world (NSW) graph. HNSW extends NSW by adding the hierarchical structure found in skip lists, which addresses the scalability challenges of NSW. In HNSW, data points are organized in a multi-layered structure, similar to a skip list. The topmost layer contains the fewest data points, joined by the longest connections, and the number of elements increases as one moves down the hierarchy. Searching in HNSW involves starting from a predefined (entry) point at the uppermost layer and traversing…
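To make the layered structure concrete, here is a minimal pure-Python sketch of the idea, not a faithful HNSW implementation. It assumes Euclidean distance, a skip-list-style random level for each point, and a brute-force scan to pick neighbours while building (real HNSW uses a graph search and prunes neighbour lists). The names `build_layers`, `greedy_search`, and the parameter `m` are hypothetical and chosen only for illustration.

```python
import math
import random


def dist(a, b):
    """Euclidean distance between two equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def random_level(m_l, rng):
    # Skip-list-style draw: most points land on level 0, few reach higher layers.
    return int(-math.log(1.0 - rng.random()) * m_l)


def build_layers(points, m=4, seed=0):
    """Return (layers, entry); layers[l] maps a node id to its neighbour ids."""
    rng = random.Random(seed)
    m_l = 1.0 / math.log(m)
    layers, entry, entry_level = [], None, -1
    for i, p in enumerate(points):
        level = random_level(m_l, rng)
        while len(layers) <= level:
            layers.append({})
        # A point with level L is present in every layer from 0 up to L.
        for l in range(level + 1):
            graph = layers[l]
            # Simplification (assumption): brute-force scan of the layer.
            nearest = sorted(graph, key=lambda j: dist(p, points[j]))[:m]
            graph[i] = set(nearest)
            for j in nearest:
                graph[j].add(i)  # undirected edge, no pruning in this sketch
        if level > entry_level:
            entry, entry_level = i, level
    return layers, entry


def greedy_search(points, layers, entry, query):
    """Descend from the top layer, moving greedily toward the query."""
    current = entry
    for graph in reversed(layers):
        improved = True
        while improved:
            improved = False
            for j in graph[current]:
                if dist(query, points[j]) < dist(query, points[current]):
                    current = j
                    improved = True
    return current


if __name__ == "__main__":
    rng = random.Random(1)
    pts = [(rng.random(), rng.random()) for _ in range(200)]
    layers, entry = build_layers(pts)
    print([len(g) for g in layers])  # layer sizes, from bottom to top
    print(greedy_search(pts, layers, entry, (0.5, 0.5)))
```

The printed layer sizes shrink toward the top, mirroring the sparse upper layers of a skip list. A purely greedy walk like this can stop at a local minimum, so it returns an approximate neighbour rather than a guaranteed nearest one.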
