Efficient HNSW Indexing: Reducing Index Build Time Through Massive Parallelism

Introduction

Pat Lasserre
GSI Technology
Jun 21, 2024


In the growing field of vector databases, driven by applications like retrieval-augmented generation (RAG) for Generative AI (GenAI) and recommendations for ecommerce, the HNSW (Hierarchical Navigable Small Worlds) algorithm is used by many of the top vector database providers for approximate nearest neighbor (ANN) search. It provides fast search with good recall but comes with the tradeoff of long index build times.

Long index build time creates several challenges, including scalability limitations, increased operational costs, reduced experimentation for developers, and slow updates for dynamic datasets.

This blog post explains why building an HNSW index takes a long time, and it presents a solution that leverages a high degree of parallelism to reduce index build time by roughly 85% compared to traditional CPU-based solutions.

HNSW Overview

HNSW is a graph-based algorithm that efficiently searches for the approximate nearest neighbors of a query. The graph has hierarchical layers of nodes (vector embeddings), where each layer contains a subset of the previous layer. Nodes are connected by edges, which represent the distance between them, and distance is measured by a metric such as cosine similarity.

The higher layers have fewer nodes and longer edges between them, enabling the bridging of distant regions of the graph and fast exploration of the vector space. The bottom layer includes all the vectors and has short-range edges (connecting to nearby nodes), which allows for more refined searches.

Figure 1 provides a simple example of an HNSW graph structure.

Figure 1 — Example HNSW graph showing the hierarchical layers structure. The top layer has the fewest nodes and longer connections, while the bottom layer contains all the nodes and has short-range edges.
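Why do the higher layers have fewer nodes? In standard HNSW implementations, each inserted node independently draws its top layer from an exponentially decaying distribution. A minimal sketch of that rule, assuming the common normalization mL = 1/ln(M) with M = 16 (the names here are illustrative, not from any particular library):

```python
import math
import random

def sample_max_layer(m_l: float = 1 / math.log(16)) -> int:
    """Draw a node's top layer: layer l is reached with probability
    about e^(-l / m_l), so each layer up holds exponentially fewer nodes."""
    # 1 - random.random() lies in (0, 1], avoiding log(0)
    return math.floor(-math.log(1.0 - random.random()) * m_l)

# Sanity check: count where 100,000 nodes land.
random.seed(0)
counts = {}
for _ in range(100_000):
    layer = sample_max_layer()
    counts[layer] = counts.get(layer, 0) + 1
```

With M = 16, roughly 1 in 16 nodes reaches layer 1, about 1 in 256 reaches layer 2, and so on, which produces the thinning pyramid shown in Figure 1.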

During index building or updating, new nodes are inserted based on their distance from existing nodes. At each layer, a dynamic list of the nearest neighbors for each vertex is maintained, with its size determined by the parameter ef_construction.

The algorithm iterates over the nodes in this list, performing nearest neighbor distance calculations to see if a node’s neighbors are closer to the query. If so, the neighbors are considered for addition to the list.

With a larger ef_construction, more candidates are tracked and evaluated, increasing the likelihood of finding the true nearest neighbors. This improves accuracy but also increases index build time because more distance calculations are needed.
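The candidate-list loop described above can be sketched as a best-first search. This is a simplified, illustrative version on a toy one-dimensional graph (a single layer, absolute difference as the distance metric), not a full HNSW implementation:

```python
import heapq

def search_layer(query, entry, graph, coords, ef):
    """Best-first search on one layer: expand a min-heap of candidates
    while a bounded max-heap keeps the ef best results found so far
    (ef = ef_construction during index build)."""
    dist = lambda n: abs(query - coords[n])
    visited = {entry}
    candidates = [(dist(entry), entry)]   # min-heap: closest candidate first
    results = [(-dist(entry), entry)]     # max-heap: worst kept result on top
    while candidates:
        d, node = heapq.heappop(candidates)
        if d > -results[0][0]:            # best candidate is already worse
            break                         # than the worst result: stop
        for nb in graph[node]:
            if nb in visited:
                continue
            visited.add(nb)
            d_nb = dist(nb)
            if len(results) < ef or d_nb < -results[0][0]:
                heapq.heappush(candidates, (d_nb, nb))
                heapq.heappush(results, (-d_nb, nb))
                if len(results) > ef:
                    heapq.heappop(results)   # drop the worst result
    return [n for _, n in sorted((-d, n) for d, n in results)]

# Toy layer: a path graph over 1-D points
graph = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
coords = {0: 0.0, 1: 1.0, 2: 2.0, 3: 3.0}
nearest = search_layer(2.9, entry=0, graph=graph, coords=coords, ef=2)
```

Raising ef here directly grows both heaps, which is why a larger ef_construction means more distance computations per insertion.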

Consequences of Slow Index Build Times

As seen in the previous section, building an HNSW index requires a lot of distance computations to find the nearest neighbors in a hierarchical graph. While this results in low search latency and good recall, it comes with the tradeoff of a graph that is slow to build and update.

For example, as seen in Figure 2, eBay’s experience shows that building an HNSW index for 160 million vectors can take three to six hours, and that build time increases quickly as the dataset grows.

This can be problematic for a company like eBay that has billions of live listings.

Figure 2 — Index build time vs. index size. Different quantization, M, and ef_construction values are used.

Slow index build times can lead to many challenges, such as scalability issues, increased operational costs, limited developer application experimentation, and slow updates of dynamic datasets.

Scalability Issues

Slow index build time restricts the scalability of applications like ecommerce and RAG. For ecommerce, growing customer bases and expanding product catalogs increase index build time, which delays the serving of relevant product recommendations. In RAG, larger datasets allow for higher-quality responses, but slow index builds limit the amount of data that can be efficiently managed.

Increased Operational Costs

Slow index build times mean computational resources like CPUs and GPUs are in use longer. This increases operational costs, especially in cloud computing environments where costs are based on resource usage.

Limited Developer Application Experimentation

Developers need to experiment quickly to fine-tune models and improve application performance. Long index build time restricts the number of experiments they can run and increases the time needed to evaluate those experiments, slowing innovation and delaying improvements.

Slow Update of Dynamic Datasets

Applications like ecommerce and RAG have dynamic datasets that are frequently updated. Slow index build delays the incorporation of new data, leading to outdated recommendations and responses, which can negatively impact user satisfaction and trust.

Ways to Reduce Index Build Time

Two effective ways to reduce index build time are parallel processing and vector quantization.

Parallel Processing

The dataset is split into clusters of vectors and the nearest neighbor distance calculations within those clusters are performed in parallel. Parallel processing significantly reduces the time needed for the distance calculations, which is the longest part of the index build process.
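The split/search/merge pattern can be sketched in a few lines. A thread pool on toy 1-D data stands in for real parallel hardware here; the function names are illustrative:

```python
from concurrent.futures import ThreadPoolExecutor

def nearest_in_cluster(query, cluster):
    """Brute-force nearest neighbor within a single cluster (toy 1-D data)."""
    return min(cluster, key=lambda v: abs(v - query))

def parallel_nearest(query, clusters, workers=4):
    """Fan the per-cluster searches out to a worker pool, then merge the
    local winners. A thread pool only illustrates the pattern; the point
    in the text is that hardware with far more parallel lanes can run
    every cluster's distance calculations at once."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        local_best = pool.map(lambda c: nearest_in_cluster(query, c), clusters)
        return min(local_best, key=lambda v: abs(v - query))
```

Because each cluster's search is independent, the distance-calculation step scales with the number of parallel lanes available, which is exactly where CPUs fall short.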

Unfortunately, CPUs, which most HNSW index build solutions use, have limited parallel processing capabilities, so a solution with higher parallel processing capabilities is needed.

Vector Quantization

Vector quantization compresses vectors, enabling more vectors to be packed per data transfer and simplifying nearest neighbor distance calculations by performing them on fewer bits. This reduces the number of accesses to slower external memory and speeds up the distance calculations.
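As a simple illustration, uniform scalar quantization maps each float to a small integer code. This is a generic sketch of the idea, not GSI’s actual scheme:

```python
def quantize(vectors, bits=4):
    """Uniform scalar quantization: map each float to an integer code in
    [0, 2^bits - 1]. Narrower codes mean more vectors per data transfer
    and cheaper distance arithmetic."""
    levels = (1 << bits) - 1
    lo = min(min(v) for v in vectors)
    hi = max(max(v) for v in vectors)
    scale = (hi - lo) / levels if hi > lo else 1.0
    codes = [[round((x - lo) / scale) for x in v] for v in vectors]
    return codes, lo, scale

def dequantize(codes, lo, scale):
    """Recover approximate floats; error is at most about scale/2 per value."""
    return [[lo + c * scale for c in v] for v in codes]
```

At 4 bits per feature, a 32-bit float vector shrinks by 8x, so eight times as many vectors fit in each memory transfer, at the cost of bounded quantization error.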

Compute-In-Memory (CiM) Associative Processing — Flexible Massive Parallel Processing

GSI Technology’s compute-in-memory Associative Processing Unit (APU) contains millions of bit processors that perform computation at the bit level, which allows for flexible quantization and enables massive parallel processing on data elements of any size.

With the APU, computation is done in place, directly in memory, which avoids the traditional bottleneck between processors and memory.

APU HNSW Index Build Process

1. Quantization: Compress the dataset to a desired bit length (e.g., 4 bits per feature).

2. Clustering: Use K-means clustering to divide the dataset into clusters.

3. Assignment: Assign each vertex to its closest clusters based on distance to the cluster centroids.

4. Load Data: Load multiple clusters into the APU.

5. Nearest Neighbor Search: Perform nearest neighbor search for the vertices assigned to the clusters loaded in the APU.

6. Repeat Steps 4 and 5: Repeat steps 4 and 5 until all clusters have been processed and all vertices inserted into the graph.

7. Union of Neighbors: For each vertex, merge nearest neighbors from multiple clusters.

8. Optimization: After the union, make the edges bidirectional and ensure each vertex has <= K neighbors by pruning redundant edges.

The APU uses its millions of bit processors to perform the nearest neighbor distance calculations from step 5 in parallel. This massive parallel processing, along with flexible quantization, significantly reduces the time needed to compute the distance calculations. Performing nearest neighbor distance calculations is the most time-consuming part of building an index, so reducing this step has the greatest impact on speeding up the index build.
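The steps above can be sketched end to end on toy data. This pure-Python stand-in omits quantization (step 1) and runs step 5 serially, where the APU runs it massively in parallel; it just shows the cluster/search/union/prune flow:

```python
import math

def kmeans(points, k, iters=5):
    """Tiny Lloyd's k-means (step 2): returns final centroids and the
    cluster index assigned to each point (step 3)."""
    centroids = [list(p) for p in points[:k]]   # naive init: first k points
    assign = [0] * len(points)
    for _ in range(iters):
        assign = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        for c in range(k):
            members = [p for p, a in zip(points, assign) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids, assign

def build_neighbors(points, k_clusters=2, k_neighbors=2):
    """Steps 2-8 in miniature: cluster, brute-force NN search inside each
    cluster (the part the APU parallelizes), union per-vertex candidates,
    then make edges bidirectional and prune each vertex to <= K neighbors."""
    _, assign = kmeans(points, k_clusters)
    neighbors = {i: set() for i in range(len(points))}
    for c in range(k_clusters):                          # steps 4-6
        members = [i for i, a in enumerate(assign) if a == c]
        for i in members:
            ranked = sorted((j for j in members if j != i),
                            key=lambda j: math.dist(points[i], points[j]))
            neighbors[i].update(ranked[:k_neighbors])    # step 7: union
    for i in list(neighbors):                            # step 8: bidirectional
        for j in neighbors[i]:
            neighbors[j].add(i)
    for i in neighbors:                                  # step 8: prune to <= K
        neighbors[i] = set(sorted(
            neighbors[i],
            key=lambda j: math.dist(points[i], points[j]))[:k_neighbors])
    return neighbors
```

The inner loop over `members` is where nearly all the distance calculations live, which is why accelerating that one step dominates the overall build time.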

Results

Figure 3, from Nvidia, shows that an Intel Xeon Platinum 8480CL CPU takes 5,636 seconds (about 1.5 hours) to build an HNSW index for 100M vectors.

Figure 3 — CPU HNSW index build time for 100M vectors

Figure 4 shows that an APU system can build a 100 million vector HNSW index in 864 seconds (about 15 minutes). This is roughly an 85% reduction in build time compared to the 5,636 seconds (about 1.5 hours) for an Intel Xeon Platinum 8480CL CPU.
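The headline reduction follows directly from the two reported build times:

```python
cpu_seconds = 5_636   # Intel Xeon Platinum 8480CL, 100M vectors (Figure 3)
apu_seconds = 864     # APU system, 100M vectors (Figure 4)
reduction = (cpu_seconds - apu_seconds) / cpu_seconds   # about 0.847
```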

Figure 4 also shows that an APU system can build a 1 billion vector HNSW index in roughly 2 hours, a dramatic improvement over the eBay build times from Figure 2 if those were extrapolated to 1 billion vectors.

Figure 4 — HNSW index build times using an APU system for 100 million to 1 billion vectors

Conclusion

HNSW is a leading algorithm used for vector similarity search in applications like GenAI and ecommerce. It provides high recall and fast search, but with the tradeoff of long index build time.

One of the main ways to reduce index build time is through parallelization. Unfortunately, most current solutions use CPUs, which have limited parallel processing capabilities.

Through massive parallel processing and flexible quantization, GSI Technology’s APU reduces index build time by approximately 85% compared to traditional CPU-based solutions, enhancing scalability, lowering operational costs, enabling faster developer application experimentation, and ensuring timely updates for dynamic datasets.

For more details about how the APU’s flexible bit-level processing provides massive parallel processing to significantly reduce index build time, read the white paper Enhancing HNSW Efficiency: Cutting Index Build Time by 85%.

To learn about GSI Technology’s on-prem and cloud-based HNSW index build offerings, contact us at associativecomputing@gsitechnology.com.
