Fusion of graph-based indexing and product quantization for ANN search

Masajiro Iwasaki
8 min read · Apr 4, 2023


Yahoo! JAPAN Research released the approximate nearest neighbor (ANN) search engine NGT as open-source software (OSS) in September 2016. NGT has achieved world-leading performance and is adopted by a number of Yahoo! JAPAN services, as well as through Vald. Yahoo! JAPAN Research added Quantized Graph (QG) to NGT in January 2021 and Quantized Blob Graph (QBG) in August 2022. In this blog post, I will introduce QG.

Are performance improvements for graph-based methods no longer possible?

There are roughly two types of ANN search methods: tree- or graph-based methods and quantization-based methods. NGT uses a graph-based index (see here for details), and the graph structure has a significant impact on its performance. ONNG¹ in NGT, which Yahoo! JAPAN Research proposed, has achieved world-leading performance in ANN-benchmarks². Is there nothing more that can be done to improve the performance of graph-based methods?

The search efficiency of graph-based methods tends to be better than that of quantization-based methods, where search efficiency is defined as the ratio of the number of result vectors to the number of vectors whose distances are computed. The bottleneck of graph-based methods is the time needed to fetch vectors from memory.

Therefore, NGT prefetches vectors to reduce this time (see here for details, in Japanese). Another option is to reduce the size of the vectors. QG adopts product quantization³ to compress the vectors. In general, a product quantization-based search uses an inverted index, but QG fuses the graph with product quantization. Thus, the graph exploration is the same as that of a graph-based NGT index, but the distance calculation is replaced with product quantization.

Vector quantization

Given two-dimensional vectors (black dots) as in the left figure below, the space is divided at regular intervals, and each vector is represented by the center vector (blue dot) of the cell it falls into. In this case, every vector is individually represented by only one of four numbers (0–3), that is, two bits. This is known as vector quantization.

Note that the distance between a vector and the center vector that represents it is the quantization error.

In general vector quantization, the space is divided into clusters by applying k-means clustering to all of the vectors in order to reduce the quantization error. The upper right figure shows the result of k-means clustering, where each vector is located close to its cluster's center vector (blue dot).

While this vector quantization can reduce the quantization error by increasing the number of clusters, the data size increases because more bits are needed to identify a center vector. For example, when the space is divided into tens of thousands of clusters, at least two bytes are needed to represent each vector. Product quantization, which is explained next, can reduce both the quantization error and the data size.
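To make this concrete, here is a minimal sketch of plain vector quantization using NumPy and scikit-learn (illustrative only, not NGT's implementation): each vector is replaced by the ID of its nearest k-means centroid, and the distance to that centroid is the quantization error.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
vectors = rng.random((1000, 2)).astype(np.float32)      # 2-D vectors (the black dots)

kmeans = KMeans(n_clusters=4, n_init=10, random_state=0).fit(vectors)
centroids = kmeans.cluster_centers_                      # center vectors (the blue dots)
codes = kmeans.labels_                                   # each vector becomes one ID in 0-3, i.e. two bits

# the quantization error is the distance between a vector and the centroid that represents it
errors = np.linalg.norm(vectors - centroids[codes], axis=1)
print(codes[:10], errors.mean())

With four clusters, each code fits in two bits, matching the figure above; increasing the number of clusters lowers the error but enlarges the codes.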

Product quantization

Product quantization divides a vector into multiple subvectors and applies vector quantization to each subvector. The following figure shows an example of product quantization in which an eight-dimensional vector is divided into four two-dimensional subvectors. The size of the vector in this figure is 32 bytes because each dimension is a four-byte floating-point number.

Product quantization compresses this vector into a one-byte quantized vector, because each subvector is represented by two bits. This means that the vector is compressed to 1/32 of its original size. In addition, product quantization can significantly reduce the quantization error because vector quantization is applied to each subvector separately.

In general product quantization, the number of clusters per subvector ranges from hundreds to thousands, but QG uses only 16 clusters. The reason for this difference will be explained later.

While the original vector cannot be reconstructed from the quantized vector, an approximate vector can be constructed instead. The quantization error between the original and approximate vectors also reduces the search accuracy.
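The following sketch (again NumPy and scikit-learn, illustrative only) mirrors the figure's parameters: an eight-dimensional vector is split into four two-dimensional subvectors, each quantized with four centroids, so every vector is encoded in one byte, and an approximate vector can be reconstructed from the codes.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
vectors = rng.random((1000, 8)).astype(np.float32)       # 8 dimensions x 4 bytes = 32 bytes per vector

M, K = 4, 4                                              # 4 subvectors, 4 centroids (2 bits) per subvector
sub_dim = vectors.shape[1] // M

codebooks = []                                           # one small codebook per subvector space
codes = np.empty((len(vectors), M), dtype=np.uint8)      # 4 x 2 bits = 1 byte per vector
for m in range(M):
    sub = vectors[:, m * sub_dim:(m + 1) * sub_dim]
    km = KMeans(n_clusters=K, n_init=10, random_state=0).fit(sub)
    codebooks.append(km.cluster_centers_)
    codes[:, m] = km.labels_

# an approximate vector is reconstructed by concatenating the selected centroids
approx = np.hstack([codebooks[m][codes[:, m]] for m in range(M)])
print(np.linalg.norm(vectors - approx, axis=1).mean())   # average quantization error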

Reducing computation time for approximate distance by product quantization

Because product quantization compresses the vectors, the time to fetch them from memory is reduced. However, the additional step of generating approximate vectors from the quantized vectors would increase the time needed to calculate the approximate distances.

Product quantization therefore does not calculate the approximate distances from each approximate vector; instead, it calculates them in batches from pre-computed partial distances (to be precise, squared partial distances) in each subvector space.

This reduces the total computation time. When a search is initiated, all distances between the specified query vector and the centroids in each subvector space are calculated first, to construct a lookup table of partial distances as shown in the following figure.

The approximate distance can then be calculated by summing partial distances read from the lookup table. When calculating the distance between a vector (green dots in the figure) and the query vector (red dot), the partial distances are obtained from the corresponding elements of the lookup table and summed. The more vectors there are in the space, the shorter the total calculation time is compared with computing the distance to each vector directly, because each approximate distance is obtained simply by summing values from the table.
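A sketch of this lookup-table calculation, reusing the codebooks and codes from the previous sketch (illustrative only, not NGT's implementation), might look like the following.

import numpy as np

def adc_distances(query, codebooks, codes):
    # codebooks: list of M arrays of shape (K, sub_dim); codes: (n, M) array of centroid IDs
    M = len(codebooks)
    sub_dim = codebooks[0].shape[1]
    # 1) build the lookup table: squared partial distances between the query's subvectors
    #    and every centroid in each subvector space
    table = np.stack([
        ((codebooks[m] - query[m * sub_dim:(m + 1) * sub_dim]) ** 2).sum(axis=1)
        for m in range(M)
    ])                                                   # shape (M, K)
    # 2) the approximate squared distance of each vector is just the sum of the M table
    #    entries selected by its codes -- no per-vector distance arithmetic
    return table[np.arange(M), codes].sum(axis=1)

Continuing the previous sketch, adc_distances(vectors[0], codebooks, codes) returns the approximate squared distances from the first vector, used as a query, to all encoded vectors in one batch.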

Fusion of graph exploration and product quantization

As previously mentioned, approximate distances can be calculated at high speed by product quantization. Even so, calculating distances for millions of vectors still takes a long time. Therefore, candidate vectors expected to be close to the query are filtered first, and the approximate distances are calculated only for those candidates. General product quantization methods filter the vectors using clusters generated by clustering all of the vectors in advance: neighboring clusters are selected, and approximate distances are calculated only for the vectors in those clusters.

The NGT graph is constructed by connecting neighboring vectors. The graph is explored to find vectors close to the specified query as shown in the following figure.

When exploring the graph, it is necessary to find the vector closest to the query among the vectors connected to the vector being explored. This means that the distances between these connected vectors and the query are needed. Product quantization is used only for this approximate distance calculation.

Let the red vector in the following figure be the vector being explored; the approximate distances of the vectors within the red line are then calculated in a single batch using product quantization. In the same way, when the blue vector is the vector being explored, the approximate distances of the vectors within the blue line are calculated in a batch. Although only a few neighboring vectors are connected in this figure, more than ten vectors are connected to each vector in an actual graph.
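To show how the two pieces fit together, here is a simplified best-first graph exploration that uses the adc_distances function from the earlier sketch for the batched distance calculations. The names explore_graph, neighbors, entry, and ef are hypothetical, the adjacency dictionary stands in for the NGT graph, and this is only an outline, not NGT's actual search algorithm.

import heapq
import numpy as np

def explore_graph(query, neighbors, codebooks, codes, entry, k=10, ef=50):
    # neighbors: dict mapping a vector ID to the IDs connected to it in the graph
    # all edges of the node being explored are evaluated in one batch via the PQ lookup table
    def batch_dist(ids):
        return adc_distances(query, codebooks, codes[np.array(ids)])

    d0 = float(batch_dist([entry])[0])
    visited = {entry}
    candidates = [(d0, entry)]                           # min-heap of nodes to expand next
    results = [(-d0, entry)]                             # max-heap (negated) of the best nodes so far
    while candidates:
        d, node = heapq.heappop(candidates)
        if len(results) >= ef and d > -results[0][0]:
            break                                        # no remaining candidate can improve the results
        fresh = [n for n in neighbors[node] if n not in visited]
        visited.update(fresh)
        if not fresh:
            continue
        for nd, n in zip(batch_dist(fresh), fresh):
            heapq.heappush(candidates, (float(nd), n))
            heapq.heappush(results, (-float(nd), n))
            if len(results) > ef:
                heapq.heappop(results)
    return sorted((-d, n) for d, n in results)[:k]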

A note for readers familiar with product quantization: general product quantization quantizes the residual vectors between each vector and the centroid of the cluster used for filtering, but QG does not use residual vectors, in order to shorten the query time.

Accelerating approximate distance calculation

Using the lookup table accelerates the approximate distance calculation; however, the time to fetch distances from the lookup table becomes a bottleneck. Therefore, QG reduces the fetch time by compressing each distance in the lookup table into one byte, and accelerates the approximate distance calculation by using SIMD instructions. The number of clusters for each subvector must be set to 16, which is the maximum number of lookup table elements that the SIMD instruction can reference.
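As a rough illustration of the table compression (the SIMD lookup itself cannot be expressed in Python; the real implementation relies on byte-shuffle instructions that process 16 table entries at once, which is exactly why 16 clusters per subvector are used), a sketch might quantize each table entry into one byte with a per-table scale and offset, as follows. The function names and the uniform scale/offset scheme are assumptions for illustration only.

import numpy as np

def quantize_table(table):
    # compress each partial distance in the lookup table into one byte,
    # using a single scale and offset per table
    lo, hi = float(table.min()), float(table.max())
    scale = (hi - lo) / 255.0 or 1.0
    qtable = np.round((table - lo) / scale).astype(np.uint8)
    return qtable, scale, lo

def adc_distances_u8(qtable, scale, lo, codes):
    # sum the byte-sized entries selected by the codes, then undo the scaling;
    # with 16 centroids per subvector, a SIMD shuffle can look up 16 byte entries at once
    M = qtable.shape[0]
    total = qtable[np.arange(M), codes].astype(np.uint32).sum(axis=1)
    return total * scale + lo * M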

Search performance evaluation

The search performance of QG indices was measured using ANN-benchmarks with a five-hour time limit. The evaluation results for PANNG, ONNG, and QG of NGT, as well as other major methods (ScaNN, HNSW, and Annoy), on a number of datasets are shown in the following figures. The horizontal and vertical axes are the recall and the number of queries per second, respectively. The further a line extends to the upper right of a figure, the higher the performance. NGT-qg in the figures corresponds to the QG described in this article. Although QG does not surpass all of the other methods, it shows higher performance in most of the figures.

Usage

Please see here for NGT installation.

Building a QG index has two steps. First, build an ordinary graph-based NGT index, because QG applies product quantization only to the distance calculation on NGT's graph-based indices. Then, build the product quantization-based index on top of it.

Before building the QG index, prepare one million sample vectors as follows.

$ curl -L -O https://github.com/yahoojapan/NGT/raw/main/tests/datasets/ann-benchmarks/sift-128-euclidean.tsv

Next, build a graph-based index (ANNG) as follows. In this example, 128 dimensions and the Euclidean distance are specified. Reconstructing an ONNG from the ANNG would improve performance, but the ONNG reconstruction is omitted in this example.

$ ngt create -d 128 -D 2 anng sift-128-euclidean.tsv

Note that it is already possible to search using this graph-based index.

$ ngt search anng query.tsv

Then, initialize a QG index. An empty QG index is created in the directory anng.

$ qbg create-qg anng

Build the QG index.

$ qbg build-qg anng

Then, it is possible to search using this QG index. In this example, the epsilon is specified so that the search results are comparable to those obtained with the aforementioned ANNG index.

$ qbg search-qg -e 0.06 anng query.tsv

This blog post explained how QG improves performance and how to build and search a QG index. QBG will be introduced in the next blog post.

[1]: Iwasaki, M., & Miyazaki, D. (2018). Optimization of indexing based on k-nearest neighbor graph for proximity search in high-dimensional data. arXiv preprint arXiv:1810.07355.

[2]: Aumüller, M., Bernhardsson, E., & Faithfull, A. (2020). ANN-Benchmarks: A benchmarking tool for approximate nearest neighbor algorithms. Information Systems, 87, 101374.

[3]: Jegou, H., Douze, M., & Schmid, C. (2010). Product quantization for nearest neighbor search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(1), 117–128.

[4]: André, F., Kermarrec, A. M., & Le Scouarnec, N. (2017). Accelerated nearest neighbor search with Quick ADC. In Proceedings of the 2017 ACM International Conference on Multimedia Retrieval (pp. 159–166).

[5]: Iwasaki, M. (2016). Pruned bi-directed k-nearest neighbor graph for proximity search. In Similarity Search and Applications: 9th International Conference, SISAP 2016, Tokyo, Japan, October 24–26, 2016, Proceedings 9 (pp. 20–33). Springer International Publishing.
