Similarity Search: ScaNN and 4-bit PQ

ScaNN is a vector similarity search algorithm. This blog post explains the relationship between ScaNN and 4-bit PQ.

Takuma Yamaguchi (Kumon)
5 min read · Sep 17, 2021

Introduction

About a year ago, in 2020, Google published a very impressive blog post and paper.

Scalable Nearest Neighbors (ScaNN) is a vector similarity search algorithm, and it powers Vertex Matching Engine on GCP, a managed similarity search service. The vector similarity search field has been studied for many years, so the latest state-of-the-art algorithm is usually only slightly better than the previous one.

However, in this case, ScaNN achieved significantly different results: its QPS is almost double that of the runner-up algorithm.

https://ai.googleblog.com/2020/07/announcing-scann-efficient-vector.html

Maximum Inner Product Search (MIPS)

Similarity search finds the vectors most similar to a given query vector. There are several ways to measure similarity, such as inner product, cosine similarity, and Euclidean distance.

Intuitively, vector a or b looks closest to the query vector q in the figure above, but vector c is the best match under inner product maximization. ScaNN was developed for MIPS.
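As a concrete illustration, here is a minimal NumPy sketch with random toy data, comparing the MIPS answer with the Euclidean nearest neighbor for the same query; the two often disagree:

```python
import numpy as np

rng = np.random.default_rng(0)
database = rng.standard_normal((1000, 4), dtype=np.float32)
query = rng.standard_normal(4, dtype=np.float32)

# MIPS: the best match maximizes the inner product <q, x>.
mips_best = int(np.argmax(database @ query))

# Euclidean search minimizes ||q - x||; it can rank vectors differently,
# because a long vector can have a large inner product while being far away.
l2_best = int(np.argmin(np.linalg.norm(database - query, axis=1)))

print(mips_best, l2_best)  # often two different indices
```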

Vector Quantization in ScaNN

There are two major ideas in ScaNN's vector quantization.

https://arxiv.org/pdf/1908.10396.pdf

Score Aware Loss: Not all pairs of q and x are equally important. For a datapoint x, it is more important to accurately quantize the inner product <q1, x> than <q2, x> or <q3, x>.

https://ai.googleblog.com/2020/07/announcing-scann-efficient-vector.html

Anisotropic Loss: The quantization error can be decomposed into a component parallel to the datapoint and an orthogonal component, and the parallel component is penalized more heavily than the orthogonal one.
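In the paper, the parallel and orthogonal residuals get different weights derived from the score-aware assumption. Here is a rough NumPy sketch of the decomposition, with an assumed scalar weight eta > 1 standing in for the paper's weighting functions:

```python
import numpy as np

def anisotropic_loss(x, x_quantized, eta=4.0):
    """Decompose the quantization residual into components parallel and
    orthogonal to the datapoint x, and weight the parallel part more.
    eta is an assumed scalar weight standing in for the paper's
    h_parallel and h_orthogonal weighting functions."""
    r = x_quantized - x                   # quantization residual
    r_parallel = ((r @ x) / (x @ x)) * x  # projection of r onto x
    r_orthogonal = r - r_parallel
    return eta * (r_parallel @ r_parallel) + (r_orthogonal @ r_orthogonal)
```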

PQ with Score Aware Loss and Anisotropic Loss

Standard vector quantization is not very practical for high-dimensional vectors or large-scale databases. PQ (product quantization) is a widely used, scalable vector quantization method.

https://speakerdeck.com/matsui_528/cvpr20-tutorial-billion-scale-approximate-nearest-neighbor-search?slide=79

In standard vector quantization, a single codebook is generated from the original vectors. PQ, on the other hand, divides each vector into multiple subspaces and generates a codebook for each subspace. This approach makes it possible to handle high-dimensional vectors and large-scale databases.
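Here is a toy sketch of PQ training, using scikit-learn's KMeans with illustrative parameter choices (8 subspaces, 256 centroids per subspace, i.e. 8-bit codes):

```python
import numpy as np
from sklearn.cluster import KMeans

def train_pq(vectors, n_subspaces=8, n_centroids=256):
    """Toy product quantizer: split each vector into n_subspaces chunks
    and learn an independent k-means codebook for each chunk."""
    sub_dim = vectors.shape[1] // n_subspaces
    codebooks, codes = [], []
    for m in range(n_subspaces):
        sub = vectors[:, m * sub_dim:(m + 1) * sub_dim]
        km = KMeans(n_clusters=n_centroids, n_init=1).fit(sub)
        codebooks.append(km.cluster_centers_)
        codes.append(km.labels_.astype(np.uint8))  # one 8-bit code per subspace
    return codebooks, np.stack(codes, axis=1)      # codes: (n_vectors, n_subspaces)
```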

ScaNN also uses PQ, so we can say that ScaNN is PQ with the score-aware, anisotropic loss.
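The open-source scann package exposes exactly this combination. Below is a minimal sketch following the builder example in the ScaNN README; the tree and quantization parameters are illustrative, not tuned:

```python
import numpy as np
import scann

dataset = np.random.randn(10_000, 128).astype(np.float32)
queries = np.random.randn(5, 128).astype(np.float32)

searcher = (
    scann.scann_ops_pybind.builder(dataset, 10, "dot_product")
    .tree(num_leaves=100, num_leaves_to_search=10, training_sample_size=10_000)
    .score_ah(2, anisotropic_quantization_threshold=0.2)  # anisotropic PQ scoring
    .reorder(100)  # re-rank the top 100 candidates with exact distances
    .build()
)

neighbors, distances = searcher.search_batched(queries)
```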

Why so fast?

I had a big question: why is ScaNN so fast? Using the score-aware and anisotropic losses does not reduce computational complexity. The reason was not explained in the blog post or the paper; the paper simply says "SIMD based ADC". SIMD is commonly used in the similarity search field, and ADC is asymmetric distance computation, which is described in the original PQ paper.

https://github.com/google-research/google-research/blob/master/scann/scann/hashes/internal/lut16_avx512.inc

The answer was in the code: SIMD in-register lookup tables are implemented.

SIMD in-register lookup tables

Using lookup tables for distance computation with PQ is nothing special, but since the tables normally do not fit into SIMD registers (128 to 512 bits), realizing in-register lookup tables is not straightforward. I found a paper, "André et al., Accelerated Nearest Neighbor Search with Quick ADC", which addresses this issue.

In PQ, 8-bit sub-quantizers are widely used and distances are represented as 32-bit floats, so each lookup table takes 2⁸ * 32 = 8,192 bits. To minimize the table size, Quick ADC uses 4-bit sub-quantizers and 8-bit integers for distances, shrinking each table to 2⁴ * 8 = 128 bits. This allows the lookup tables to live entirely in SIMD registers.

https://arxiv.org/pdf/1704.07355.pdf

Register access is far faster than main memory access (more than 10 times), and 16 lookups are performed in a single cycle. That's why Quick ADC is quick. Since the sub-quantizer is 4 bits, this approach is also called 4-bit PQ.
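To make the sizes concrete, here is a small NumPy sketch; the fancy-indexing line stands in for what a single SIMD shuffle instruction (PSHUFB/VPSHUFB) does in hardware:

```python
import numpy as np

# One 4-bit subspace: 16 distance entries, each quantized to 8 bits.
# 16 entries x 8 bits = 128 bits, i.e. exactly one 128-bit SIMD register.
lut = np.random.randint(0, 256, size=16, dtype=np.uint8)

# 4-bit PQ codes of 16 database vectors for this subspace.
codes = np.random.randint(0, 16, size=16, dtype=np.uint8)

# A single shuffle instruction performs all 16 lookups at once;
# NumPy fancy indexing stands in for that instruction here.
partial_distances = lut[codes]
```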

Benchmarks

Some benchmark results for 4-bit PQ and ScaNN are available in the Faiss wiki. Faiss is a similarity search library developed and maintained by Facebook Research.

https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors#results-on-sift1m

"Reorder" means that instead of returning the top 10 results in one shot, the top 100 candidates are retrieved and re-ranked with more accurate distance computation. Judging from these results on SIFT1M, 4-bit PQ looks better than ScaNN.

https://github.com/facebookresearch/faiss/wiki/Indexing-1M-vectors#results-on-glove

On the GloVe dataset, ScaNN performs better than 4-bit PQ. This dataset is used in the ScaNN paper, and the anisotropic loss quantization works well for it. Interestingly, with reordering, the performance of the two is almost the same.
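To try this kind of setup yourself, here is a minimal Faiss sketch (assuming a recent Faiss version; the data and parameters are illustrative) that combines the 4-bit fast-scan PQ index with exact re-ranking:

```python
import numpy as np
import faiss

d = 128
xb = np.random.randn(100_000, d).astype(np.float32)  # database
xq = np.random.randn(100, d).astype(np.float32)      # queries

# 4-bit PQ with the SIMD fast-scan implementation: 64 subspaces x 4 bits.
base = faiss.IndexPQFastScan(d, 64, 4)

# Wrap with a refinement stage that re-ranks candidates by exact distance.
index = faiss.IndexRefineFlat(base)
index.train(xb)
index.add(xb)
index.k_factor = 10            # fetch 10x candidates before re-ranking
D, I = index.search(xq, 10)    # final top-10 after exact re-ranking
```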

Conclusion

ScaNN is a vector quantization algorithm for maximum inner product search. The algorithm combines product quantization with a score-aware, anisotropic loss. To accelerate search, ScaNN is implemented with SIMD in-register lookup tables. The performance difference between ScaNN and 4-bit PQ is limited, but a similarity search engine offered as a managed service is highly beneficial for many use cases.
