# Product Quantization

# What

Product quantization is a technique used primarily for vector compression and approximate nearest neighbor search over large-scale datasets. It is particularly useful in areas such as computer vision, machine learning, and information retrieval, where efficiently storing and searching massive volumes of high-dimensional data is crucial.

# When

Product quantization is most beneficial when handling vast, complex, high-dimensional datasets under constraints on computational power, storage capacity, or available time.

- Large-Scale Image or Video Retrieval

In multimedia databases or content-based retrieval systems, where the goal is to find images or videos similar to a query example, the datasets can be enormous, and the data itself (e.g., features extracted from images or videos) can be very high-dimensional. Product quantization allows for efficient storage and quick retrieval of similar items.

- Recommender Systems

Recommender systems, particularly those dealing with a vast number of items (like products in an online store or movies in a streaming service), can benefit from product quantization. It enables these systems to efficiently find and recommend items that are similar to a user’s past preferences or search queries.

- Natural Language Processing (NLP)

In NLP tasks, such as document retrieval or semantic search, documents or sentences are often represented as high-dimensional vectors (e.g., through TF-IDF vectors or embeddings from models like BERT). Product quantization can help manage the computational and memory demands of searching through large collections of text documents.

- Internet of Things (IoT) and Edge Computing

In IoT applications and edge computing, where data from numerous sensors or devices is collected and processed, often in a resource-constrained environment, product quantization can help in efficiently transmitting, storing, and analyzing high-dimensional data.

# Pseudo Algorithm

# Step 1: Raw Vectors

**Input**: A set of 50,000 vectors, each of size 1,024.

**Objective**: Reduce the dimensionality and storage requirements of these vectors while preserving their properties for similarity search.

# Step 2: Splitting into Subvectors

**Process**: Each vector of size 1,024 is split into 8 subvectors, each of size 128 (since 1,024 ÷ 8 = 128). The number of subvectors, 8, is a hyperparameter.

**Result**: This yields 8 matrices, each with dimensions 50,000 × 128.
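The split in Step 2 can be sketched with NumPy as a reshape. This is an illustrative sketch, not code from the original; the array names and the random data are mine:

```python
import numpy as np

n, d, m = 50_000, 1_024, 8   # number of vectors, dimensionality, subvector count
ds = d // m                  # size of each subvector: 128

rng = np.random.default_rng(0)
vectors = rng.standard_normal((n, d)).astype(np.float32)

# Split every vector into m contiguous subvectors of size ds.
# subvectors[j] is the (n, ds) matrix for subspace j.
subvectors = vectors.reshape(n, m, ds).transpose(1, 0, 2)
print(subvectors.shape)  # (8, 50000, 128)
```

Because the reshape is contiguous, `subvectors[j, i]` is exactly dimensions `j*128` through `(j+1)*128` of the original vector `i`.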

# Step 3: K-Means Clustering on Subvectors

**Process**: Apply K-means clustering independently to each of the 8 sets of subvectors. For each set, the algorithm identifies 256 centroids.

**Centroid Count**: The number of centroids (256 in this case) is a hyperparameter, typically chosen from the number of bits used to store a centroid ID.

**Bit Representation**: If the number of bits is set to 8, then 2^8 = 256, meaning at most 256 centroid IDs can be represented.

**Result**: For each subvector, the closest centroid is identified. Each 128-dimensional subvector is thus represented by the index of its nearest centroid.
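Steps 3 and the encoding it enables can be sketched with a minimal Lloyd's-algorithm k-means in NumPy. The sizes here are deliberately toy (the text uses n = 50,000 and k = 256) so the sketch runs quickly; the helper function and names are mine:

```python
import numpy as np

def kmeans(X, k, iters=10, seed=0):
    """Minimal Lloyd's algorithm: returns a (k, dim) centroid matrix."""
    rng = np.random.default_rng(seed)
    centroids = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        # Assign each point to its nearest centroid (squared L2 distance).
        dists = ((X[:, None, :] - centroids[None, :, :]) ** 2).sum(-1)
        labels = dists.argmin(1)
        # Move each centroid to the mean of its assigned points.
        for j in range(k):
            if (labels == j).any():
                centroids[j] = X[labels == j].mean(0)
    return centroids

# Toy sizes: n vectors, m subspaces of size ds, k centroids per subspace.
n, m, ds, k = 2_000, 8, 16, 32
rng = np.random.default_rng(1)
subvectors = rng.standard_normal((m, n, ds)).astype(np.float32)

# One independent codebook per subspace: shape (m, k, ds).
codebooks = np.stack([kmeans(subvectors[j], k) for j in range(m)])

# Encode: each subvector is replaced by the index of its nearest centroid.
codes = np.stack([
    ((subvectors[j][:, None, :] - codebooks[j][None]) ** 2).sum(-1).argmin(1)
    for j in range(m)
], axis=1)                  # shape (n, m), one small integer per subspace
print(codes.shape)          # (2000, 8)
```

In practice a library implementation (e.g. scikit-learn's `KMeans`, Faiss, or nanopq) would be used instead of this hand-rolled loop.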

# Step 4: Storing Centroid Indices

**Encoding**: Since there are 256 possible centroids, each subvector can be represented by an 8-bit index.

**Storage**: The original vector (1,024 dimensions, i.e. 4,096 bytes at 32-bit float precision) is now represented by 8 centroid indices, giving an 8-byte compressed representation per vector, plus the fixed, shared cost of storing the codebooks.
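The compression arithmetic for the running example, assuming the original vectors are 32-bit floats:

```python
# Compression arithmetic for the running example (assumes float32 inputs).
d, m, nbits = 1_024, 8, 8

original_bytes = d * 4             # 1,024 float32 values -> 4,096 bytes
compressed_bytes = m * nbits // 8  # 8 indices x 8 bits -> 8 bytes
ratio = original_bytes / compressed_bytes
print(original_bytes, compressed_bytes, ratio)  # 4096 8 512.0
```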

# Step 5: Searching for Query Vectors

**Query Vector**: Given a query vector q_v of size 1,024, split it into 8 subvectors of size 128.

**Similarity Calculation**: For each subvector of the query, calculate its similarity to all 256 centroids obtained in Step 3. This gives 8 sets of 256 similarity values.

**Aggregate Similarity**: For each vector in the database, sum the similarities of its 8 centroid indices with the corresponding subvectors of the query vector.

**Ranking**: Sort all vectors based on the aggregated similarity scores to find the most similar vectors to the query.
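Step 5 is commonly implemented with a per-query distance table (asymmetric distance computation, using squared L2 distance rather than similarity — smaller is better). A self-contained NumPy sketch with toy sizes and randomly generated codebooks/codes standing in for the outputs of Steps 3–4:

```python
import numpy as np

# Toy sizes; the text uses m = 8 subspaces and k = 256 centroids.
m, ds, k, n = 8, 16, 32, 1_000
rng = np.random.default_rng(2)
codebooks = rng.standard_normal((m, k, ds)).astype(np.float32)
codes = rng.integers(0, k, size=(n, m))    # database: one centroid id per subspace

query = rng.standard_normal(m * ds).astype(np.float32)
q_sub = query.reshape(m, ds)               # split the query into m subvectors

# Distance table, built once per query: squared L2 distance from each
# query subvector to every centroid of its subspace. Shape (m, k).
table = ((codebooks - q_sub[:, None, :]) ** 2).sum(-1)

# Approximate distance to each database vector: sum of m table lookups.
approx_dists = table[np.arange(m), codes].sum(1)   # shape (n,)

# Rank database vectors from nearest to farthest.
ranking = approx_dists.argsort()
print(ranking[:5])
```

The key efficiency win: after the small `(m, k)` table is built, scoring each database vector costs only `m` lookups and additions, never touching the original 1,024-dimensional data.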

This method significantly reduces the storage space and allows for efficient approximate nearest neighbor search by comparing compressed representations instead of the original high-dimensional vectors.

# Parameters in PQ

d = vector dimensionality

m = number of subvectors each vector is split into (d % m must be 0); this also equals the number of K-means estimators, one per subspace

nbits = number of bits used to store a centroid ID (say 8)

k = number of clusters in each K-means, equal to 2^nbits — with 256 centroids, only 8 bits are needed to store a centroid ID
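The relationships between these parameters can be captured in a tiny helper (the function name is mine, purely for illustration):

```python
def pq_params(d: int, m: int, nbits: int):
    """Derived PQ quantities: subvector size, clusters per subspace, code size."""
    assert d % m == 0, "d must be divisible by m"
    ds = d // m               # size of each subvector
    k = 2 ** nbits            # number of centroids per subspace
    code_bytes = m * nbits // 8   # compressed size of one vector
    return ds, k, code_bytes

print(pq_params(1_024, 8, 8))  # (128, 256, 8)
```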

# Benefit of Product Quantization

**Reduces Memory Usage**: Product quantization significantly decreases the amount of memory needed to store vectors by compressing the original high-dimensional data into more manageable, compact forms. This compression ensures that each vector requires less storage space.

**Accelerates Approximate Nearest Neighbor Search**: By working with compressed versions of the original data, product quantization facilitates faster searches for approximate nearest neighbors. This is crucial for applications requiring real-time or near-real-time responses.

**Simplifies Distance Calculations**: The method approximates high-dimensional distance calculations in a way that greatly simplifies these operations, making them less computationally intensive without heavily compromising accuracy.

# Some Other Visuals (Appendix)

ds → size of each segment (subvector)

k → number of clusters

Training: iterate over all m segments and, for each segment, train a K-means clustering model.

Encoding: load the trained estimator (K-means model) for each segment and predict the cluster for each subvector of size ds.
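The two appendix steps (train one k-means per segment, then predict a cluster per subvector) together form the encoder; the inverse, decoding, rebuilds an approximate vector by concatenating the looked-up centroids. A sketch of decoding, with toy sizes and random codebooks/codes standing in for trained values:

```python
import numpy as np

m, ds, k, n = 8, 16, 32, 4
rng = np.random.default_rng(3)
codebooks = rng.standard_normal((m, k, ds)).astype(np.float32)
codes = rng.integers(0, k, size=(n, m))

# Decode: replace each stored centroid id with its centroid and concatenate
# the m pieces back into a d = m * ds dimensional approximate vector.
reconstructed = np.concatenate(
    [codebooks[j, codes[:, j]] for j in range(m)], axis=1
)
print(reconstructed.shape)  # (4, 128)
```

The reconstruction is lossy: each 128-value segment is replaced by its nearest centroid, which is exactly the quantization error PQ trades for its 512× compression.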

# Reference

- https://www.youtube.com/watch?v=PNVJvZEkuXo
- https://www.youtube.com/watch?v=t9mRf2S5vDI
- Paper: https://lear.inrialpes.fr/pubs/2011/J...
- Code: https://github.com/jankrepl/mildlyove...
- Faiss: https://github.com/facebookresearch/f...
- Nanopq: https://github.com/matsui528/nanopq
- https://mccormickml.com/2017/10/13/product-quantizer-tutorial-part-1/