Arjun Bali
6 min read · Aug 11, 2023

Faiss (Facebook AI Similarity Search) — A Quick Introduction

[Figure: a visual interpretation of what happens under the hood]

Imagine you own a vast library with millions of books. One day, a friend comes over and describes a story they read a long time ago. They can’t remember the title, the author, or even the main characters’ names. They only remember specific details about the plot. You decide to help them find this book in your library, but with millions of books, where do you start?

This is analogous to searching for similar items in a large dataset. Faiss (Facebook AI Similarity Search) is like a magical librarian who can, in seconds, scan your entire library and find the most similar books based on the details your friend remembered.

Reason for Faiss’s Development:

Modern machine learning and big data applications deal with massive datasets. Consider Deep Learning models that produce high-dimensional vectors, such as image embeddings. Searching for similar vectors (nearest neighbors) in high dimensions can be computationally expensive and time-consuming. Facebook AI recognized this problem and created Faiss, a library specialized in efficient similarity search and clustering of dense vectors.

A little more depth:

There are multiple types of indexes in Faiss, based on how they are built and how the search is carried out:

  • Flat index: This is the simplest type of index. It stores the vectors in a flat array and compares the query against every stored vector, making it a brute-force, exact search. It is very fast to build, but searching becomes slow for large collections.
  • IVF index (Inverted File index): This index is more efficient than the flat index for finding the nearest neighbors of a query vector. It works by partitioning the vectors into a set of cells (clusters), each containing a small number of vectors. At search time, Faiss only scans the cells that are likely to contain the nearest neighbors.
  • HNSW index: This index is often even faster than the IVF index. It builds a graph over the vectors in which similar vectors are connected by edges, and at search time Faiss simply follows those edges toward the nearest neighbors. A short construction sketch for all three index types follows this list.
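
To make this concrete, here is a rough sketch (with made-up random vectors and an assumed dimensionality d) of how each of these index types can be constructed in Faiss:

import faiss
import numpy as np

d = 128                                                # assumed vector dimensionality
xb = np.random.random((10000, d)).astype('float32')   # toy database vectors

# Flat index: exact, brute-force comparison against every stored vector
index_flat = faiss.IndexFlatL2(d)
index_flat.add(xb)

# IVF index: a coarse quantizer partitions the vectors into nlist clusters
nlist = 100
quantizer = faiss.IndexFlatL2(d)
index_ivf = faiss.IndexIVFFlat(quantizer, d, nlist)
index_ivf.train(xb)   # learn the cluster centroids
index_ivf.add(xb)

# HNSW index: vectors are linked into a navigable graph (32 neighbors per node)
index_hnsw = faiss.IndexHNSWFlat(d, 32)
index_hnsw.add(xb)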

How are the embeddings added to the index?
Faiss builds an index by adding the vectors to it, typically as one or more batches passed to the add() method; some index types (such as IVF) also require a train() step beforehand to learn their structure. Much of this work uses multiple threads under the hood, which can significantly speed up index construction.
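
As a minimal sketch (again with toy random vectors), adding embeddings is a single batched call; if you need to keep your own identifiers, Faiss's IndexIDMap wrapper can attach them:

import faiss
import numpy as np

d = 128
embeddings = np.random.random((1000, d)).astype('float32')   # toy batch of embeddings

index = faiss.IndexFlatL2(d)
index.add(embeddings)        # the whole batch is added in one call
print(index.ntotal)          # -> 1000 vectors stored so far

# Optionally attach our own 64-bit integer ids to the vectors
ids = np.arange(1000, 2000).astype('int64')
index_with_ids = faiss.IndexIDMap(faiss.IndexFlatL2(d))
index_with_ids.add_with_ids(embeddings, ids)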

How does the search happen?
To answer a query, Faiss computes distances between the query vector and the vectors in the index (or only a subset of them, depending on the index type) and returns the top k nearest neighbors, where k is a parameter we specify. The nearest neighbors are simply the vectors with the smallest distances to the query. Batches of queries are processed in parallel, which can significantly speed up the search.
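
A minimal sketch of what a search call returns (the data here is random; the shapes are the point):

import faiss
import numpy as np

d = 128
index = faiss.IndexFlatL2(d)
index.add(np.random.random((1000, d)).astype('float32'))

queries = np.random.random((5, d)).astype('float32')   # a batch of 5 query vectors
k = 3
D, I = index.search(queries, k)
print(D.shape, I.shape)   # both are (5, 3)
# D[i, j] is the (squared L2) distance of query i to its j-th nearest neighbor
# I[i, j] is that neighbor's position in the order the vectors were added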

Faiss also supports different search algorithms, each with its own advantages and disadvantages. Some of the most popular search algorithms include:

  • Linear (exhaustive) search: This is the simplest strategy. It iterates over all of the vectors in the index and computes the distance between the query vector and each of them. It is exact, but it becomes slow for large indexes.
  • Heuristic (approximate) search: This strategy uses a heuristic, such as the cluster or graph structure of the IVF and HNSW indexes, to quickly narrow down where the nearest neighbors are likely to be. It is much more efficient than linear search on large indexes, but it is not guaranteed to return the exact nearest neighbors.
  • Sorted search: This strategy sorts the candidate vectors by their distance to the query vector before returning results. It scales well to large indexes, but on small indexes it offers little advantage over the approaches above. A short sketch contrasting exact and approximate search follows this list.
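
To make the exact-versus-approximate trade-off concrete, here is a rough sketch (with toy random data) that compares an exhaustive flat search against an approximate HNSW search and measures how often the two agree on the nearest neighbor:

import faiss
import numpy as np

d, nb, nq = 64, 20000, 100
xb = np.random.random((nb, d)).astype('float32')   # database vectors
xq = np.random.random((nq, d)).astype('float32')   # query vectors

exact = faiss.IndexFlatL2(d)          # exhaustive (linear) search
exact.add(xb)

approx = faiss.IndexHNSWFlat(d, 32)   # graph-based (heuristic) search
approx.add(xb)

_, I_exact = exact.search(xq, 1)
_, I_approx = approx.search(xq, 1)
recall = (I_exact == I_approx).mean()
print(f"Top-1 agreement between exact and approximate search: {recall:.2%}")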

Let’s see how it can be implemented in Python:

# Import the required libraries (install them first with: pip install faiss-cpu sentence-transformers)
from sentence_transformers import SentenceTransformer
import faiss
import numpy as np
import time
# Generate sentence embeddings
sentences = [
"I love machine learning.",
"Faiss is an incredible library.",
"Neural networks are foundational in deep learning.",
"OpenAI has developed GPT-4.",
"Similarity search can be computationally intensive.",
"TensorFlow and PyTorch are popular deep learning frameworks.",
"Convolutional Neural Networks excel at image classification.",
"Natural Language Processing enables machines to understand language.",
"Reinforcement learning is used in training agents for games.",
"Transfer learning allows us to use pre-trained models effectively.",
"BERT revolutionized the way we handle text embeddings.",
"AI ethics is crucial in the development of autonomous systems.",
"Generative Adversarial Networks can generate synthetic data.",
"Machine learning models require vast amounts of data.",
"Data augmentation techniques can enhance your dataset.",
"Overfitting is a common problem in deep learning.",
"Dropout is a technique to prevent overfitting in neural networks.",
"Hyperparameter tuning is essential for optimal model performance.",
"Cloud platforms offer powerful tools for machine learning deployment.",
"Quantum computing might be the next big thing in technology.",
"Support Vector Machines are used for classification tasks.",
"Ensemble methods combine multiple models for better results.",
"The attention mechanism was a breakthrough in sequence models.",
"Computer vision applications are transforming industries.",
"Self-driving cars use a mix of AI techniques.",
"AI in healthcare can help in early diagnosis of diseases.",
"Bias in datasets can lead to unfair model predictions.",
"Explainable AI aims to make model decisions understandable.",
"Machine Learning Operations (MLOps) is gaining traction.",
"The Internet of Things (IoT) benefits from AI-driven analytics."
]
model = SentenceTransformer('paraphrase-distilroberta-base-v1')
embeddings = model.encode(sentences)
# Create a Faiss flat index
d = embeddings.shape[1]  # dimensionality of the embeddings
index_flat = faiss.IndexFlatL2(d)
index_flat.add(embeddings)
# Query
query_sentence = "Tell me more about deep learning techniques."
query_embedding = model.encode([query_sentence])
# Time the search
start_time = time.time()
D_flat, I_flat = index_flat.search(query_embedding, k=2) # k is the number of nearest neighbors
end_time = time.time()
print(f"Flat index search took {end_time - start_time:.6f} seconds.")

In the above code, we have a list of random sentences related to machine learning and AI. We create a flat index and run a query over it to find the most similar vectors. Since this is a flat index, we don't need a quantizer; the quantizer only comes into play with the IVF index (below).

As I mentioned above, the flat index is a brute-force search: the query vector is compared against every vector in the knowledge base.

To get an optimized search and really use the power of Faiss, we build what is called an IVF index, where we create clusters using a quantizer; the number of clusters and the distance metric can be specified when defining the index. The idea behind the clusters is that when a query vector comes in, we don't need to compare it against every vector in the knowledge base. Instead, we compute the distance between the query vector and the cluster centroids, and restrict the search space to the closest clusters (the top 2, 3, or however many we choose to look into). Once we have those clusters, we compare the query vector only against the vectors inside them to retrieve the most similar results.

[Figure: the query vector xq is closest to the blue cluster centroid]
# Number of clusters to partition the embeddings into
nlist = 2
quantizer = faiss.IndexFlatL2(d)
index_ivf = faiss.IndexIVFFlat(quantizer, d, nlist)
assert not index_ivf.is_trained

# Train the index
index_ivf.train(embeddings)
assert index_ivf.is_trained
# Let's say we want to use the top two closest centroids to define the search
# space; we control this with the nprobe parameter
index_ivf.nprobe = 2
# Add data to the index
index_ivf.add(embeddings)

# Time the search for IVF
start_time = time.time()
# The search returns the distances and the indices of the k most similar vectors
D_ivf, I_ivf = index_ivf.search(query_embedding, k=2)
end_time = time.time()
print(f"IVF index search took {end_time - start_time:.6f} seconds.")

Observations on Time Differences

Typically, an IVF index is faster than a flat index because it avoids comparing the query to every single item in the database. Instead, it first finds the clusters closest to the query (using the coarse quantizer) and then searches only within those nprobe clusters. This means we're comparing against a subset of the data, which makes the search faster.

However, for a very small dataset (like our example sentences), the overhead of the IVF method might overshadow its benefits. In real-world scenarios with large datasets, the efficiency gain from IVF will be more evident.

I have tried to summarize my learnings from the internet and the Faiss documentation; hopefully this post can help others! Feel free to leave any feedback.