Comparing RAG Part 2: Vector Stores; FAISS vs Chroma

Stepkurniawan
4 min read · Jan 1, 2024

In this study, we examine how the choice of vector store, FAISS (https://faiss.ai) or Chroma, affects the retrieved context. The investigation uses the suswiki knowledge base, segmented into 200-character text chunks, to answer 50 carefully selected questions. We compare the FAISS and Chroma vector stores and assess the results using the context precision and context recall scores produced by the RAGAS evaluator.
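For readers unfamiliar with RAGAS, the sketch below shows how such an evaluation is typically wired up. It assumes the ragas and datasets Python packages; the column names follow common RAGAS conventions but can vary between versions, and the sample row is purely illustrative, not data from this study.

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import context_precision, context_recall

# Each row pairs a question with the contexts returned by the retriever
# and a reference answer used as ground truth (illustrative values only).
eval_data = Dataset.from_dict({
    "question": ["What does the suswiki knowledge base cover?"],
    "contexts": [["<the 200-character chunk returned by the retriever>"]],
    "ground_truth": ["<the reference answer for this question>"],
})

scores = evaluate(eval_data, metrics=[context_precision, context_recall])
print(scores)  # context precision and recall for the retrieved contexts
```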

The full corpus comprises 5551 text chunks. Building the vector index takes FAISS 72.4 seconds, while Chroma needs 91.59 seconds for the same task, so indexing efficiency already separates the two stores before any retrieval happens.
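The indexing step can be reproduced roughly as follows. This is a minimal sketch assuming a LangChain-style setup; the import paths, embedding model, and chunk list are placeholders rather than the study's exact configuration.

```python
import time

from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS, Chroma

chunks = ["..."]  # the 5551 suswiki chunks of ~200 characters each
embeddings = HuggingFaceEmbeddings()  # any embedding model works here

start = time.time()
faiss_store = FAISS.from_texts(chunks, embeddings)
print(f"FAISS index built in {time.time() - start:.1f}s")

start = time.time()
chroma_store = Chroma.from_texts(chunks, embeddings)
print(f"Chroma index built in {time.time() - start:.1f}s")
```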

Retrieve One Context Document

Retrieving only the single most similar context document keeps the setup simple and fast. It also makes differences easy to identify and present: for each question, the two stores either return the same chunk or disagree on exactly one.
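A sketch of this single-document retrieval is shown below. It continues the hypothetical faiss_store and chroma_store objects from the indexing sketch above, and the query string is illustrative.

```python
# Assumes `faiss_store` and `chroma_store` from the indexing sketch above.
query = "What is sustainability?"  # one of the 50 questions (illustrative)

faiss_hit = faiss_store.similarity_search(query, k=1)[0]
chroma_hit = chroma_store.similarity_search(query, k=1)[0]

# With k=1, the stores either agree on a single chunk or disagree outright,
# which makes discrepancies easy to spot and inspect side by side.
if faiss_hit.page_content != chroma_hit.page_content:
    print("FAISS:", faiss_hit.page_content)
    print("Chroma:", chroma_hit.page_content)
```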

Experiment Results

Table 1: FAISS vs Chroma when retrieving one context for each of the 50 questions

As Table 1 shows, even with the same knowledge base and questions, changing the vector store changes the results, although the differences are marginal. FAISS edges out Chroma in both context precision and context recall. FAISS is also faster at similarity search, taking only 1.81 seconds to retrieve 50 contexts for the 50 questions, while Chroma lags behind at 2.18 seconds.

Differences in retrieved contexts

Among the 50 questions, FAISS and Chroma return different contexts for 5 of them, and these discrepancies account for the variations in context precision and recall. In 4 of these cases, FAISS returned the correct context while Chroma missed the target. Interestingly, in the one question where RAGAS favored Chroma, manual evaluation revealed that neither FAISS nor Chroma had returned the correct context.

Retrieve Multiple Context Documents

We also ran tests that retrieve multiple documents to see the impact on precision, recall, and f-measure. We chose 3 retrieved documents as our hyper-parameter, since it is a popular default in open-source RAG projects, to investigate whether increasing the number of retrieved contexts helps in cases where the correct document was not retrieved in the single-document experiment. We then repeated the test with 6 documents to push the effect further; a sketch of this setup follows.
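In a LangChain-style setup, varying this hyper-parameter is just a matter of changing k; the sketch below again assumes the hypothetical faiss_store, chroma_store, and query from the earlier sketches.

```python
# Assumes `faiss_store`, `chroma_store`, and `query` from the sketches above.
for k in (3, 6):
    faiss_docs = faiss_store.similarity_search(query, k=k)
    chroma_docs = chroma_store.similarity_search(query, k=k)
    # All k retrieved chunks are passed to RAGAS as the "contexts" column,
    # so context precision reflects how many of them are actually relevant.
    print(f"k={k}: FAISS returned {len(faiss_docs)}, Chroma returned {len(chroma_docs)}")
```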

Experiment Results

In Table 2, there is a slight improvement in FAISS scores compared to retrieving a single document, with the f-measure rising from 0.95 to 0.97. Conversely, Chroma’s f-measure decreased significantly from 0.91 to 0.73. Interestingly, the search time for both FAISS and Chroma remains almost the same as in the previous experiment.

When retrieving 6 documents, the overall scores remain nearly identical to retrieving 3 documents, with a mere 2% decrease in FAISS f-measure and a 3% increase in Chroma f-measure. Intriguingly, it takes FAISS less time to retrieve 6 documents than 3.

Conclusion

In this experiment, we used 5551 text chunks and 50 questions to compare two vector stores, FAISS and Chroma. We then increased the number of retrieved documents to observe the impact on the RAGAS evaluation scores. Our findings indicate that FAISS is superior to Chroma in both speed and retrieval accuracy, with Chroma's accuracy decreasing as the number of retrieved documents increases.

Changing the vector store influences the retrieved context because each store implements its own similarity-search algorithm, which makes the choice of vector store a crucial factor in a RAG pipeline. The comparison between FAISS and Chroma shows that FAISS is faster at index initialization and correctly retrieves contexts that Chroma gets wrong, as reflected in its higher context precision and recall. FAISS also has the faster search algorithm, making it the preferred choice over Chroma.

Increasing the number of retrieved documents from 1 to 3 and then to 6 reduces precision and recall for Chroma and brings only marginal improvement for FAISS. Fortunately, the search times across all experiments remain consistently similar.

Why is there a discrepancy between the vector stores?

Chroma's similarity search is approximate rather than exhaustive: it relies on an approximate nearest neighbor (ANN) algorithm, specifically Hierarchical Navigable Small World (HNSW). During the search, HNSW excludes some data points from consideration, so the chunk it returns is not guaranteed to be the true nearest neighbor.
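For illustration, this is roughly how Chroma exposes those HNSW knobs through collection metadata; the parameter names follow Chroma's documented metadata keys but may differ between versions, and the values shown are placeholders, not the study's configuration.

```python
import chromadb

client = chromadb.Client()  # in-memory client for the sketch
collection = client.create_collection(
    name="suswiki_demo",
    metadata={
        "hnsw:space": "l2",           # distance metric
        "hnsw:construction_ef": 100,  # graph quality at build time
        "hnsw:search_ef": 10,         # candidates explored per query
    },
)
collection.add(documents=["chunk one", "chunk two"], ids=["1", "2"])

# A small search_ef prunes more of the HNSW graph during the query,
# which is where the approximation (and an occasional missed chunk) comes from.
result = collection.query(query_texts=["an example question"], n_results=1)
```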

FAISS, by contrast, combines two components: (1) product quantization (PQ) encoding and (2) a search system based on inverted indexing [1]. PQ compresses high-dimensional vectors with a lossy encoding. The current implementation also uses HNSW as the search component, but unlike standalone HNSW, this combination improves search efficiency, resulting in faster searches and contexts that are more relevant to the query.
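A minimal standalone sketch of the IVF-PQ scheme described in [1] is shown below, using the faiss library with random vectors. It illustrates the technique itself, not necessarily the exact index type that the vector-store wrapper builds in this study.

```python
import faiss
import numpy as np

d, n = 64, 5551                        # vector dimension and corpus size
xb = np.random.random((n, d)).astype("float32")

quantizer = faiss.IndexFlatL2(d)       # coarse quantizer for the inverted lists
index = faiss.IndexIVFPQ(quantizer, d, 100, 8, 8)  # 100 lists, 8 sub-vectors, 8 bits each
index.train(xb)                        # learn coarse centroids and PQ codebooks
index.add(xb)

index.nprobe = 10                      # how many inverted lists to visit per query
xq = np.random.random((1, d)).astype("float32")
distances, ids = index.search(xq, 3)   # approximate top-3 neighbors
```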

Reference

[1] H. Jégou, M. Douze, and C. Schmid, “Product quantization for nearest neighbor search,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 33, no. 1, pp. 117–128, 2011. doi: 10.1109/TPAMI.2010.57. [Online]. Available: https://inria.hal.science/inria-00514462/document
