Retrieval in LangChain: Part 3 — Text Embeddings and Vector Stores
Welcome to the third article in this series on Retrieval in LangChain. If you haven't read the previous articles yet, check out Document Loaders and Text Splitters.
Now that we know how to load documents and chunk the text, the next step is to transform this textual data into numerical representations. In this article, let's look at LangChain's text-embedding capabilities and the vector stores that hold the results.
Why do we need embeddings?
Embeddings are numerical representations of text in a multidimensional vector space. They capture semantic meaning and contextual information, which makes them well suited for information retrieval: texts with similar meanings end up close to each other in that space.
Getting embeddings for multiple texts or for a single query is straightforward with any embedding model.
from langchain.embeddings import OpenAIEmbeddings
embedding_function = OpenAIEmbeddings()  # requires an OpenAI API key in the environment
docs = ["LangChain helps you build LLM applications.", "Embeddings map text to vectors."]  # example texts
query_text = "What do embeddings do?"  # example query
embedded_docs = embedding_function.embed_documents(docs)    # embeds multiple texts
embedded_text = embedding_function.embed_query(query_text)  # embeds a single query
OpenAI's default embedding model produces 1536-dimensional vectors; other models produce vectors of different dimensions.
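To see what semantic similarity looks like in practice, here is a minimal sketch (assuming the embedded_docs and embedded_text produced above) that scores each document against the query with cosine similarity:
import numpy as np
def cosine_similarity(a, b):
    # Cosine similarity between two embedding vectors; closer to 1.0 means more similar
    a, b = np.array(a), np.array(b)
    return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
# A higher score means the document is semantically closer to the query
scores = [cosine_similarity(embedded_text, doc_vec) for doc_vec in embedded_docs]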
Several embedding models outperform OpenAIEmbeddings on retrieval benchmarks. For example, the BGE models from BAAI, available on Hugging Face, are among the best open-source embedding models.
from langchain.embeddings import HuggingFaceBgeEmbeddings
model_name = "BAAI/bge-base-en-v1.5"
model_kwargs = {"device": "cpu"}
encode_kwargs = {"normalize_embeddings": True}
hf = HuggingFaceBgeEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs,
)
embedded_docs = hf.embed_documents(docs)
embedded_text = hf.embed_query(query_text)
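As a quick check of how dimensions differ across models (bge-base-en-v1.5 produces 768-dimensional vectors, compared to 1536 for the OpenAI model above):
print(len(hf.embed_query("hello world")))  # 768 for bge-base-en-v1.5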
If you have hardware or API restrictions, use LangChain's FakeEmbeddings for testing purposes; it returns random vectors of the size you specify.
from langchain_community.embeddings import FakeEmbeddings
fake_embeddings = FakeEmbeddings(size=300)  # size of each fake embedding vector
fake_embedded_record = fake_embeddings.embed_query(query_text)  # a single query
fake_embedded_records = fake_embeddings.embed_documents(docs)   # a list of texts
Once you have embeddings for your documents and your query, you need to store the document embeddings and search for the ones most similar to the query embedding in order to retrieve the required information. This is exactly what a vector store does: it stores the embeddings and performs the similarity search.
Let's quickly create a vector store from scratch: load a document, split it into chunks, create embeddings, store them in the vector store, and query the database.
pip install chromadb qdrant-client faiss-cpu sentence-transformers wikipedia
from langchain_community.document_loaders import WikipediaLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import HuggingFaceBgeEmbeddings
from langchain_community.vectorstores import FAISS
# Document loading
title = "Large language model"  # example Wikipedia page title
loader = WikipediaLoader(query=title, load_max_docs=5)
documents = loader.load()
# Text splitting
text_splitter = RecursiveCharacterTextSplitter(chunk_size=400, chunk_overlap=100)
docs = text_splitter.split_documents(documents=documents)
# Defining the embedding function
model_name = "BAAI/bge-large-en-v1.5"
model_kwargs = {'device': 'cpu'}
encode_kwargs = {"normalize_embeddings": True}
embedding_function = HuggingFaceBgeEmbeddings(
    model_name=model_name,
    model_kwargs=model_kwargs,
    encode_kwargs=encode_kwargs,
)
# Creating a vector database (FAISS, an in-memory vector store)
db = FAISS.from_documents(
    docs,
    embedding_function,
)
# Querying the vector database
query = "What are large language models used for?"  # example query
matched_docs = db.similarity_search(query=query, k=5)
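If you also want to see how close each match is, FAISS exposes similarity_search_with_score, which returns (document, score) pairs; by default the score is a distance, so lower means closer:
matched_docs_with_scores = db.similarity_search_with_score(query=query, k=5)
for doc, score in matched_docs_with_scores:
    print(score, doc.page_content[:80])  # distance followed by a preview of the chunk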
The vector store used here is FAISS (Facebook AI Similarity Search), an in-memory index: by default the vectors live only in RAM and nothing is persisted, whereas Chroma can persist its data to a directory on your local machine.
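That said, a FAISS index can also be saved to and reloaded from disk if you want to keep it between runs (a minimal sketch; "faiss_index" is just an example folder name, and depending on your LangChain version load_local may require allow_dangerous_deserialization=True):
db.save_local("faiss_index")
loaded_faiss_db = FAISS.load_local("faiss_index", embedding_function)
Here is the same store built with Chroma, persisting the embeddings to a local directory: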
import chromadb
from langchain.vectorstores import Chroma
output_path = "./chroma_db"  # example directory where Chroma persists its data
db = Chroma.from_documents(docs, embedding_function, persist_directory=output_path)
loaded_db = Chroma(persist_directory=output_path, embedding_function=embedding_function)
matched_docs = db.similarity_search(query=query, k=5)
We can add new information to an existing vector store by passing the new documents to the same persist directory with the same embedding function.
db = Chroma.from_documents(
    family_docs,                    # the new docs that we want to add
    embedding_function,             # should be the same embedding function
    persist_directory=output_path,  # existing vector store where we want to add the new records
)
matched_docs = db.similarity_search(query=query, k=5)
We can also delete specific records with db._collection.delete(ids=[...]), which needs the ids of the records to remove; a sketch follows after the retriever example below. The vector store can also be used to create a retriever.
retriever = db.as_retriever()
matched_docs = retriever.get_relevant_documents(query=query)
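As for deleting specific records: _collection is Chroma's underlying collection object, and deleting by id works if you supplied explicit ids when adding the documents (a sketch under that assumption; the id format is arbitrary):
db = Chroma.from_documents(
    docs,
    embedding_function,
    ids=[f"doc_{i}" for i in range(len(docs))],  # explicit ids so records can be targeted later
    persist_directory=output_path,
)
db._collection.delete(ids=["doc_0"])  # removes the record stored under id "doc_0"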
To tune the retrieval process, we can use the search_type parameter of as_retriever.
Maximal Marginal Relevance (MMR) optimizes for both similarity to the query and diversity among the selected documents.
# Maximal Marginal Relevance
retriever = db.as_retriever(search_type='mmr', search_kwargs={"k": 1})
matched_docs = retriever.get_relevant_documents(query=query)
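MMR also lets you control how many candidates are fetched before re-ranking and how the relevance/diversity trade-off is weighted (a sketch; the fetch_k and lambda_mult values below are arbitrary):
retriever = db.as_retriever(
    search_type="mmr",
    search_kwargs={"k": 3, "fetch_k": 20, "lambda_mult": 0.5},  # lambda_mult: 1 favors relevance, 0 favors diversity
)
matched_docs = retriever.get_relevant_documents(query=query)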
The similarity_score_threshold search type returns only the results whose relevance score is above the specified threshold.
retriever = db.as_retriever(search_type="similarity_score_threshold", search_kwargs={"score_threshold": 0.5})
matched_docs = retriever.get_relevant_documents(query=query)
That's all about vector stores: a straightforward approach to creating one and retrieving the relevant information from it. In the next article, let's dive deeper into more advanced retrieval methods.
Thanks for reading. If you have any specific questions or need further clarification on any part of the article, feel free to ask!
Reference: https://python.langchain.com/docs/modules/data_connection/text_embedding/