How to Connect to Milvus Lite Using LangChain and LlamaIndex

Jun 7, 2024

Milvus Lite, released just one week ago on May 31, is now the default method for third-party connectors like LangChain and LlamaIndex to connect to Milvus, the popular open-source vector database.

Method             Control Level for Retrieval Process    Time (seconds)
LlamaIndex         No control                             2156
LangChain          Full control                           8
Milvus Lite API    Full control                           28

Table: Timings using the same HuggingFace embedding model (BAAI/bge-large-en-v1.5) and the same HTML data files.

The result? If you’re looking for the best balance between high control over Milvus settings and fast setup, using the Milvus Lite APIs directly is the optimal choice. The full code and timings are available on my GitHub.
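For readers who want to reproduce this kind of comparison, here is a minimal sketch of a timing helper. It is not the harness used for the numbers above; the `timed` helper and the `sleep_demo` label are hypothetical names chosen for illustration, and `time.sleep` stands in for the real "connect and create a collection" work.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed(label, results):
    """Record wall-clock seconds for the enclosed block under `label`."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Store the elapsed time even if the block raised an exception.
        results[label] = time.perf_counter() - start


timings = {}

# Stand-in workload (assumption): replace with the connector code to measure.
with timed("sleep_demo", timings):
    time.sleep(0.05)

for label, seconds in timings.items():
    print(f"{label}: {seconds:.2f} s")
```

Wrapping each connector's setup code in its own `with timed(...)` block keeps the measurement identical across methods, which matters more than the absolute numbers.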

In the following sections, we’ll cover:

Connecting to Milvus Lite using LlamaIndex
Connecting to Milvus Lite using LangChain
Connecting to Milvus Lite using Milvus Lite APIs

Connecting to Milvus Lite Using LlamaIndex

It’s easy to get started using LlamaIndex, but it is slow: in the timings above it took about 2,156 seconds (roughly 36 minutes) to connect and create a collection.

from llama_index.core import (
   Settings,
   StorageContext,
   VectorStoreIndex,
)
from llama_index.embeddings.huggingface import HuggingFaceEmbedding
from llama_index.vector_stores.milvus import MilvusVectorStore


# 1. Define the embedding model.
# ServiceContext was deprecated in llama-index 0.10 and later removed;
# the global Settings object holds the embedding model instead.
Settings.embed_model = HuggingFaceEmbedding(
   model_name="BAAI/bge-large-en-v1.5")
# LlamaIndex hides this but we need it to create the vector store!
EMBEDDING_DIM = 1024


# 2. Create a Milvus collection from the documents and embeddings.
# A local file path as the uri selects Milvus Lite.
vector_store = MilvusVectorStore(
   uri="./milvus_demo.db",
   dim=EMBEDDING_DIM,
   overwrite=True
)
storage_context = StorageContext.from_defaults(
   vector_store=vector_store
)
llamaindex = VectorStoreIndex.from_documents(
   # Chunk, embed, insert too slow!  Just use one document.
   # docs: the documents loaded earlier (loading code not shown here).
   docs[:1],
   storage_context=storage_context
)

Connecting to Milvus Lite Using LangChain

It’s easy to get started with LangChain. It takes about 8 seconds to connect and create a collection.

import time

from langchain_milvus import Milvus
from langchain_huggingface import HuggingFaceEmbeddings
from langchain.text_splitter import RecursiveCharacterTextSplitter


# 1. Define the embedding model.
model_name = "BAAI/bge-large-en-v1.5"
model_kwargs = {'device': 'cpu'}
encode_kwargs = {'normalize_embeddings': True}
embed_model = HuggingFaceEmbeddings(
   model_name=model_name,
   model_kwargs=model_kwargs,
   encode_kwargs=encode_kwargs)
EMBEDDING_DIM = embed_model.dict()['client'].get_sentence_embedding_dimension()


# 2. Create a Milvus collection from the documents and embeddings.
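start_time = time.time()
vectorstore = Milvus.from_documents(
   documents=docs,
   embedding=embed_model,
   # A local file path as the uri selects Milvus Lite.
   connection_args={"uri": "./milvus_demo.db"},
   # Override LangChain default values for Milvus.
   consistency_level="Eventually",
   drop_old=True,
   index_params={
       "metric_type": "COSINE",
       "index_type": "AUTOINDEX",
       "params": {},
   },
)
# Report how long connecting and creating the collection took.
print(f"LangChain Milvus time: {time.time() - start_time:.2f} seconds")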

Connecting to Milvus Lite Using Milvus Lite APIs

But what’s happening behind the scenes? Let’s break down the actual steps and make the default values more explicit:

Start the Milvus Lite server and connect.
Select an embedding model.
Create a Milvus database collection.
Define a schema.
Choose an index (data structure for Approximate Nearest Neighbor search).
Choose a distance metric (definition of “close” in vector space).
Choose the consistency level for inserting data.
Select a chunking strategy.
Transform chunks of data into vectors using the embedding model inference.
Insert vector data into Milvus.
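
To make the "distance metric" step above more concrete, here is a small pure-Python sketch of the three metrics Milvus names COSINE, IP (inner product), and L2. The helper functions and toy 2-D vectors are illustrative assumptions, not Milvus code; real embeddings from BAAI/bge-large-en-v1.5 have 1024 dimensions.

```python
import math


def dot(a, b):
    """Inner product (the IP metric) of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def l2_distance(a, b):
    """Euclidean (L2) distance: smaller means closer."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Cosine of the angle between a and b: larger means closer."""
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


def normalize(v):
    """Scale v to unit length."""
    norm = math.sqrt(dot(v, v))
    return [x / norm for x in v]


# Toy vectors (assumption): same direction as the query, and orthogonal to it.
query = [1.0, 0.0]
doc_same_direction = [3.0, 0.0]
doc_orthogonal = [0.0, 2.0]

cos_same = cosine_similarity(query, doc_same_direction)
cos_orth = cosine_similarity(query, doc_orthogonal)
l2_same = l2_distance(query, doc_same_direction)
# On unit-length vectors, inner product equals cosine similarity.
ip_normalized = dot(normalize(query), normalize(doc_same_direction))

print(cos_same, cos_orth, l2_same, ip_normalized)
```

Cosine similarity ignores vector length, so the same-direction document scores as a perfect match even though its L2 distance from the query is not zero. This is also why normalizing embeddings makes IP and COSINE rank results the same way.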

Here is the Python code using the Milvus Lite API directly. It takes about 28 seconds to connect and create a collection.

import pymilvus


# STEP 1. CONNECT A CLIENT TO THE MILVUS LITE PYTHON SERVER.
from pymilvus import MilvusClient
mc = MilvusClient("milvus_demo.db")


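# STEP 2. DOWNLOAD AN OPEN SOURCE EMBEDDING MODEL.
from sentence_transformers import SentenceTransformer
model_name = "BAAI/bge-large-en-v1.5"
encoder = SentenceTransformer(model_name, device='cpu')
# Needed below to create the collection (1024 for this model).
EMBEDDING_DIM = encoder.get_sentence_embedding_dimension()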


# STEP 3. CREATE A MILVUS COLLECTION AND DEFINE THE DATABASE INDEX.
# Uses Milvus AUTOINDEX, which defaults to HNSW.
COLLECTION_NAME = "MilvusDocs"
# Drop any existing collection with the same name to start clean.
if mc.has_collection(COLLECTION_NAME):
   mc.drop_collection(COLLECTION_NAME)
mc.create_collection(COLLECTION_NAME,
       EMBEDDING_DIM,
       consistency_level="Eventually",
       auto_id=True)


# STEP 4. SPLIT DOCUMENTS INTO CHUNKS.
from langchain_community.document_transformers import BeautifulSoupTransformer
from langchain.text_splitter import RecursiveCharacterTextSplitter
# Define chunk size and overlap (10% of the chunk size, as an integer).
chunk_size = 512
chunk_overlap = int(chunk_size * 0.10)
# Split the documents into recursive, overlapping chunks.
child_splitter = RecursiveCharacterTextSplitter(
   chunk_size=chunk_size,
   chunk_overlap=chunk_overlap,
   length_function=len,  # use built-in Python len function
)
chunks = child_splitter.split_documents(docs)


# STEP 5. TRANSFORM CHUNKS INTO VECTORS USING EMBEDDING MODEL INFERENCE.
list_of_strings = [doc.page_content for doc in chunks if hasattr(doc, 'page_content')]
# encode() returns one dense vector (a NumPy array row) per input string.
embeddings = encoder.encode(list_of_strings)


# STEP 6. INSERT CHUNK LIST INTO MILVUS.
# First, create dict_list: one dict per chunk with its text, source, and vector.
dict_list = []
for chunk, vector in zip(chunks, embeddings):
   chunk_dict = {
       'chunk': chunk.page_content,
       'source': chunk.metadata.get('source', ""),
       'vector': vector.tolist()
   }
   dict_list.append(chunk_dict)
mc.insert(
   COLLECTION_NAME,
   data=dict_list,
   progress_bar=True)
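
Step 4 above sets a chunk size of 512 with a 10% overlap. To show how those two numbers interact, here is a deliberately simplified sliding-window splitter. It is not how RecursiveCharacterTextSplitter works internally (that class also tries to split on separators such as paragraphs and sentences); `sliding_window_chunks` is a hypothetical helper, and the tiny sizes are chosen only so the overlap is easy to see.

```python
def sliding_window_chunks(text, chunk_size, chunk_overlap):
    """Split text into fixed-size character chunks that share
    `chunk_overlap` characters with the previous chunk."""
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be >= 0 and < chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once this chunk has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


text = "abcdefghijklmnopqrstuvwxyz"
chunks = sliding_window_chunks(text, chunk_size=10, chunk_overlap=2)
for chunk in chunks:
    print(chunk)
```

Each chunk starts `chunk_size - chunk_overlap` characters after the previous one, so the last two characters of one chunk reappear at the start of the next. The overlap keeps a sentence that straddles a boundary retrievable from at least one chunk.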