Multimodal Search with ChromaDB and OpenCLIP
Multimodal data is data captured in multiple formats, including images, videos, audio, text, and so on. In this blog, I will show you how to add multimodal data to a vector database, using ChromaDB in this case. I will use OpenCLIP for the embeddings. We will then run query searches over the visual data, using Matplotlib to display the results. Now, let's dive into the code.
Setting Up Your Environment and Installing Libraries
Let's install the libraries we need. These include:
- ChromaDB: A vector database that supports persistent storage.
- Matplotlib: For visualizing images.
- Pillow: For image processing.
- OpenCLIP: For generating embeddings from images.
%pip install --upgrade chromadb
%pip install matplotlib
%pip install pillow
%pip install open-clip-torch
Importing ChromaDB and Loading Data
Here, we import the necessary modules and initialize them.
import chromadb
from chromadb.utils.embedding_functions import OpenCLIPEmbeddingFunction
from chromadb.utils.data_loaders import ImageLoader
from matplotlib import pyplot as plt
# Initialize ChromaDB client
chroma_client = chromadb.PersistentClient(path="my_vectordb")
# Initialize ImageLoader and EmbeddingFunction
image_loader = ImageLoader()
multimodal_ef = OpenCLIPEmbeddingFunction()
Creating the Vector Database and Adding Data
Now, we create a collection in the vector database and add the images.
# Create the vector database collection
multimodal_db = chroma_client.get_or_create_collection(
    name="multimodal_db",
    embedding_function=multimodal_ef,
    data_loader=image_loader
)
# Add images to the database
multimodal_db.add(
    # Ensure each image has a unique ID.
    ids=['0', '1', '2', '3', '4', '5', '6', '7', '8', '9'],
    uris=[
        'images/buildin.jpeg', 'images/glass1.jpeg', 'images/groupsd.jpeg',
        'images/lady.jpg', 'images/manlaughing.jpg', 'images/tiger.jpg',
        'images/worke.jpeg', 'images/man3.jpeg', 'images/girls2.png',
        'images/frog.jpg'
    ],
    # The metadata below contains only img_category; more fields can be added.
    metadatas=[
        {'img_category': 'house'}, {'img_category': 'utensils'}, {'img_category': 'people'},
        {'img_category': 'people'}, {'img_category': 'people'}, {'img_category': 'animals'},
        {'img_category': 'people'}, {'img_category': 'people'}, {'img_category': 'people'},
        {'img_category': 'animals'}
    ]
)
Creating a Display Function
We create a print_query_results() function to display the results, using Matplotlib to plot the image data.
def print_query_results(query_list: list, query_results: dict) -> None:
    result_count = len(query_results['ids'][0])
    for i in range(len(query_list)):
        print(f'Results for query: {query_list[i]}')
        for j in range(result_count):
            img_id = query_results['ids'][i][j]
            distance = query_results['distances'][i][j]
            data = query_results['data'][i][j]
            metadata = query_results['metadatas'][i][j]
            document = query_results['documents'][i][j]
            uri = query_results['uris'][i][j]
            print(f'id: {img_id}, distance: {distance}, document: {document}')
            print(f'data: {uri}')
            plt.imshow(data)
            plt.axis("off")
            plt.show()
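To make the indexing inside print_query_results() concrete: ChromaDB's query() groups results per query, so every field is a list of lists, and position [i][j] holds the j-th match for the i-th query text. Here is a minimal sketch with a hand-built mock of that shape; all the values below are illustrative placeholders, not real query output.

```python
# Mock of the nested result structure returned by ChromaDB's query().
# The outer list has one entry per query text, the inner list one entry
# per returned match. All values here are made up for illustration.
mock_results = {
    'ids': [['9', '5']],
    'distances': [[1.32, 1.58]],
    'metadatas': [[{'img_category': 'animals'}, {'img_category': 'animals'}]],
    'documents': [[None, None]],
    'data': [[None, None]],
    'uris': [['images/frog.jpg', 'images/tiger.jpg']],
}

# The j-th match for the i-th query sits at [i][j] in every field.
best_id = mock_results['ids'][0][0]
best_uri = mock_results['uris'][0][0]
print(best_id, best_uri)  # → 9 images/frog.jpg
```

This is why the function loops over queries with i and over matches with j before indexing each field.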
Now Let's Perform Some Queries!
1. I searched for 'frog'. The vector database finds the image data closest to the word 'frog'. Here is the magic of multimodality: images and texts that are semantically similar sit near each other in the vector space, so the image with the smallest distance shows up first.
query_texts = ['frog']
query_results = multimodal_db.query(
    query_texts=query_texts,
    n_results=2,
    include=['documents', 'distances', 'metadatas', 'data', 'uris'],
    where={'img_category': 'animals'}
)
print_query_results(query_texts, query_results)
And the tiger shows up after the frog, since its distance is greater than the frog's.
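The ranking above can be illustrated with a tiny self-contained sketch. The toy 3-dimensional vectors below stand in for CLIP embeddings (real ones have hundreds of dimensions, and ChromaDB computes the distance internally); the point is only that a smaller cosine distance means the embeddings point in more similar directions, so that image is returned first.

```python
import math

def cosine_distance(a, b):
    """Cosine distance = 1 - cosine similarity; smaller means more similar."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1 - dot / (norm_a * norm_b)

# Toy vectors standing in for the 'frog' text embedding and two image
# embeddings. The frog image vector points in nearly the same direction
# as the text vector, so its distance is smaller.
text_frog = [0.9, 0.1, 0.2]
img_frog  = [0.8, 0.2, 0.1]
img_tiger = [0.2, 0.9, 0.3]

d_frog = cosine_distance(text_frog, img_frog)
d_tiger = cosine_distance(text_frog, img_tiger)
print(d_frog < d_tiger)  # → True: the frog image is the nearer neighbour
```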
2. Similarly, I queried 'green tshirt'. This is what I got.
And then the only other relevant image:
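The code for this second query was not shown above; it follows the same pattern as the frog query. Here is a sketch, wrapped in a helper so the pattern is reusable. The where filter here is an assumption: it supposes the t-shirt photos were among the images tagged 'people' in the metadata earlier, and `db` is the multimodal_db collection created above.

```python
def run_second_query(db):
    """Run the 'green tshirt' query against an existing multimodal collection.

    `db` is assumed to be the ChromaDB collection (multimodal_db) built
    earlier; the where filter assumes the t-shirt photos were tagged 'people'.
    """
    query_texts = ['green tshirt']
    query_results = db.query(
        query_texts=query_texts,
        n_results=2,
        include=['documents', 'distances', 'metadatas', 'data', 'uris'],
        where={'img_category': 'people'},  # assumed tag for the t-shirt photos
    )
    print_query_results(query_texts, query_results)

# Usage (with the collection from earlier):
# run_second_query(multimodal_db)
```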
I hope this made it clear how things work here. We will also look at Multimodal RAG in the next blog!