Image Search Application | OpenAI CLIP Model

Vishwajeetv
3 min read · Apr 25, 2024

In today’s digital age, image search applications have become an integral part of our lives. Whether it’s finding similar products, identifying objects, or searching for inspiration, image search tools play a crucial role. In this tutorial, we’ll explore how to build a powerful image search application using OpenAI’s CLIP (Contrastive Language-Image Pre-training) model. CLIP enables us to search for images using natural language queries, revolutionizing the way we interact with visual data.

A full application demo is available on YouTube: https://youtu.be/GV-DqJ9m_RM

The first question that comes to mind is probably: “What is CLIP?”

CLIP is a state-of-the-art model developed by OpenAI that learns to understand images and text jointly. Unlike traditional image search models that rely solely on image features, CLIP can associate images with natural language descriptions. This unique capability allows CLIP to perform tasks such as image classification, zero-shot learning, and most importantly, image retrieval based on textual queries.
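As a quick, self-contained illustration (separate from the application we build below), a minimal zero-shot classification sketch might look like this; the image file dog.jpg and the candidate captions are just placeholders:

import clip
import torch
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

# Placeholder image and candidate captions
image = preprocess(Image.open("dog.jpg")).unsqueeze(0).to(device)
text = clip.tokenize(["a photo of a dog", "a photo of a cat", "a photo of a car"]).to(device)

with torch.no_grad():
    logits_per_image, logits_per_text = model(image, text)
    probs = logits_per_image.softmax(dim=-1).cpu().numpy()

print(probs)  # the highest probability marks the best-matching caption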

Setting Up the Environment: Before we dive into building our image search application, let’s set up our environment. We’ll need the following components:

  1. Python environment with libraries such as PyTorch, NumPy, and PIL.
  2. OpenAI’s CLIP model.
  3. Image dataset for testing our application.
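Before writing any code, the dependencies can typically be installed with something like the following (the clip package is installed from OpenAI’s GitHub repository; exact commands may vary for your platform):

pip install torch numpy pillow chromadb
pip install git+https://github.com/openai/CLIP.git

With the environment ready, we start with the imports: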
import clip
import torch
import numpy as np
import os
from glob import glob
from pathlib import Path
from PIL import Image

1. Loading the CLIP Model: We begin by loading the CLIP model, which will serve as the backbone of our image search engine. We can choose from various pre-trained versions of CLIP depending on our computational resources and requirements.
device = "cuda" if torch.cuda.is_available() else "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)
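
If you are unsure which variant to pick, the clip package can list the available pre-trained checkpoints:

print(clip.available_models())
# e.g. ['RN50', 'RN101', 'RN50x4', 'ViT-B/32', 'ViT-B/16', 'ViT-L/14']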

2. Loading Images: Next, we load the images from a specified folder in our dataset. This folder contains the images we want to include in our search index. You can also choose any other folder on your OS, as shown in the demo.

image_path = Path("../images")
images = glob(str(image_path / "*"))

3. Creating Image Embeddings: Using the CLIP model, we encode each image in our folder into a high-dimensional vector representation known as an embedding. These embeddings capture semantic information about the images, enabling us to perform similarity calculations.

image_embedding = []
for i in images:
    image = preprocess(Image.open(i)).unsqueeze(0).to(device)
    with torch.no_grad():
        image_features = model.encode_image(image).cpu().numpy().tolist()
    image_embedding.append(image_features)

4. Setting Up a Database: We utilize a database to store the image embeddings along with their corresponding metadata (e.g., file paths). For this tutorial, we’ll use ChromaDB, a lightweight database optimized for similarity search.

import chromadb
chroma_client = chromadb.Client()
image_collection = chroma_client.create_collection("image", metadata={"hnsw:space": "cosine"})
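
The snippet above only creates the collection, so the image embeddings also need to be added to it along with their metadata. A minimal sketch is shown below; the "name" and "path" metadata keys are assumed here because the query step later reads them, and each embedding is unwrapped from its batch dimension of one:

image_collection.add(
    ids=[str(idx) for idx in range(len(images))],
    embeddings=[emb[0] for emb in image_embedding],  # each entry is a 1-element batch
    metadatas=[{"name": Path(p).name, "path": p} for p in images],
)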

5. Querying with Text: Now comes the exciting part! We allow users to input natural language queries, which are converted into text embeddings using CLIP. We then compare these embeddings with the image embeddings in our database to retrieve relevant images.

text_input = input("Enter text: ")
text_embedding = clip.tokenize(text_input).to(device)
text_features = model.encode_text(text_embedding).detach().cpu().numpy()
result = image_collection.query(query_embeddings=text_features.tolist(), n_results=2)
for i in result['metadatas'][0]:
    print(i['name'])
    print(i['path'])
    img = Image.open(i['path'])
    img.show()

In this tutorial, we’ve learned how to harness the power of OpenAI’s CLIP model to build a sophisticated image search application. By combining the capabilities of computer vision and natural language processing, we’ve created a tool that can understand both images and text, opening up endless possibilities for visual search applications. Whether it’s for e-commerce, content discovery, or creative projects, CLIP empowers developers to build intelligent image search systems that redefine how we explore and interact with visual content.

To explore the full code and implementation details, check out the GitHub repository.

Working Demo Of Application
