Build a Fashion Buddy Application using Groq, Llama 3.2 Vision Model and FastAPI
Fashion Buddy Project Overview
Fashion Buddy is a web application that helps users explore and analyze fashion items through image uploads. It combines CLIP (Contrastive Language-Image Pretraining), which turns images into embeddings for similarity search, with Meta's Llama 3.2 Vision model served through Groq's API, which writes detailed descriptions and analyses of the uploaded items.
Key Features
1. Image Upload: Users can upload images of fashion items, which the application processes to generate embeddings using the CLIP model.
2. Category Filtering: The application allows users to select a category (Male, Female, Accessories) from a dropdown menu. This selection is used to filter similar items based on metadata stored in the Chroma DB.
3. Similarity Search: Upon uploading an image, the application performs a vector similarity search against a database of fashion items, retrieving similar images along with their associated metadata.
4. Detailed Fashion Analysis: The application generates a comprehensive analysis of the uploaded image and similar items, formatted in markdown. This analysis includes:
   - A brief description of the uploaded item.
   - Details about similar items, including gender, color palette, style comparisons, outfit combinations, and brief descriptions.
5. User-Friendly Interface: The application features a vibrant and responsive user interface built with DaisyUI and Tailwind CSS, ensuring a seamless user experience.
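Category filtering and similarity search work together: candidates are first restricted to items whose metadata matches the selected category, then ranked by distance to the query embedding. The brute-force sketch below illustrates that idea in plain Python. It is not how Chroma works internally (Chroma uses an HNSW index rather than scanning every item), and the filenames and toy 3-dimensional vectors are made up; real CLIP ViT-B/32 image embeddings have 512 dimensions.

```python
from dataclasses import dataclass


@dataclass
class Item:
    filename: str
    category: str
    embedding: list[float]


def squared_l2(a: list[float], b: list[float]) -> float:
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def filtered_top_k(items: list[Item], query: list[float], category: str, k: int = 5) -> list[Item]:
    """Keep only items in the requested category, then return the k nearest to the query."""
    candidates = [item for item in items if item.category == category]
    candidates.sort(key=lambda item: squared_l2(item.embedding, query))
    return candidates[:k]


# Hypothetical catalog entries with toy embeddings
items = [
    Item("jeans_black.jpg", "Male", [0.9, 0.1, 0.0]),
    Item("jeans_blue.jpg", "Male", [0.7, 0.3, 0.1]),
    Item("dress_black.jpg", "Female", [0.9, 0.1, 0.0]),
    Item("belt.jpg", "Accessories", [0.1, 0.9, 0.2]),
]
query = [0.85, 0.15, 0.0]
print([item.filename for item in filtered_top_k(items, query, "Male", k=2)])
# prints ['jeans_black.jpg', 'jeans_blue.jpg']
```

Note that dress_black.jpg has exactly the same embedding as the best match but never appears in the Male results: the metadata filter is applied before ranking, which mirrors the `where={"category": category}` argument used in app.py.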
Technical Stack
- Backend: FastAPI for building the web application and handling API requests.
- Image Embedding: CLIP model for generating image embeddings.
- Vision Model: Llama 3.2 11B Vision, accessed through Groq's API, for generating detailed descriptions.
- Orchestration: LangChain (with the langchain_groq integration) for the prompt that produces the fashion analysis.
- Database: Chroma DB for storing and retrieving image embeddings and metadata.
- Frontend: HTML, CSS, and JavaScript for the user interface, with libraries like DaisyUI and marked.js for styling and markdown rendering.
Code Implementation
The Fashion Buddy project consists of three main components: upload_images.py indexes the image catalog, app.py serves the backend API, and static/index.html provides the user interface.
1. upload_images.py
This script is responsible for processing images from specified folders, generating embeddings using the CLIP model, and storing these embeddings along with metadata in a Chroma DB collection.
Key Components:
- Imports: The script imports necessary libraries, including os, torch, PIL for image processing, transformers for the CLIP model, and chromadb for database operations.
- Logging Setup: Configures logging to provide real-time feedback on the script’s execution.
- Model Initialization: Loads the CLIP model and processor from the Hugging Face model hub.
- Function get_image_embedding(image_path):
   - Takes the path of an image as input, loads the image, processes it, and generates an embedding using the CLIP model.
   - Returns the embedding as a NumPy array.
- Function upload_embeddings_to_chroma(base_folder):
   - Processes images from three categories: Male, Female, and Accessories.
   - Deletes any existing Chroma DB collection named “fashion_images” and creates a new one.
   - Iterates through each category folder, retrieves image files, and generates embeddings.
   - Adds the embeddings and associated metadata (filename and category) to the Chroma DB collection.
   - Logs the number of successful uploads.
- Main Execution Block:
   - Defines the base folder where images are stored.
   - Calls the upload_embeddings_to_chroma function to start processing.
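Before loading CLIP, it can be useful to check what upload_embeddings_to_chroma will actually index. The sketch below is a hypothetical helper, plan_uploads, which is not part of the original script. It mirrors the script's folder walk (same three category folders, same extension filter, same `{category}_{filename}` ID scheme) but only returns the planned IDs and metadata instead of computing embeddings. Sorting the filenames is my addition, for a stable order.

```python
import os

CATEGORIES = ["Male", "Female", "Accessories"]
IMAGE_EXTENSIONS = (".png", ".jpg", ".jpeg")


def plan_uploads(base_folder: str) -> list[tuple[str, dict]]:
    """Return the (id, metadata) pairs upload_images.py would add, without embedding anything."""
    planned = []
    for category in CATEGORIES:
        folder_path = os.path.join(base_folder, category)
        if not os.path.isdir(folder_path):
            continue  # the original script logs a warning and skips missing folders
        for filename in sorted(os.listdir(folder_path)):
            if filename.lower().endswith(IMAGE_EXTENSIONS):
                doc_id = f"{category}_{filename}"
                planned.append((doc_id, {"filename": filename, "category": category}))
    return planned


if __name__ == "__main__":
    for doc_id, metadata in plan_uploads("images"):
        print(doc_id, metadata)
```

An empty printout usually means the images folder is missing or sits somewhere other than the working directory.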
2. app.py
This script serves as the backend for the FastAPI application, handling image uploads, generating embeddings, and returning results to the frontend.
Key Components:
- Imports: Similar to upload_images.py, it imports necessary libraries for FastAPI, image processing, and machine learning.
- FastAPI Initialization: Creates an instance of the FastAPI application.
- Model and Processor Initialization: Loads the CLIP model and processor, the Groq client used for the image description, and a LangChain ChatGroq model used for the analysis.
- Chroma DB Initialization: Connects to the Chroma DB and retrieves the “fashion_images” collection.
- Prompt Template: Defines a prompt for the language model to generate a detailed description of the uploaded image and similar items.
- Function get_image_embedding(image):
   - Processes the uploaded image and generates an embedding using the CLIP model.
- Endpoint /upload:
   - Accepts an image file and a category (Male, Female, Accessories) as input.
   - Generates an embedding for the uploaded image.
   - Performs a vector similarity search in the Chroma DB, restricted to the selected category.
   - Generates a description of the uploaded image with the Llama 3.2 Vision model via the Groq API.
   - Builds a markdown analysis of the similar items by passing that description, plus the similar items' filenames and categories (not the similar images themselves), through the LangChain prompt.
   - Returns the image description, similar images, analysis, and the uploaded image in base64 format.
- Static File Serving: Serves static files (HTML, CSS, JS) from the static directory and the catalog images from the images directory.
- Main Execution Block: Runs the FastAPI application on the specified host and port.
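Chroma's `collection.query` returns one inner list per query embedding, which is why the endpoint reads `results['metadatas'][0]`. The standalone function below reproduces how the endpoint turns those metadatas into the two strings fed to the prompt. The `results` dict is a hand-written stand-in for a real query result containing only the `metadatas` key; the filenames come from the sample output later in this article. It also makes visible that the analysis model receives only filenames and categories for the similar items.

```python
def build_prompt_inputs(results: dict) -> tuple[list[dict], str, str]:
    """Turn a Chroma query result into the prompt strings used by the /upload endpoint."""
    # One inner list per query embedding; the endpoint sends exactly one query
    similar_images = results["metadatas"][0]
    similar_items = "\n".join(
        f"- {item.get('filename', 'Unknown')} (Category: {item.get('category', 'Unknown')})"
        for item in similar_images
    )
    similar_items_analysis = "\n".join(
        f"{i + 1}. {item.get('filename', 'Unknown')} (Category: {item.get('category', 'Unknown')})"
        for i, item in enumerate(similar_images)
    )
    return similar_images, similar_items, similar_items_analysis


# Hand-written stand-in for collection.query(...) output (only the key used here)
results = {"metadatas": [[
    {"filename": "6917303500_1_1_1.jpg", "category": "Male"},
    {"filename": "8574379704_1_1_1.jpg", "category": "Male"},
]]}
_, similar_items, similar_items_analysis = build_prompt_inputs(results)
print(similar_items_analysis)
# 1. 6917303500_1_1_1.jpg (Category: Male)
# 2. 8574379704_1_1_1.jpg (Category: Male)
```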
3. static/index.html
This file contains the frontend of the application, providing a user interface for uploading images and displaying results.
Key Components:
- HTML Structure: Defines the layout of the page, including a title, file upload input, category dropdown, and buttons.
- Category Dropdown: Allows users to select a category (Male, Female, Accessories) for the uploaded image.
- Results Section: Displays the uploaded image, similar images, and the fashion analysis.
- JavaScript Logic:
   - Handles the image upload process, sending the file and selected category to the backend.
   - Receives the response from the server and updates the UI with the uploaded image, similar images, and analysis.
   - Uses the marked library to convert markdown analysis into HTML for display.
Code
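- Install required libraries (python-multipart is needed by FastAPI for the File and Form parameters; python-dotenv and tqdm are imported by the scripts):
```bash
pip install fastapi uvicorn python-multipart pillow transformers torch chromadb groq langchain langchain-groq python-dotenv tqdm
```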
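If the app fails at import time, a quick way to see which packages are missing is to look up their import names with importlib. Several PyPI names differ from module names (pillow installs PIL, python-dotenv installs dotenv), and python-multipart renamed its module in newer releases, so some entries list alternatives. The mapping below is my assumption for the packages the two scripts import, plus python-multipart for FastAPI's form handling.

```python
import importlib.util

# Assumed PyPI-name -> import-name mapping; tuples list acceptable alternatives.
REQUIRED = {
    "fastapi": ("fastapi",),
    "uvicorn": ("uvicorn",),
    "python-multipart": ("python_multipart", "multipart"),
    "pillow": ("PIL",),
    "transformers": ("transformers",),
    "torch": ("torch",),
    "chromadb": ("chromadb",),
    "groq": ("groq",),
    "langchain": ("langchain",),
    "langchain-groq": ("langchain_groq",),
    "python-dotenv": ("dotenv",),
    "tqdm": ("tqdm",),
}


def missing_packages(required: dict[str, tuple[str, ...]]) -> list[str]:
    """Return the PyPI names for which none of the listed top-level modules can be found."""
    return [
        package
        for package, modules in required.items()
        if not any(importlib.util.find_spec(module) is not None for module in modules)
    ]


if __name__ == "__main__":
    missing = missing_packages(REQUIRED)
    print("Missing:", ", ".join(missing) if missing else "none")
```

`find_spec` only locates modules without importing them, so the check stays fast even for heavy packages such as torch.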
- Set your GROQ_API_KEY in a .env file in the project root (a single line of the form GROQ_API_KEY=<your key>).
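- Set up the image folder: create an images directory with one subfolder per category (Male, Female, Accessories) and place the catalog images in them.
- Folder structure: app.py, upload_images.py, the images directory, and static/index.html sit side by side in the project root; the Chroma client persists the vector store to a chroma_db directory there.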
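The scripts assume a fixed layout relative to the working directory: upload_images.py reads images/Male, images/Female and images/Accessories, and app.py serves static/index.html and the images directory. The hypothetical helper below, ensure_layout, creates those folders if needed and reports how many images each category folder holds, so an empty category is caught before indexing.

```python
from pathlib import Path

CATEGORIES = ("Male", "Female", "Accessories")
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg"}


def ensure_layout(root: str = ".") -> dict[str, int]:
    """Create images/<Category>/ and static/ under root; return image counts per category."""
    root_path = Path(root)
    (root_path / "static").mkdir(parents=True, exist_ok=True)
    counts = {}
    for category in CATEGORIES:
        folder = root_path / "images" / category
        folder.mkdir(parents=True, exist_ok=True)
        # Count only files with the same extensions upload_images.py accepts
        counts[category] = sum(
            1 for p in folder.iterdir() if p.is_file() and p.suffix.lower() in IMAGE_SUFFIXES
        )
    return counts
```

Calling `ensure_layout()` from the project root returns a dict such as `{"Male": 0, "Female": 0, "Accessories": 0}` on a fresh checkout; any zero means that category will contribute nothing to the vector store.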
- upload_images.py
```python
import os
import logging

import torch
from PIL import Image
from transformers import CLIPProcessor, CLIPModel
from dotenv import load_dotenv
from chromadb import Client, Settings
from tqdm import tqdm

# Set up logging
logging.basicConfig(level=logging.INFO, format='%(asctime)s - %(levelname)s - %(message)s')

load_dotenv()

# Initialize CLIP model and processor
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Initialize Chroma client with persistence
chroma_client = Client(Settings(persist_directory="./chroma_db", is_persistent=True))


def get_image_embedding(image_path):
    """Load an image from disk and return its CLIP embedding as a NumPy array."""
    image = Image.open(image_path).convert("RGB")
    # padding/truncation only apply to text inputs, so they are not passed here
    inputs = processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = model.get_image_features(**inputs)
    return outputs.squeeze().numpy()


def upload_embeddings_to_chroma(base_folder):
    logging.info(f"Processing images in folder: {base_folder}")

    # Recreate the collection. Depending on the chromadb version, list_collections()
    # returns Collection objects or plain names, so compare by name either way.
    existing = [c if isinstance(c, str) else c.name for c in chroma_client.list_collections()]
    if "fashion_images" in existing:
        chroma_client.delete_collection("fashion_images")
    collection = chroma_client.create_collection("fashion_images")

    categories = ['Male', 'Female', 'Accessories']
    successful_uploads = 0

    for category in categories:
        folder_path = os.path.join(base_folder, category)
        if not os.path.exists(folder_path):
            logging.warning(f"Folder not found: {folder_path}")
            continue

        image_files = [f for f in os.listdir(folder_path) if f.lower().endswith(('.png', '.jpg', '.jpeg'))]
        logging.info(f"Total images found in {category}: {len(image_files)}")

        for filename in tqdm(image_files, desc=f"Processing {category}"):
            try:
                image_path = os.path.join(folder_path, filename)
                embedding = get_image_embedding(image_path)
                metadata = {
                    "filename": filename,
                    "category": category
                }
                # IDs combine category and filename so they stay unique across folders
                collection.add(
                    embeddings=[embedding.tolist()],
                    metadatas=[metadata],
                    ids=[f"{category}_{filename}"]
                )
                successful_uploads += 1
            except Exception as e:
                logging.error(f"Error processing {filename} in {category}: {str(e)}")

    logging.info(f"Embeddings added to Chroma DB. Total successful items: {successful_uploads}")


if __name__ == "__main__":
    base_folder = "images"  # Change this if your images are in a different folder
    # Upload embeddings to Chroma DB
    upload_embeddings_to_chroma(base_folder)
    logging.info("Process completed successfully!")
```
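upload_images.py creates the collection with Chroma's default distance function, which Chroma's documentation describes as squared L2, and it stores raw CLIP embeddings whose lengths vary from image to image. A common alternative is to compare directions only: either L2-normalize embeddings before storing and querying them, or create the collection with cosine distance (set through the `hnsw:space` collection metadata key in the Chroma versions I am familiar with; check the docs for your version). The plain-Python sketch below, with made-up 2-dimensional vectors, checks the identity behind the first option: for unit vectors, squared L2 distance equals 2 - 2 * cosine similarity, so both produce the same ranking.

```python
import math


def l2_normalize(v: list[float]) -> list[float]:
    """Scale v to unit length, as one might do to a CLIP embedding before storing it."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def squared_l2(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cosine_similarity(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)))


query = [3.0, 4.0]  # length 5
near = [0.6, 0.8]   # same direction as the query, but length 1
far = [4.0, 3.0]    # different direction, length 5

# Raw squared L2 prefers `far` only because its length matches the query's length
print(squared_l2(query, near) > squared_l2(query, far))  # True

# After normalisation the ranking follows direction, and squared L2 == 2 - 2 * cosine
q, n, f = l2_normalize(query), l2_normalize(near), l2_normalize(far)
print(squared_l2(q, n) < squared_l2(q, f))  # True
print(math.isclose(squared_l2(q, f), 2 - 2 * cosine_similarity(query, far)))  # True
```

Whether this changes results on a 40-image catalog is an empirical question; the point is that with unnormalized embeddings, vector length as well as direction contributes to the distance.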
- app.py
```python
import os
import io
import base64

import torch
from dotenv import load_dotenv
from fastapi import FastAPI, File, UploadFile, Form
from fastapi.staticfiles import StaticFiles
from fastapi.responses import HTMLResponse, JSONResponse
from PIL import Image
from transformers import CLIPProcessor, CLIPModel
from langchain_core.prompts import PromptTemplate
from langchain_groq import ChatGroq
from chromadb import Client, Settings
from groq import Groq

# Load GROQ_API_KEY from .env before any client reads it
load_dotenv()

app = FastAPI()

# Model ID used in this article. Groq retires preview models over time, so
# if this ID is rejected, pick a current vision model from Groq's model list.
VISION_MODEL = "llama-3.2-11b-vision-preview"

# Initialize CLIP model and processor
clip_model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
clip_processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Initialize Groq client (reads GROQ_API_KEY from the environment)
groq_client = Groq()

# Initialize Chroma client with persistence. Run upload_images.py first:
# get_collection() raises an error if "fashion_images" does not exist yet.
chroma_client = Client(Settings(persist_directory="./chroma_db", is_persistent=True))
collection = chroma_client.get_collection("fashion_images")

# Initialize LangChain with Groq LLM (used for the text-only analysis step)
llm = ChatGroq(
    model_name=VISION_MODEL,
    groq_api_key=os.environ["GROQ_API_KEY"],
    temperature=0.7,
)

template = """
Describe the fashion item in the uploaded image in detail in EXACTLY TWO SENTENCES:
{image_description}
Similar items:
{similar_items}
Please provide a detailed analysis of the similar images in the following format:
Fashion Analysis
{similar_items_analysis}
For each similar image, provide the following details in NO MORE THAN 1 SENTENCE each:
- **Colour Palette** (What are the colours of the fashion item in the image?)
- **Style comparison** (How is the style of the fashion item in the image compared to the uploaded image?)
- **Outfit combination suggestion** (What are two different outfit combinations that can be made with the fashion item in the image?)
- **Brief image description** (Describe the fashion item in the similar image in detail in NO MORE THAN 1 SENTENCE.)
Format your response in markdown, using appropriate headers and sub-headers.
"""

prompt = PromptTemplate(
    template=template,
    input_variables=["image_description", "similar_items", "similar_items_analysis"],
)
# LLMChain is deprecated; piping the prompt into the model (LCEL) replaces it.
# Invoking this chain returns a chat message whose text is in `.content`.
chain = prompt | llm


def get_image_embedding(image):
    # padding/truncation only apply to text inputs, so they are not passed here
    inputs = clip_processor(images=image, return_tensors="pt")
    with torch.no_grad():
        outputs = clip_model.get_image_features(**inputs)
    return outputs.squeeze().numpy()


@app.post("/upload")
async def upload_image(file: UploadFile = File(...), category: str = Form(...)):
    contents = await file.read()
    image = Image.open(io.BytesIO(contents)).convert("RGB")

    # Get embedding for uploaded image
    embedding = get_image_embedding(image)

    # Perform vector similarity search with metadata filtering
    results = collection.query(
        query_embeddings=[embedding.tolist()],
        n_results=5,
        where={"category": category},
    )

    # One result list per query embedding; we sent exactly one
    similar_images = results['metadatas'][0]

    # Generate image description using Groq Llama 3.2 Vision Model
    image_bytes = io.BytesIO()
    image.save(image_bytes, format='PNG')
    base64_image = base64.b64encode(image_bytes.getvalue()).decode('ascii')

    chat_completion = groq_client.chat.completions.create(
        model=VISION_MODEL,
        messages=[
            {
                "role": "user",
                "content": [
                    {"type": "text", "text": f"You are a FASHION EXPERT. Your task is to analyze the image provided, which is a {category} fashion item. Describe this fashion item in detail."},
                    {
                        "type": "image_url",
                        "image_url": {"url": f"data:image/png;base64,{base64_image}"},
                    },
                ],
            }
        ],
    )
    image_description = chat_completion.choices[0].message.content

    # Generate analysis using LangChain; the model sees only filenames and categories of the similar items
    similar_items_desc = "\n".join(
        f"- {item.get('filename', 'Unknown')} (Category: {item.get('category', 'Unknown')})"
        for item in similar_images
    )
    similar_items_analysis = "\n".join(
        f"{i + 1}. {item.get('filename', 'Unknown')} (Category: {item.get('category', 'Unknown')})"
        for i, item in enumerate(similar_images)
    )
    analysis = chain.invoke({
        "image_description": image_description,
        "similar_items": similar_items_desc,
        "similar_items_analysis": similar_items_analysis,
    })

    return JSONResponse({
        "image_description": image_description,
        "similar_images": similar_images,
        "analysis": analysis.content,
        "uploaded_image": f"data:image/png;base64,{base64_image}",
    })


# Serve static files (HTML, CSS, JS) and the catalog images
app.mount("/static", StaticFiles(directory="static"), name="static")
app.mount("/images", StaticFiles(directory="images"), name="images")


@app.get("/", response_class=HTMLResponse)
async def read_root():
    with open("static/index.html", "r", encoding="utf-8") as f:
        return f.read()


if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
```
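The /upload endpoint sends the image to the vision model inline, as a base64 data URL inside an `image_url` content part, and returns the same data URL to the browser. The stdlib-only sketch below uses hypothetical helper names (to_data_url, build_vision_message, from_data_url) to build that message structure from raw bytes and decode it back, which is handy for inspecting payload sizes. Groq's vision documentation lists a size limit for base64-encoded images, so check it for the current value; the placeholder bytes here are not a real image.

```python
import base64


def to_data_url(image_bytes: bytes, mime_type: str = "image/png") -> str:
    """Encode raw image bytes as a data URL, as app.py does for the uploaded image."""
    return f"data:{mime_type};base64,{base64.b64encode(image_bytes).decode('ascii')}"


def build_vision_message(prompt_text: str, data_url: str) -> dict:
    """One user message with a text part and an image part, in the shape app.py sends."""
    return {
        "role": "user",
        "content": [
            {"type": "text", "text": prompt_text},
            {"type": "image_url", "image_url": {"url": data_url}},
        ],
    }


def from_data_url(data_url: str) -> bytes:
    """Recover the raw bytes from a base64 data URL."""
    _header, encoded = data_url.split(",", 1)
    return base64.b64decode(encoded)


fake_png = b"\x89PNG\r\n\x1a\n" + b"\x00" * 16  # placeholder bytes, not a real image
url = to_data_url(fake_png)
message = build_vision_message("Describe this Female fashion item in detail.", url)
print(url[:22])  # data:image/png;base64,
print(from_data_url(message["content"][1]["image_url"]["url"]) == fake_png)  # True
print(f"Encoded size: {len(url)} characters")  # base64 adds roughly a third to the payload
```

Because app.py re-encodes every upload as PNG, a large phone photo can grow considerably before it is base64-encoded; resizing with Pillow before encoding is one way to keep requests small.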
- static/index.html
```html
<!DOCTYPE html>
<html lang="en" data-theme="fantasy">
<head>
    <meta charset="UTF-8">
    <meta name="viewport" content="width=device-width, initial-scale=1.0">
    <title>Fashion Buddy</title>
    <link href="https://cdn.jsdelivr.net/npm/daisyui@3.1.0/dist/full.css" rel="stylesheet" type="text/css" />
    <script src="https://cdn.tailwindcss.com"></script>
    <script src="https://cdn.jsdelivr.net/npm/marked/marked.min.js"></script>
</head>
<body>
    <div class="container mx-auto p-4">
        <h1 class="text-4xl font-bold text-center mb-8">Fashion Buddy</h1>
        <div class="flex flex-col items-center">
            <input type="file" id="imageUpload" accept="image/*" class="file-input file-input-bordered file-input-primary w-full max-w-xs mb-4" />
            <select id="categorySelect" class="select select-bordered w-full max-w-xs mb-4">
                <option disabled selected>Select a category</option>
                <option value="Male">Male</option>
                <option value="Female">Female</option>
                <option value="Accessories">Accessories</option>
            </select>
            <button id="uploadButton" class="btn btn-primary">Upload Image</button>
        </div>
        <div id="results" class="mt-8 hidden">
            <h2 class="text-2xl font-semibold mb-4">Results</h2>
            <div class="grid grid-cols-1 md:grid-cols-2 gap-4">
                <div>
                    <h3 class="text-xl font-semibold mb-2">Uploaded Image</h3>
                    <img id="uploadedImage" class="w-full object-contain rounded-lg shadow-lg" style="max-height: 400px;" />
                    <div id="imageDescription" class="mt-2 p-4 bg-base-200 rounded-lg"></div>
                </div>
                <div>
                    <h3 class="text-xl font-semibold mb-2">Similar Images</h3>
                    <div id="similarImages" class="grid grid-cols-2 gap-2"></div>
                </div>
            </div>
            <div class="mt-8">
                <h3 class="text-xl font-semibold mb-2">Fashion Analysis</h3>
                <div id="analysis" class="p-4 bg-base-200 rounded-lg"></div>
            </div>
        </div>
    </div>
    <script>
        const uploadButton = document.getElementById('uploadButton');
        const imageUpload = document.getElementById('imageUpload');
        const categorySelect = document.getElementById('categorySelect');
        const results = document.getElementById('results');
        const uploadedImage = document.getElementById('uploadedImage');
        const imageDescription = document.getElementById('imageDescription');
        const similarImages = document.getElementById('similarImages');
        const analysis = document.getElementById('analysis');

        uploadButton.addEventListener('click', async () => {
            const file = imageUpload.files[0];
            const category = categorySelect.value;
            // The placeholder option has no value attribute, so .value returns its text
            if (!file || category === "Select a category") return;

            const formData = new FormData();
            formData.append('file', file);
            formData.append('category', category);

            const response = await fetch('/upload', {
                method: 'POST',
                body: formData
            });
            // Stop early on server errors instead of failing on a missing field below
            if (!response.ok) {
                alert(`Upload failed: ${response.status} ${response.statusText}`);
                return;
            }
            const data = await response.json();

            uploadedImage.src = data.uploaded_image;
            imageDescription.textContent = data.image_description;

            similarImages.innerHTML = '';
            data.similar_images.forEach(image => {
                const imgContainer = document.createElement('div');
                imgContainer.className = 'relative';
                const img = document.createElement('img');
                img.src = `/images/${image.category}/${image.filename}`;
                img.className = 'w-full object-contain rounded shadow';
                img.style.maxHeight = '200px';
                const filename = document.createElement('p');
                filename.textContent = `${image.filename} (${image.category})`;
                filename.className = 'text-xs mt-1 text-center';
                imgContainer.appendChild(img);
                imgContainer.appendChild(filename);
                similarImages.appendChild(imgContainer);
            });

            // Convert markdown to HTML and set it as innerHTML
            analysis.innerHTML = marked.parse(data.analysis);
            results.classList.remove('hidden');
        });
    </script>
</body>
</html>
```
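Fashion Buddy Application: Sample Output
The two analyses below were returned by the application: the first for a gray suit uploaded with the Female category selected, the second for a pair of black jeans uploaded with the Male category selected.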
Fashion Item Analysis
The fashion item in the uploaded image is a gray sleeveless suit with a V-neck and black buttons, consisting of a matching blazer with shoulder pads and straight-leg pants. The set is completed with a pair of chunky black heels and a large white bag.
Similar Image Analysis
1. 7969241800_1_1_1.jpg (Category: Female)
Colour Palette: The colours of the fashion item are black, white, and a dark brown accent.
Style comparison: The style of the fashion item is more formal and elegant compared to the uploaded image.
Outfit combination suggestion: This fashion item can be paired with a white shirt and a black skirt for a formal evening event, or with a black tank top and a pair of distressed jeans for a casual night out.
Brief image description: The image features a woman modeling a black sleeveless evening gown with a high neckline and a dark brown belt.
2. 7980117632_1_1_1.jpg (Category: Female)
Colour Palette: The colours of the fashion item are navy blue, white, and a light gray accent.
Style comparison: The style of the fashion item is more casual and relaxed compared to the uploaded image.
Outfit combination suggestion: This fashion item can be paired with a white tank top and a pair of light gray shorts for a summer day out, or with a navy blue sweater and a pair of black jeans for a casual dinner date.
Brief image description: The image features a woman modeling a navy blue sleeveless shirt with a crew neck and a light gray jacket.
3. 7595816508_1_1_1.jpg (Category: Female)
Colour Palette: The colours of the fashion item are red, black, and a silver accent.
Style comparison: The style of the fashion item is more edgy and bold compared to the uploaded image.
Outfit combination suggestion: This fashion item can be paired with a black tank top and a pair of high-waisted black pants for a rock concert, or with a red sweater and a pair of distressed black jeans for a casual night out.
Brief image description: The image features a woman modeling a red sleeveless jacket with a V-neck and a silver buckle.
4. 8686456600_1_1_1.jpg (Category: Female)
Colour Palette: The colours of the fashion item are beige, brown, and a tan accent.
Style comparison: The style of the fashion item is more bohemian and earthy compared to the uploaded image.
Outfit combination suggestion: This fashion item can be paired with a beige tank top and a pair of distressed brown jeans for a casual day out, or with a brown sweater and a pair of beige culottes for a relaxed brunch.
Brief image description: The image features a woman modeling a beige sleeveless top with a V-neck and a tan belt.
5. 8067505044_1_1_1.jpg (Category: Female)
Colour Palette: The colours of the fashion item are green, black, and a gold accent.
Style comparison: The style of the fashion item is more formal and sophisticated compared to the uploaded image.
Outfit combination suggestion: This fashion item can be paired with a black tank top and a pair of high-waisted black pants for a formal event, or with a green sweater and a pair of black jeans for a formal dinner date.
Brief image description: The image features a woman modeling a green sleeveless jacket with a crew neck and a gold buckle.
Fashion Analysis
Fashion Item Description
The black jeans in the uploaded image appear to be a high-quality, well-made article of clothing that would be suitable for a variety of occasions. They have a classic design that is both stylish and comfortable, making them a great choice for anyone looking for a versatile and reliable pair of pants.
Similar Items Fashion Analysis
1. 6917303500_1_1_1.jpg (Category: Male)
Colour Palette: The fashion item in this image is a dark grey, almost black, colour.
Style comparison: The style of the fashion item in this image is similar to the black jeans, but the fit is slightly tighter and the material appears to be a lighter weight.
Outfit combination suggestion: This fashion item can be paired with a white shirt and a leather jacket for a casual look, or with a button-down shirt and loafers for a more formal look.
Brief image description: The fashion item is a pair of dark grey slim-fit trousers with a slight stretch and a sleek finish.
2. 8574379704_1_1_1.jpg (Category: Male)
Colour Palette: The fashion item in this image is a light blue, denim colour.
Style comparison: The style of the fashion item in this image is different from the black jeans, with a more relaxed fit and a rugged texture.
Outfit combination suggestion: This fashion item can be paired with a white t-shirt and a denim jacket for a casual look, or with a button-down shirt and sneakers for a more relaxed look.
Brief image description: The fashion item is a pair of light blue, denim jeans with a relaxed fit and a rugged texture.
3. 4644410400_1_1_1.jpg (Category: Male)
Colour Palette: The fashion item in this image is a dark brown, leather-like colour.
Style comparison: The style of the fashion item in this image is different from the black jeans, with a more formal, dressy tone.
Outfit combination suggestion: This fashion item can be paired with a white shirt and a tie for a formal look, or with a button-down shirt and boots for a more formal casual look.
Brief image description: The fashion item is a pair of dark brown, leather-like trousers with a sleek finish and a sophisticated design.
4. 5841253800_1_1_1.jpg (Category: Male)
Colour Palette: The fashion item in this image is a light grey, almost beige, colour.
Style comparison: The style of the fashion item in this image is similar to the black jeans, but the material appears to be a more textured and more casual.
Outfit combination suggestion: This fashion item can be paired with a white shirt and a denim jacket for a casual look, or with a button-down shirt and sneakers for a more relaxed look.
Brief image description: The fashion item is a pair of light grey, casual trousers with a relaxed fit and a textured finish.
5. 6085385250_1_1_1.jpg (Category: Male)
Colour Palette: The fashion item in this image is a dark, almost black, grey colour.
Style comparison: The style of the fashion item in this image is similar to the black jeans, but the fit is slightly tighter and the material appears to be a more premium quality.
Outfit combination suggestion: This fashion item can be paired with a white shirt and a leather jacket for a casual look, or with a button-down shirt and loafers for a more formal look.
Brief image description: The fashion item is a pair of dark grey, slim-fit trousers with a sleek finish and a premium quality feel.
Future Scope of Fashion Buddy
1. Enhanced Image Processing:
- Multiple Image Uploads: Allow users to upload multiple images at once for batch processing and analysis.
- Image Editing Tools: Integrate basic image editing features (e.g., cropping, resizing) before uploading to enhance user control over the input images.
2. Advanced Fashion Analysis:
- Style Recommendations: Implement a recommendation engine that suggests outfits based on user preferences, current trends, and similar items in the database.
- Trend Analysis: Use historical data to analyze fashion trends over time and provide insights into emerging styles.
3. User Personalization:
- User Profiles: Allow users to create profiles to save their favorite items, previous uploads, and personalized recommendations.
- Feedback Mechanism: Implement a feedback system where users can rate the accuracy of the fashion analysis, helping to improve the model over time.
4. Model Improvements:
- Model Fine-Tuning: Continuously fine-tune CLIP and the Llama vision model on new fashion data to improve accuracy and relevance in fashion analysis.
- Additional Models: Explore and integrate other machine learning models for specific tasks, such as color detection, fabric recognition, or style classification.
5. Community Features:
- User Contributions: Allow users to contribute their own fashion items and analyses, creating a community-driven database.
- Social Sharing: Implement features for users to share their analyses and favorite items on social media platforms.
6. Data Privacy and Security:
- User Data Protection: Ensure that user data is handled securely and in compliance with data protection regulations (e.g., GDPR).
- Anonymization: Implement data anonymization techniques to protect user identities while still allowing for data analysis.
7. Diverse Data: For this experiment I used only three categories, limited to 40 images. A larger and more varied dataset can be embedded and uploaded to the vector store in the same way.
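Scaling beyond the sample images mostly affects the upload step: upload_images.py calls `collection.add` once per image, but `add` accepts parallel lists of ids, embeddings and metadatas, so records can be sent in batches. The sketch below shows only the batching part, with a hypothetical `batched` helper in pure Python and made-up records. The batch size of 2 is just to keep the printout short; real batches would be larger, and recent Chroma clients also enforce a maximum batch size of their own.

```python
from itertools import islice
from typing import Iterable, Iterator


def batched(records: Iterable[tuple[str, list[float], dict]], size: int) -> Iterator[tuple[list, list, list]]:
    """Group (id, embedding, metadata) records into parallel lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    iterator = iter(records)
    while chunk := list(islice(iterator, size)):
        ids, embeddings, metadatas = (list(column) for column in zip(*chunk))
        yield ids, embeddings, metadatas


# Hypothetical records; in upload_images.py each would come from one image file
records = [
    (f"Male_{i}.jpg", [float(i), 0.0], {"filename": f"{i}.jpg", "category": "Male"})
    for i in range(5)
]
for ids, embeddings, metadatas in batched(records, size=2):
    print(ids)
    # collection.add(ids=ids, embeddings=embeddings, metadatas=metadatas)
```

The loop prints three batches, the last holding the single leftover record, so no special case is needed for a catalog whose size is not a multiple of the batch size.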
Conclusion
Fashion Buddy aims to enhance the shopping and fashion exploration experience by giving users insights and recommendations based on visual data. By combining deep learning models (CLIP for similarity search and Llama 3.2 Vision for descriptions) and a generative AI framework (LangChain) with a user-friendly interface, Fashion Buddy serves as a useful tool for fashion enthusiasts and shoppers alike.
References
CLIP Model
Radford, A., Kim, J. W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., Krueger, G., & Sutskever, I. (2021). “Learning Transferable Visual Models From Natural Language Supervision.” In Proceedings of the 38th International Conference on Machine Learning (ICML 2021).
Groq
Groq is a company that builds custom AI inference hardware (the LPU) and offers GroqCloud, an API for running open models such as Llama with low latency.
LangChain
LangChain is a framework designed to facilitate the development of applications that use language models. It provides tools for chaining together different components, such as prompts, models, and data sources.
Llama 3.2 11B Vision Model
The Llama model series is developed by Meta. Llama 3.2 11B Vision is a multimodal variant that takes images and text as input and generates text, which makes it suitable for image understanding tasks such as describing a fashion item.