Image Classification with Object Detection: A 5-Minute Guide
We’ll walk through building a simple image classification and object detection pipeline using the Clarifai API and Python. The script sends an image to Clarifai’s hosted models, prints the predicted labels, and draws a labeled bounding box around each detected object. This project is aimed at beginners getting started with computer vision.
What We’ll Build
We’ll create a Python script that:
- Downloads a sample image.
- Uses the Clarifai API to label the image as a whole (classification).
- Uses the Clarifai API to detect individual objects in the image.
- Draws bounding boxes around the detected objects, labeling them with their names and confidence scores.
Prerequisites
- Python 3.6+: Ensure you have Python 3.6 or a later version installed.
- Required Libraries: Install the necessary libraries:
pip install Pillow requests
- Pillow: A fork of the Python Imaging Library, used to open the image and draw on it.
- requests: For making HTTP requests to the Clarifai API.
Clarifai Account and API Key:
- Sign up for a free Clarifai account at https://www.clarifai.com/
- Create an application within your Clarifai account to obtain your API key.
The Code
Here’s the core Python script:
from PIL import Image, ImageDraw
import requests
import json
import base64
CLARIFAI_API_KEY = "YOUR_CLARIFAI_API_KEY" # Replace with your Clarifai API key
CLARIFAI_MODEL_ID = "general-image-recognition"
Important: Remember to replace “YOUR_CLARIFAI_API_KEY” with your actual Clarifai API key.
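If you would rather not paste the key into the source file, one option is to read it from an environment variable. The sketch below assumes a variable named CLARIFAI_API_KEY; that name and the helper load_api_key are illustrative choices, not something Clarifai requires. It falls back to the placeholder string when the variable is unset.

```python
import os

PLACEHOLDER_KEY = "YOUR_CLARIFAI_API_KEY"


def load_api_key(env=os.environ, var_name="CLARIFAI_API_KEY"):
    """Return the key stored in the environment, or the placeholder if unset.

    The variable name is an assumption made for this example.
    """
    return env.get(var_name, PLACEHOLDER_KEY)


def is_placeholder(key):
    """True when the key is still the unedited placeholder string."""
    return key == PLACEHOLDER_KEY


CLARIFAI_API_KEY = load_api_key()
if is_placeholder(CLARIFAI_API_KEY):
    print("Warning: no API key set; requests to Clarifai will be rejected.")
```

Set the variable in your shell before running the script, and the rest of the code can keep using CLARIFAI_API_KEY unchanged.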
Classifying the Image with Clarifai
def classify_image_clarifai(image_path, api_key=CLARIFAI_API_KEY, model_id=CLARIFAI_MODEL_ID):
    """Returns labels and confidence scores for an image using the Clarifai API."""
    # Read the image and encode it as base64 text so it can travel inside JSON
    with open(image_path, "rb") as f:
        image_bytes = f.read()
    base64_image = base64.b64encode(image_bytes).decode("utf-8")
    url = f"https://api.clarifai.com/v2/models/{model_id}/outputs"
    headers = {
        "Authorization": f"Key {api_key}",
        "Content-Type": "application/json"
    }
    payload = {
        "inputs": [
            {
                "data": {
                    "image": {
                        "base64": base64_image
                    }
                }
            }
        ]
    }
    try:
        # The timeout keeps the script from hanging if the API does not respond
        response = requests.post(url, headers=headers, data=json.dumps(payload), timeout=30)
        response.raise_for_status()  # Raise HTTPError for bad responses
        results = response.json()
        predictions = []
        for concept in results['outputs'][0]['data']['concepts']:
            prediction = {
                'label': concept['name'],
                'confidence': concept['value']
            }
            predictions.append(prediction)
        return predictions
    except requests.exceptions.RequestException as e:
        print(f"Error during API request: {e}")
        return []
    except (KeyError, IndexError, ValueError) as e:
        print(f"Error parsing API response: {e}")
        return []
    except Exception as e:
        print(f"An unexpected error occurred: {e}")
        return []
This function reads the image data, encodes it to base64, and sends it to the Clarifai API endpoint. The response is parsed to extract the predicted labels and confidence scores.
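To make the parsing step concrete, here is a small offline sketch. The sample_response dictionary is hand-written to mirror only the fields the function reads (outputs, then data, then concepts). Its labels and scores are made up for illustration, and a real response carries many more fields. The helper name extract_concepts is hypothetical.

```python
def extract_concepts(results):
    """Pull label/confidence pairs out of a parsed response dictionary."""
    return [
        {"label": concept["name"], "confidence": concept["value"]}
        for concept in results["outputs"][0]["data"]["concepts"]
    ]


# Hand-written stand-in for response.json(); values are illustrative only.
sample_response = {
    "outputs": [
        {
            "data": {
                "concepts": [
                    {"name": "train", "value": 0.75},
                    {"name": "station", "value": 0.5},
                ]
            }
        }
    ]
}

predictions = extract_concepts(sample_response)
for p in predictions:
    print(f"{p['label']}: {p['confidence']:.4f}")
```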
Object Detection and Bounding Boxes
def classify_image_clarifai_with_bounding_boxes(image_path, api_key=CLARIFAI_API_KEY, model_id="general-image-detection"):
    """Detects objects in an image and returns their bounding boxes using the Clarifai API."""
    # Read the image and encode it as base64 text so it can travel inside JSON
    with open(image_path, "rb") as f:
        image_bytes = f.read()
    base64_image = base64.b64encode(image_bytes).decode("utf-8")
    url = f"https://api.clarifai.com/v2/models/{model_id}/outputs"
    headers = {
        "Authorization": f"Key {api_key}",
        "Content-Type": "application/json"
    }
    payload = {
        "inputs": [
            {
                "data": {
                    "image": {
                        "base64": base64_image
                    }
                }
            }
        ]
    }
    try:
        # The timeout keeps the script from hanging if the API does not respond
        response = requests.post(url, headers=headers, data=json.dumps(payload), timeout=30)
        response.raise_for_status()  # Raise HTTPError for bad responses
        results = response.json()
        predictions = []
        for region in results['outputs'][0]['data']['regions']:
            concept = region['data']['concepts'][0]  # Get the top concept
            prediction = {
                'label': concept['name'],
                'confidence': concept['value'],
                'bounding_box': region['region_info']['bounding_box']
            }
            predictions.append(prediction)
        return predictions
    except requests.exceptions.RequestException as e:
        print(f"Error during API request: {e}")
        return []
    except (KeyError, IndexError, ValueError) as e:
        print(f"Error parsing API response: {e}")
        return []
    except Exception as e:
        print(f"An unexpected error occurred: {e}")
        return []
This function is very similar to the classification function, but it uses Clarifai’s general-image-detection model, which returns a list of regions. Each region carries its own concepts and a bounding box, and the function keeps the top concept for each one.
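Because both functions build the same request, the shared part could be factored into one helper. The sketch below is one possible refactoring, not part of the original script. build_clarifai_request is a hypothetical name, and it only assembles the URL, headers, and payload shown above without sending anything.

```python
import base64


def build_clarifai_request(image_bytes, api_key, model_id):
    """Assemble the URL, headers, and JSON payload for one image.

    Mirrors the request built in the two functions above; nothing is sent.
    """
    base64_image = base64.b64encode(image_bytes).decode("utf-8")
    url = f"https://api.clarifai.com/v2/models/{model_id}/outputs"
    headers = {
        "Authorization": f"Key {api_key}",
        "Content-Type": "application/json",
    }
    payload = {"inputs": [{"data": {"image": {"base64": base64_image}}}]}
    return url, headers, payload


# Placeholder bytes and key, used only to show the shape of the result.
url, headers, payload = build_clarifai_request(
    b"fake image bytes", "demo-key", "general-image-detection"
)
print(url)
```

With a helper like this, each function would differ only in the model ID it passes and in how it parses the response.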
Drawing Bounding Boxes on the Image
def draw_bounding_boxes(image_path, predictions, output_path="labeled_image_with_boxes.jpg"):
    """Draws bounding boxes on an image based on Clarifai predictions."""
    try:
        image = Image.open(image_path)
        draw = ImageDraw.Draw(image)
        width, height = image.size
        for prediction in predictions:
            box = prediction['bounding_box']
            label = prediction['label']
            confidence = prediction['confidence']
            # Box values are fractions of the image size; scale them to pixels
            left = box['left_col'] * width
            top = box['top_row'] * height
            right = box['right_col'] * width
            bottom = box['bottom_row'] * height
            draw.rectangle(((left, top), (right, bottom)), outline="red", width=3)
            # Keep the label inside the image when the box touches the top edge
            draw.text((left, max(top - 10, 0)), f"{label}: {confidence:.2f}", fill="red")
        image.save(output_path)
        print(f"Labeled image saved to {output_path}")
    except FileNotFoundError:
        print(f"Error: Image file not found at {image_path}")
    except Exception as e:
        print(f"Error drawing bounding boxes: {e}")
This function uses the Pillow library to open the image and draw rectangles based on the bounding box coordinates provided by Clarifai. It also adds labels with the object name and confidence score.
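The coordinate conversion is the step most worth checking by hand. The bounding box values are fractions of the image dimensions, so each one is multiplied by the width or height to get pixels. Here is a minimal sketch of that arithmetic, with a made-up box and image size. The helper name to_pixel_box is hypothetical.

```python
def to_pixel_box(box, width, height):
    """Scale a normalized bounding box (values from 0 to 1) to pixel coordinates."""
    left = box["left_col"] * width
    top = box["top_row"] * height
    right = box["right_col"] * width
    bottom = box["bottom_row"] * height
    return (left, top, right, bottom)


# Illustrative values only: a box covering the middle of an 800x600 image.
sample_box = {"top_row": 0.25, "left_col": 0.125, "bottom_row": 0.75, "right_col": 0.5}
print(to_pixel_box(sample_box, width=800, height=600))
# → (100.0, 150.0, 400.0, 450.0)
```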
Putting it All Together
if __name__ == "__main__":
    # Download a test image (or use your own)
    image_url = "https://samples.clarifai.com/metro-north.jpg"  # Example image with objects
    image_path = "test_image.jpg"
    try:
        response = requests.get(image_url, stream=True, timeout=30)
        response.raise_for_status()  # Raise HTTPError for bad responses
        with open(image_path, 'wb') as out_file:
            for block in response.iter_content(1024):
                out_file.write(block)
        print(f"Downloaded test image to {image_path}")
    except requests.exceptions.RequestException as e:
        print(f"Error downloading image: {e}")
        raise SystemExit(1)  # Stop here with a non-zero exit status
    # Classify the image using Clarifai (General Recognition Model)
    print("\nClassifying with general recognition model:")
    predictions = classify_image_clarifai(image_path)
    if predictions:
        for p in predictions:
            print(f" {p['label']}: {p['confidence']:.4f}")
    else:
        print(" No predictions received.")
    # Detect objects with bounding boxes (Object Detection Model)
    print("\nDetecting objects with object detection model and drawing bounding boxes:")
    predictions_with_boxes = classify_image_clarifai_with_bounding_boxes(image_path)
    if predictions_with_boxes:
        for p in predictions_with_boxes:
            print(f" {p['label']}: {p['confidence']:.4f}, Bounding Box: {p['bounding_box']}")
        draw_bounding_boxes(image_path, predictions_with_boxes, "labeled_image_with_boxes.jpg")
    else:
        print(" No predictions with bounding boxes received.")
This section downloads a test image, calls the classification and object detection functions, and then saves the labeled image.
Running the Script
- Save the code as image_classifier.py.
- Run the script from your terminal: python image_classifier.py
After running, you’ll find a labeled_image_with_boxes.jpg file with the detected objects highlighted.
Conclusion
This project provides a basic framework for image classification and object detection. You can extend this project by:
- Experimenting with different Clarifai models.
- Adjusting the confidence threshold for predictions.
- Building a user interface to upload and process images.
- Deploying your application to a cloud platform for wider accessibility.
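As a starting point for the confidence threshold idea, here is a minimal sketch. The script as written keeps every prediction, so filter_by_confidence is a hypothetical helper, and the 0.5 default is an arbitrary example value rather than a recommended setting. It works on the prediction dictionaries returned by either function.

```python
def filter_by_confidence(predictions, threshold=0.5):
    """Keep only predictions whose confidence is at or above the threshold."""
    return [p for p in predictions if p["confidence"] >= threshold]


# Made-up predictions in the same shape the functions above return.
sample_predictions = [
    {"label": "train", "confidence": 0.75},
    {"label": "person", "confidence": 0.5},
    {"label": "bench", "confidence": 0.25},
]

kept = filter_by_confidence(sample_predictions, threshold=0.5)
print([p["label"] for p in kept])
# → ['train', 'person']
```

You could call this on predictions_with_boxes before passing it to draw_bounding_boxes so that low-confidence boxes are not drawn.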