Voxel51

News, tutorials, tips, and big ideas in computer vision and data-centric machine learning, from the company behind open source FiftyOne. Learn more at https://voxel51.com

AIMv2 Outperforms CLIP on Synthetic Dataset ImageNet-D

11 min read · Feb 12, 2025


Exploring ImageNet-D in FiftyOne

I wrote an in-depth blog about the ImageNet-D dataset, which you can read here.

What we’re doing in this tutorial

Preliminaries

!pip install fiftyone umap-learn

import os

import fiftyone as fo
import fiftyone.utils.huggingface as fouh

os.environ['FIFTYONE_ALLOW_LEGACY_ORCHESTRATORS'] = 'true'

dataset = fouh.load_from_hub(
    "Voxel51/ImageNet-D",
    name="imagenet_d"
)

!fiftyone plugins download \
    https://github.com/voxel51/fiftyone-plugins \
    --plugin-names @voxel51/dashboard

fo.launch_app(dataset)
Exploring ImageNet-D in FiftyOne
gt_labels = dataset.distinct("ground_truth.label")

What is AIMv2?

I’ve written an in-depth blog about AIMv2, which you can read here.

How AIMv2 differs from CLIP

Core differences between AIMv2 and CLIP
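At a high level, CLIP trains separate image and text encoders with a symmetric contrastive (InfoNCE) objective, while AIMv2 pre-trains its vision encoder jointly with a multimodal autoregressive decoder. As a reference point for the contrastive side, here's a minimal NumPy sketch of a CLIP-style loss (purely illustrative; not either model's actual training code):

```python
import numpy as np

def clip_contrastive_loss(img_emb, txt_emb, temperature=0.07):
    """Symmetric InfoNCE loss over a batch of paired image/text embeddings.

    Illustrative sketch of CLIP's contrastive objective; AIMv2 instead
    pre-trains its vision encoder with an autoregressive decoder.
    """
    # L2-normalize so dot products become cosine similarities
    img = img_emb / np.linalg.norm(img_emb, axis=1, keepdims=True)
    txt = txt_emb / np.linalg.norm(txt_emb, axis=1, keepdims=True)

    logits = img @ txt.T / temperature  # (batch, batch) similarity matrix
    labels = np.arange(len(logits))     # matched pairs lie on the diagonal

    def cross_entropy(logits, labels):
        logits = logits - logits.max(axis=1, keepdims=True)  # stability
        log_probs = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
        return -log_probs[np.arange(len(labels)), labels].mean()

    # Average the image->text and text->image directions
    return 0.5 * (cross_entropy(logits, labels) + cross_entropy(logits.T, labels))

rng = np.random.default_rng(0)
img_emb = rng.normal(size=(4, 8))
loss = clip_contrastive_loss(img_emb, img_emb)  # identical pairs -> low loss
```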

Using AIMv2 in FiftyOne

Feature Extraction and Embedding Visualization in FiftyOne

!fiftyone plugins download https://github.com/harpreetsahota204/aim-embeddings-plugin
import fiftyone.operators as foo

aim_embeddings = foo.get_operator("@harpreetsahota/aimv2_embeddings/compute_aimv2_embeddings")

Run the operator on your dataset

embedding_types = ['cls', 'mean']

for emb_type in embedding_types:
    await aim_embeddings(
        dataset,
        model_name="apple/aimv2-large-patch14-224",
        embedding_types=emb_type,
        emb_field=f"aimv2_{emb_type}_emb",
        delegate=True
    )
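The two embedding types above differ in how a single vector is pulled out of the encoder's token-level output: 'cls' takes the class-token embedding, while 'mean' averages the patch-token embeddings. A toy sketch of that distinction, assuming a ViT-style token layout (illustrative; not the plugin's implementation):

```python
import numpy as np

# Token-level outputs from a ViT-style encoder: one CLS token plus N patch
# tokens. Illustrates the 'cls' vs 'mean' embedding types above; the shapes
# are assumptions for illustration, not the plugin's internals.
rng = np.random.default_rng(7)
tokens = rng.normal(size=(1 + 196, 1024))  # e.g. 14x14 patches, 1024-d features

cls_emb = tokens[0]                 # 'cls': the class-token embedding
mean_emb = tokens[1:].mean(axis=0)  # 'mean': average over patch tokens
```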
import torch

import fiftyone.zoo as foz

clip_model = foz.load_zoo_model(
    "clip-vit-base32-torch",
    text_prompt="A photo of a",
    classes=gt_labels,
    device="cuda" if torch.cuda.is_available() else "cpu"
)

dataset.compute_embeddings(
    model=clip_model,
    embeddings_field="clip_emb"
)

Visualizing embeddings

import fiftyone.brain as fob

embedding_fields = ["aimv2_cls_emb", "aimv2_mean_emb", "clip_emb"]

for embeddings in embedding_fields:
    results = fob.compute_visualization(
        dataset,
        embeddings=embeddings,
        method="umap",
        brain_key=f"{embeddings}_viz",
        num_dims=2,
        n_neighbors=10,
        min_dist=0.051,
        verbose=True,
    )

fo.launch_app(dataset)
Exploring AIMv2 vs CLIP embeddings in FiftyOne
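compute_visualization reduces each set of high-dimensional embeddings to 2-D points with UMAP, where n_neighbors trades off local versus global structure and min_dist controls how tightly points cluster. For intuition about what "projecting embeddings to 2-D" means, here's a self-contained PCA projection, a simpler linear stand-in for UMAP used here only because it needs no extra dependencies:

```python
import numpy as np

def project_2d(embeddings):
    """Project high-dimensional embeddings to 2-D via PCA (SVD).

    A linear stand-in for intuition only; UMAP, as used by
    fob.compute_visualization, is nonlinear and better preserves
    local neighborhood structure.
    """
    centered = embeddings - embeddings.mean(axis=0)
    # Left singular vectors scaled by singular values give the 2-D scores
    U, S, _ = np.linalg.svd(centered, full_matrices=False)
    return U[:, :2] * S[:2]

rng = np.random.default_rng(42)
fake_embeddings = rng.normal(size=(100, 512))  # e.g. 100 samples of 512-d embeddings
points = project_2d(fake_embeddings)           # one (x, y) point per sample
```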

Zero-Shot Classification in FiftyOne

!fiftyone plugins download https://github.com/jacobmarks/zero-shot-prediction-plugin
import fiftyone.operators as foo

zsc = foo.get_operator("@jacobmarks/zero_shot_prediction/zero_shot_classify")
await zsc(
    dataset,
    labels=gt_labels,
    model_name="AIMv2",
    label_field="AIMv2_predictions",
    delegate=True
)

dataset.apply_model(
    model=clip_model,
    label_field="clip_predictions",
    store_logits=True
)
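Both models perform zero-shot classification the same way at inference time: embed a prompt like "A photo of a {label}" for every class, then pick the class whose text embedding is most cosine-similar to the image embedding. A minimal sketch of that mechanism with toy embeddings (not the plugin's code):

```python
import numpy as np

def zero_shot_classify(image_emb, text_embs, labels):
    """Pick the label whose prompt embedding is most cosine-similar to the
    image embedding -- the mechanism behind CLIP-style zero-shot
    classification (illustrative sketch, not the plugin's code)."""
    img = image_emb / np.linalg.norm(image_emb)
    txt = text_embs / np.linalg.norm(text_embs, axis=1, keepdims=True)
    sims = txt @ img  # cosine similarity between the image and each class prompt
    return labels[int(np.argmax(sims))]

labels = ["cat", "dog", "bird"]
text_embs = np.eye(3)                  # toy prompt embeddings, one per class
image_emb = np.array([0.1, 0.9, 0.2])  # most similar to the "dog" prompt
pred = zero_shot_classify(image_emb, text_embs, labels)  # "dog"
```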

# Save the additions we've made to the database
dataset.save()

Model evaluation in FiftyOne

zsc_preds = ["AIMv2_predictions", "clip_predictions"]

for pred in zsc_preds:
    key = pred.split("_")[0]  # "AIMv2" or "clip"
    dataset.evaluate_classifications(
        pred_field=pred,
        gt_field="ground_truth",
        method="simple",
        eval_key=f"{key}_simple_eval",
    )
Using the model evaluation panel to analyze model performance individually
Using the model evaluation panel to compare model performance
aim_eval_results = dataset.load_evaluation_results("AIMv2_simple_eval")

clip_eval_results = dataset.load_evaluation_results("clip_simple_eval")

A Brief Refresher

aim_eval_results.print_report()

However, what stands out is the performance of AIMv2, which has a top-line accuracy of 0.4192 (41.92%) and is just mopping the floor with CLIP across the other metrics!

# AIMv2 results
aim_eval_results.print_metrics(average='weighted', digits=4)  # you can also pass "micro" or "macro"

accuracy   0.4192
precision  0.5996
recall     0.4192
fscore     0.451
support    4835

# CLIP results
clip_eval_results.print_metrics(average='weighted', digits=4)

accuracy   0.2507
precision  0.4637
recall     0.2507
fscore     0.2856
support    4835
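For reference, the "weighted" average weights each class's metric by its support (its number of ground-truth samples), so majority classes dominate the aggregate. A toy sketch of support-weighted F1 in pure Python (the numbers are made up, not the ImageNet-D results):

```python
def weighted_f1(per_class):
    """Support-weighted average F1, as in print_metrics(average='weighted').

    per_class: list of (precision, recall, support) tuples, one per class.
    Toy illustration -- these numbers are not the ImageNet-D results.
    """
    total = sum(s for _, _, s in per_class)
    f1s = [
        (2 * p * r / (p + r) if (p + r) else 0.0, s)
        for p, r, s in per_class
    ]
    # Each class's F1 contributes in proportion to its support
    return sum(f1 * s for f1, s in f1s) / total

# Two classes: a majority class with decent scores, a rare class with poor ones
per_class = [(0.8, 0.6, 90), (0.3, 0.2, 10)]
score = weighted_f1(per_class)  # dominated by the majority class
```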

Finding the Hardest Samples

import fiftyone.brain as fob

zsc_preds = ["AIMv2_predictions", "clip_predictions"]

for pred in zsc_preds:
    fob.compute_hardness(dataset, pred)

fo.launch_app(dataset)
Exploring sample hardness in FiftyOne
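Intuitively, hardness measures how uncertain the model's prediction is for each sample. A sketch of that idea as the entropy of the softmax distribution over classes (my assumption of the general mechanism, not necessarily FiftyOne's exact formula):

```python
import numpy as np

def hardness(logits):
    """Entropy of the softmax distribution as an uncertainty ('hardness')
    score -- a sketch of the general idea behind compute_hardness, not
    necessarily FiftyOne's exact implementation."""
    z = logits - logits.max()  # subtract max for numerical stability
    probs = np.exp(z) / np.exp(z).sum()
    return float(-(probs * np.log(probs + 1e-12)).sum())

confident = hardness(np.array([10.0, 0.0, 0.0]))  # peaked -> low entropy
uncertain = hardness(np.array([1.0, 1.0, 1.0]))   # uniform -> high entropy
```

Samples with near-uniform class probabilities score as "hard", which is exactly what surfaces at the top when you sort by hardness in the App.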

Next Steps


Published in Voxel51



Written by Harpreet Sahota

🤖 Generative AI Hacker | 👨🏽‍💻 AI Engineer | Hacker-in-Residence at Voxel51
