Voxel51

News, tutorials, tips, and big ideas in computer vision and data-centric machine learning, from the company behind open source FiftyOne. Learn more at https://voxel51.com

CVPR Survival Guide: Discovering Research That’s Interesting to YOU!

8 min read · Jun 19, 2024

Source: CVPR Website

🧐 What’s in this dataset?

Photo by fabio on Unsplash
%%capture
!pip install fiftyone sentence-transformers umap-learn lancedb scikit-learn==1.4.2
!fiftyone plugins download https://github.com/jacobmarks/clustering-plugin
import fiftyone as fo
import fiftyone.utils.huggingface as fouh
dataset = fouh.load_from_hub("Voxel51/CVPR_2024_Papers")
session = fo.launch_app(dataset, auto=False)
session.show()

Take a look at the app below

%%capture
from sentence_transformers import SentenceTransformer

model = SentenceTransformer(
    'Alibaba-NLP/gte-large-en-v1.5',
    trust_remote_code=True
)

def get_text_embeddings(dataset, field, model):
    """
    Returns the embeddings of a text field in the dataset.

    Args:
        dataset: A FiftyOne dataset object.
        field: The name of the text field to embed (e.g. "abstract").
        model: A SentenceTransformer model.

    Returns:
        A list of embeddings.
    """
    texts = dataset.values(field)
    text_embeddings = []
    for text in texts:
        embeddings = model.encode(text)
        text_embeddings.append(embeddings)
    return text_embeddings

def add_embeddings_to_dataset(dataset, field, embeddings):
    """
    Adds the embeddings to the dataset.

    Args:
        dataset: A FiftyOne dataset object.
        field: The name of the field to store the embeddings in.
        embeddings: A list of embeddings.
    """
    dataset.add_sample_field(field, fo.VectorField)
    dataset.set_values(field, embeddings)

abstract_embeddings = get_text_embeddings(
    dataset=dataset,
    field="abstract",
    model=model
)

add_embeddings_to_dataset(
    dataset=dataset,
    field="abstract_embeddings",
    embeddings=abstract_embeddings
)

title_embeddings = get_text_embeddings(
    dataset=dataset,
    field="title",
    model=model
)

add_embeddings_to_dataset(
    dataset=dataset,
    field="title_embeddings",
    embeddings=title_embeddings
)
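
The embedding loop above encodes one text at a time and assumes every sample has a value in the field. If some papers were missing an abstract, `dataset.values(field)` would return `None` for them. The sketch below is one possible way to guard against that and to encode in batches. The helper `encode_texts` and the stand-in `toy_encoder` are hypothetical names used only for illustration; with the real model you would presumably pass `model.encode` as `encode_fn`.

```python
from typing import Callable, Optional, Sequence

import numpy as np


def encode_texts(
    texts: Sequence[Optional[str]],
    encode_fn: Callable[[list], np.ndarray],
    batch_size: int = 32,
) -> list:
    """Encode texts in batches, treating missing values as empty strings."""
    cleaned = [t if isinstance(t, str) else "" for t in texts]
    embeddings = []
    for start in range(0, len(cleaned), batch_size):
        batch = cleaned[start:start + batch_size]
        # encode_fn is assumed to return one vector per input text
        embeddings.extend(np.asarray(encode_fn(batch)))
    return embeddings


# Hypothetical stand-in encoder: maps each text to a 2-d vector
# (character count, space count). It only exists to make the sketch runnable.
def toy_encoder(batch):
    return np.array([[len(t), t.count(" ")] for t in batch], dtype=float)


toy_embeddings = encode_texts(["a b", None, "hello"], toy_encoder, batch_size=2)
```

The result is a list with one vector per input, in the original order, which is the shape `dataset.set_values` expects.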

Making use of the embeddings

Visualizing embeddings

UMAP (Uniform Manifold Approximation and Projection)

t-SNE (t-distributed Stochastic Neighbor Embedding)

PCA (Principal Component Analysis)

Manual

import fiftyone.brain as fob

fob.compute_visualization(
    dataset,
    embeddings="abstract_embeddings",
    num_dims=2,
    method="umap",
    brain_key="umap_abstract",
    verbose=True,
    seed=51
)

fob.compute_visualization(
    dataset,
    embeddings="title_embeddings",
    num_dims=2,
    method="umap",
    brain_key="umap_title",
    verbose=True,
    seed=51
)
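
To build some intuition for what a 2D projection does, here is a minimal PCA sketch in plain NumPy. It is not what FiftyOne runs internally and it is not UMAP; it only illustrates the simplest of the methods listed above. The function name `pca_2d` and the random `toy_embeddings` are assumptions made for the example.

```python
import numpy as np


def pca_2d(vectors: np.ndarray) -> np.ndarray:
    """Project row vectors onto their top two principal components."""
    centered = vectors - vectors.mean(axis=0)
    # Rows of vt are the principal directions, ordered by explained variance
    _, _, vt = np.linalg.svd(centered, full_matrices=False)
    return centered @ vt[:2].T


rng = np.random.default_rng(51)
toy_embeddings = rng.normal(size=(100, 16))  # stand-in for real embeddings
points = pca_2d(toy_embeddings)  # one (x, y) pair per embedding
```

Each row of `points` is the kind of 2D coordinate that ends up plotted in the embeddings panel, although UMAP preserves local neighborhoods rather than global variance.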

Computing uniqueness

fob.compute_uniqueness(
    dataset,
    embeddings="abstract_embeddings",
    uniqueness_field="uniqueness_abstract",
)

fob.compute_uniqueness(
    dataset,
    embeddings="title_embeddings",
    uniqueness_field="uniqueness_title",
)
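
One way to think about a uniqueness score is "how far is this paper from its nearest neighbors in embedding space?" The sketch below implements that idea directly. It is an illustrative approximation under that assumption, not necessarily the exact computation behind `compute_uniqueness`, and the name `knn_uniqueness` is made up for this example.

```python
import numpy as np


def knn_uniqueness(vectors: np.ndarray, k: int = 3) -> np.ndarray:
    """Score each row by its mean distance to its k nearest neighbors, scaled to [0, 1]."""
    # Pairwise Euclidean distances between all rows
    diffs = vectors[:, None, :] - vectors[None, :, :]
    dists = np.linalg.norm(diffs, axis=-1)
    np.fill_diagonal(dists, np.inf)  # ignore each point's distance to itself
    nearest = np.sort(dists, axis=1)[:, :k]
    scores = nearest.mean(axis=1)
    return scores / scores.max()


# Four tightly clustered points and one outlier (toy data, not real embeddings)
toy = np.array([[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [0.1, 0.1], [5.0, 5.0]])
scores = knn_uniqueness(toy, k=2)
```

The outlier gets the highest score and the clustered points score close to zero, which mirrors how a paper on an unusual topic would stand out from a crowded research area.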

Computing similarity

sim_abstract = fob.compute_similarity(
    dataset,
    embeddings="abstract_embeddings",
    brain_key="abstract_similarity",
    backend="lancedb",
)
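
Conceptually, a similarity index answers "which embeddings are closest to this one?" The sketch below shows a brute-force version of that query with cosine similarity on toy vectors. It is only an illustration: the LanceDB backend handles the search for you, and `top_k_similar` is a hypothetical helper, not part of FiftyOne.

```python
import numpy as np


def top_k_similar(query: np.ndarray, vectors: np.ndarray, k: int = 3) -> np.ndarray:
    """Return indices of the k rows most cosine-similar to the query, best first."""
    unit_vectors = vectors / np.linalg.norm(vectors, axis=1, keepdims=True)
    unit_query = query / np.linalg.norm(query)
    sims = unit_vectors @ unit_query  # cosine similarity per row
    return np.argsort(-sims)[:k]


# Toy 2-d vectors standing in for abstract embeddings
toy = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [-1.0, 0.0]])
neighbors = top_k_similar(np.array([1.0, 0.05]), toy, k=2)
```

In FiftyOne itself, the equivalent query would typically go through the index created above, for example by calling `dataset.sort_by_similarity(...)` with a sample ID and `brain_key="abstract_similarity"`, or by using the similarity search button in the app.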

Now, let’s check all this out in the app!

Thanks for reading!

Written by Harpreet Sahota

🤖 Generative AI Hacker | 👨🏽‍💻 AI Engineer | Hacker-in-Residence at Voxel51
