
WebUOT-1M: A Dataset for Underwater Object Tracking

11 min read · Feb 17, 2025


Example of video sequences and annotations of the WebUOT-1M dataset. Source: Figure 2 of the WebUOT-1M paper.

The WebUOT-1M dataset was built to push underwater object tracking forward: underwater footage suffers from low visibility, color distortion, and light scattering, and prior benchmarks were too small to train and evaluate robust trackers in these conditions.

Here's a tl;dr of what this dataset comprises: roughly 1.1 million frames across 1,500 underwater video clips spanning hundreds of target categories, with per-frame bounding boxes, a natural-language description of the target for each video, and 23 video-level attributes. This post works with the 238-video test split hosted on Hugging Face.

Exploring WebUOT-1M in FiftyOne

Table 8 from the WebUOT-1M paper describing the 23 attributes for each sample in the dataset

One wrinkle: the per-video attribute annotations ship without an official mapping to the attribute names in Table 8, so you can't yet filter by attribute name. However, this doesn't take away from the fact that the bounding boxes are very high quality, and the dataset is quite robust. I hope to coordinate with the author to obtain an official attribute mapping.
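If an official mapping does surface, applying it would be a few lines per sample. The sketch below is purely illustrative: the ATTRIBUTE_NAMES dict and the attributes field are hypothetical stand-ins (neither is confirmed by the dataset), and it assumes the dataset has already been loaded as shown in the next section.

# Hypothetical sketch: ATTRIBUTE_NAMES and the "attributes" field are assumptions
ATTRIBUTE_NAMES = {0: "low_resolution", 1: "fast_motion"}  # ...through index 22

for sample in dataset.iter_samples(autosave=True):
    flags = sample["attributes"]  # assumed list of 23 binary flags per video
    sample["attribute_names"] = [
        ATTRIBUTE_NAMES[i] for i, flag in enumerate(flags) if flag
    ]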

!pip install fiftyone umap-learn timm hiera-transformer einops
import fiftyone as fo
from fiftyone.utils.huggingface import load_from_hub

dataset = load_from_hub(
    "Voxel51/WebUOT-238-Test",
    name="webuot238",
    overwrite=True,
)

Once the dataset is loaded, you can launch the FiftyOne App directly from Python:

fo.launch_app(dataset)

Or from the terminal:

fiftyone app launch
Initial Exploration of WebUOT
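Beyond browsing in the App, it helps to inspect the dataset programmatically. A minimal sketch; the Length field referenced here is one of the dataset's classification fields (it's used again below to filter for short videos):

print(dataset)  # schema: fields, media type, and sample count

# Tally the values of the "Length" classification field
print(dataset.count_values("Length.label"))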

Exploring deeper

!fiftyone plugins download https://github.com/harpreetsahota204/hiera-video-embeddings-plugin
!fiftyone plugins requirements @harpreetsahota/hiera_video_embeddings --install
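As an optional check that the plugin downloaded correctly, the FiftyOne CLI can list locally installed plugins:

!fiftyone plugins list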
import os

# Allow delegated operations to be run via `fiftyone delegated launch`
os.environ["FIFTYONE_ALLOW_LEGACY_ORCHESTRATORS"] = "true"

import fiftyone.operators as foo

hiera_embeddings = foo.get_operator(
    "@harpreetsahota/hiera_video_embeddings/compute_hiera_video_embeddings"
)
from fiftyone import ViewField as F

# Clone the subset of short videos into its own dataset
short_videos = dataset.filter_labels(
    "Length", F("label").is_in(["short"])
).clone(name="short_videos")

len(short_videos)
Because the embeddings computation is delegated, start the delegated execution service in a separate terminal:

fiftyone delegated launch

await hiera_embeddings(
    short_videos,
    model_name="hiera_base_plus_16x224",
    checkpoint="mae_k400",  # one of "mae_k400" or "mae_k400_ft_k400"
    embedding_types="terminal",  # or "hierarchical"
    emb_field="hiera_video_embeddings",
    normalize=True,  # defaults to False; only supported for "terminal" embeddings
    delegate=True,
)
short_videos.persistent = True
short_videos.reload()
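Once the delegated run completes, a quick sanity check confirms the embeddings field was populated:

# Count samples that now have Hiera embeddings
print(len(short_videos.exists("hiera_video_embeddings")))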

import torch
from transformers import AutoModel

# Load the Jina text embedding model, on GPU if available
jina_embeddings_model = AutoModel.from_pretrained(
    "jinaai/jina-embeddings-v3",
    trust_remote_code=True,
    device_map="cuda" if torch.cuda.is_available() else "cpu",
)
# Embed each video's language description and store it on the sample
for sample in short_videos.iter_samples(autosave=True):
    text_embeddings = jina_embeddings_model.encode(
        sentences=[sample["language"]],  # model expects a list of strings
        task="separation",
    )
    sample["text_embeddings"] = text_embeddings.squeeze()
import fiftyone.brain as fob

embedding_fields = ["hiera_video_embeddings", "text_embeddings"]

# Compute a 2D UMAP visualization for each embeddings field
for field in embedding_fields:
    _fname = field.split("_embeddings")[0]
    results = fob.compute_visualization(
        short_videos,
        embeddings=field,
        method="umap",
        brain_key=f"{_fname}_viz",
        num_dims=2,
    )
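With both brain keys computed, relaunch the App and open the Embeddings panel to explore the two scatterplots side by side:

fo.launch_app(short_videos)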
Visualizing video and text embeddings for WebUOT in FiftyOne
!pip install "git+https://github.com/facebookresearch/sam2.git#egg=sam-2"
import torch
import fiftyone.zoo as foz

# Load the SAM2 video model from the FiftyOne model zoo
sam_model = foz.load_zoo_model(
    "segment-anything-2-hiera-tiny-video-torch",
    device="cuda" if torch.cuda.is_available() else "cpu",
)
# Prompt SAM2 with the ground truth labels in each video's first frame
short_videos.apply_model(
    sam_model,
    label_field="sam_segmentations",
    prompt_field="frames.gt",  # can be a detections or a keypoints field
)
Exploring SAM2 predictions in the FiftyOne app
short_videos.evaluate_detections(
    pred_field="frames.sam_segmentations",
    gt_field="frames.gt",
    eval_key="sam_eval",
    iou=0.7,
)
Model evaluation panel in the FiftyOne app
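The same results are also available programmatically; for example, you can load them by their eval key and print a per-class report:

# Load the stored evaluation results and summarize them
results = short_videos.load_evaluation_results("sam_eval")
results.print_report()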

An important consideration

Conclusion

Published in Voxel51

Written by Harpreet Sahota

🤖 Generative AI Hacker | 👨🏽‍💻 AI Engineer | Hacker-in-Residence at Voxel51