Bridging the chasm between massive robotics data and computer vision workflows

Yaser Khalighi
SceneBox
May 23, 2022

The world is going autonomous, and robotics teams are at the forefront of the autonomy revolution. When developing the robots of tomorrow, the teams behind these magical feats of engineering need to build computer vision models. Without “eyes”, truly autonomous robots cannot keep up with the dynamic and, at times, chaotic human world.

The problem is that there is no easy way for robotics engineers to manage their robotics data for computer vision training workflows. For example, there is no way to take a rosbag (a recording file from the ROS library) and easily curate, manage, search, and triage its data as required for the development of computer vision models.

In this article, we explore a specific type of robotics data — namely, rosbags. If you are working with robots, it’s likely that you’ve heard of rosbags, or even use them to record and store your robotics data. So, the question is, how can you review, search, and analyze the data in your rosbags for computer vision (CV) / machine learning (ML) workflows?

Challenge: Using rosbags for CV workflows

Before we dive into the solution, let’s explore some of the challenges. There are many obstacles teams bump into when moving to CV development with robotics data. Below is a non-exhaustive list of the pains we hear called out most often by robotics teams working on CV:

  • rosbags are typically heavy (TBs each). As such, you don’t want to move them around
  • There’s typically no good way to search within rosbags en masse — rather, one must manually review data once a rosbag is downloaded and opened up for exploration
  • There isn’t a cloud-based way to visualize, search data, and review statistics about the data
  • There isn’t a unified way of orchestrating all the data operations going from rosbags, to ML training, to model evaluation

The list goes on. To share some more experience, here are a couple of unedited workflow horror stories we have heard from world-class teams:

Workflow 1:

To find the relevant data in rosbags for computer vision training, we have to review multiple rosbags based on their file and folder names, then pick a few to comb through in further detail. We download them (approximately 10 TB of data) and manually review all of the data with a local ROS visualizer. In the manual review we jot a few time intervals down in an Excel spreadsheet, then run another ROS script along with an FFmpeg script to extract and format 500 frames from these intervals. We package the frames in a zip file and send them to our annotation partner via email and a cloud file share. We receive the annotated data in JSON format in another email. We then use a home-grown script to parse the data, and we visualize and triage the annotations alongside the original data using another Python script.

Time required: it varies, but 4–5 hours on average

Workflow 2:

“Our self-driving cars record a ton of ROS data. After we upload it to S3, we want to visualize and search the data, find this ‘needle in a haystack’ type data, and send it to our labeling team. This entire process is manual and done on local machines with a growing number of in-house scripts, requiring us to download rosbags and comb through them to find and extract what we are looking for. This takes anywhere from 2 to 8 hours depending on what we are after. Management of labeled data alongside the raw ROS data is another big issue, as rosbags, labels, and extracted frames live in separate environments.”

These obstacles to development haunt us too, which is why we have been hard at work on these problems for the last three years. Below we outline how we (SceneBox) bridge the gap between robotics data and computer vision ML.

Solution: A computer vision data engine that natively supports ROS

SceneBox supports ROS natively (among many other formats like KITTI, RTMaps, MDF4, etc.), and with these problems in mind, I couldn’t be more excited to show you SceneBox’s powerful tools for uploading, viewing, and analyzing rosbags. Once indexed by SceneBox (no upload required), all the data within your rosbags is searchable and can be used for data curation, sent to the growing list of labeling partners that SceneBox integrates with, and triaged once labeled.

Index your rosbag with a single Python call

SceneBox can either upload your rosbags to the cloud of your choice or sit on top of the existing data in your data lake and index it in place. We’ve streamlined the process of indexing rosbags into SceneBox. Simply call index_rosbag() from our Python package:
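Here’s a minimal sketch of what that call could look like. The client setup and parameter names below (API token, session name, bucket URI) are illustrative assumptions rather than the exact SceneBox API, so check the package documentation for the real signature:

```python
# Hypothetical sketch: the client class, auth setup, and parameter names below
# are assumptions for illustration, not the verbatim SceneBox API.
from scenebox.clients import SceneEngineClient  # assumed import path

client = SceneEngineClient(auth_token="YOUR_API_TOKEN")  # assumed auth setup

# Point SceneBox at a rosbag already sitting in your data lake; it is indexed
# in place, so the raw bag does not need to be re-uploaded or moved.
client.index_rosbag(
    session_name="highway_run_2022_05_01",            # assumed: logical name for the recording
    rosbag_uri="s3://my-data-lake/bags/run_001.bag",   # assumed: cloud path to the bag
)
```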

During import, SceneBox can perform a deep analysis in which it automatically extracts, sorts, and enriches rosbag messages. In particular, it can run one or more models (here, a Mask R-CNN model trained on COCO) to detect objects and index them. Once finished, various tools are available to explore and curate your data further, including reviewing time-based data, viewing embeddings, dashboard analytics, and multi-modal data search.
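To make the enrichment step concrete, here is a rough sketch of the general technique: running a COCO-pretrained Mask R-CNN (via torchvision) over a frame extracted from a rosbag and keeping the confident detections as searchable metadata. This illustrates the idea, not SceneBox’s internal implementation, and the frame file name is hypothetical:

```python
# Illustrative sketch of the enrichment idea: a COCO-pretrained Mask R-CNN from
# torchvision run over a frame extracted from a rosbag. Not SceneBox internals.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

model = torchvision.models.detection.maskrcnn_resnet50_fpn(pretrained=True)
model.eval()

# A frame previously extracted from the bag (hypothetical file name).
image = Image.open("frame_000123.png").convert("RGB")

with torch.no_grad():
    predictions = model([to_tensor(image)])[0]

# Keep confident detections; labels and boxes like these become searchable metadata.
keep = predictions["scores"] > 0.5
detections = {
    "boxes": predictions["boxes"][keep].tolist(),
    "labels": predictions["labels"][keep].tolist(),   # COCO category ids
    "scores": predictions["scores"][keep].tolist(),
}
```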

Review data in time space or embedding space

In the session view, we can play back a collection of rosbag events projected onto a single timeline, just as they were recorded. Play or rewind the session, extract images or video clips, or add comments to all assets present at a specific timestamp:
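Conceptually, this works because every rosbag message carries a timestamp, so topics can be interleaved against one clock. As a rough illustration (using the standard ROS1 rosbag Python API, not SceneBox internals, and hypothetical topic names):

```python
# Conceptual sketch of what a session timeline is built from: every rosbag message
# carries a timestamp, so topics can be interleaved against one clock.
# Uses the standard ROS1 `rosbag` Python API; topic names are hypothetical.
import rosbag

timeline = []  # (timestamp_in_seconds, topic) pairs spanning the whole recording
with rosbag.Bag("run_001.bag") as bag:
    for topic, msg, t in bag.read_messages(topics=["/camera/image_raw", "/lidar/points"]):
        timeline.append((t.to_sec(), topic))

timeline.sort()  # all sensors merged onto a single timeline, just as they were recorded
print(f"{len(timeline)} events between {timeline[0][0]:.2f}s and {timeline[-1][0]:.2f}s")
```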

SceneBox automatically generates embeddings and applies Uniform Manifold Approximation and Projection (UMAP) to your rosbag images to visually show your data’s distribution. From this view, you can inspect any asset in detail or curate a new dataset:
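For the curious, here is a sketch of the underlying idea: embed each frame with a pretrained backbone, then project the embeddings to 2-D with the umap-learn package. The file name and parameters are assumptions for illustration:

```python
# Sketch of the embedding view: per-frame feature vectors projected to 2-D with UMAP
# so similar frames land near each other. File name and parameters are assumptions.
import numpy as np
import umap  # pip install umap-learn

# (N, D) array of per-image features, e.g. pooled CNN features for each extracted frame.
embeddings = np.load("rosbag_frame_embeddings.npy")

reducer = umap.UMAP(n_neighbors=15, min_dist=0.1, metric="cosine")
coords_2d = reducer.fit_transform(embeddings)  # (N, 2) points to scatter-plot

# Dense clusters are near-duplicate scenes; sparse regions often hide the rare
# scenarios worth pulling into a curated dataset.
```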

Data search and analytics

Dashboards are a great way to discover trends in your data. Easily add and rearrange pie charts, bar charts, or tables to aggregate over virtually any metadata field. Finally, all rosbag images and videos (along with their extracted annotations, embeddings, and other metadata) are available to view on the data discovery page. Search and filter, curate new datasets, and inspect individual assets in detail in this view:
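Under the hood, a dashboard widget is essentially an aggregation over metadata. A toy sketch with pandas (the column names are hypothetical) shows the kind of group-by a pie or bar chart represents:

```python
# Sketch of the aggregation behind a dashboard widget: group detections extracted
# from the rosbags by metadata fields and count them. Column names are hypothetical.
import pandas as pd

detections = pd.DataFrame([
    {"rosbag": "run_001.bag", "label": "car",        "time_of_day": "day"},
    {"rosbag": "run_001.bag", "label": "pedestrian", "time_of_day": "day"},
    {"rosbag": "run_002.bag", "label": "car",        "time_of_day": "night"},
])

# The equivalent of a bar chart over object classes, split by another metadata field.
summary = detections.groupby(["time_of_day", "label"]).size().unstack(fill_value=0)
print(summary)
```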

The curated data from the rosbags can be used in SceneBox’s Data Operations. These operations include well-established workflows such as sending the datasets to labeling platforms, similarity search, data deduplication, label triage, and model debugging.
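Similarity search and deduplication, for example, typically boil down to comparing embeddings. A small sketch of that idea with NumPy (the file name and thresholds are assumptions, not SceneBox internals):

```python
# Sketch of embedding-based similarity search and near-duplicate detection using
# cosine similarity. Illustrative only; file name and threshold are assumptions.
import numpy as np

def cosine_similarity(query: np.ndarray, gallery: np.ndarray) -> np.ndarray:
    q = query / np.linalg.norm(query)
    g = gallery / np.linalg.norm(gallery, axis=1, keepdims=True)
    return g @ q

embeddings = np.load("rosbag_frame_embeddings.npy")    # (N, D) frame embeddings
scores = cosine_similarity(embeddings[0], embeddings)  # frame 0 vs. every frame

nearest = np.argsort(-scores)[1:6]        # five most similar frames (skipping the query)
duplicates = np.where(scores > 0.98)[0]   # near-duplicates above a tight threshold
```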

Check it out:

We’ve uploaded a rosbag created from one of the samples from the KITTI dataset for you to explore here. Feel free to contact us if you have any questions!

Thanks to SceneBox’s super-engineer Emily Lee for the content on KITTI.

I would love to hear your thoughts on this article. Our team is always open to conversations. Please reach out to me on LinkedIn if you would like to take a deeper dive with me.
