Image segmentation is one of the fundamental steps toward machine scene understanding. With image segmentation, machines move from abstract image categorization toward more grounded pixel-level classification, in which each pixel is labeled by considering its local neighborhood, the image context, and the scene composition, as well as available low-level knowledge (pixel relations) and high-level knowledge (e.g., ontologies, object relations). Image segmentation has numerous applications, including medical image analysis (e.g., chest X-ray inflammation segmentation), autonomous vehicles, video surveillance, and augmented reality.

Image segmentation works with many kinds of input, such as 2D segmentation, which mostly deals with images, 3D…


Here we show visual SLAM for monocular videos and how a consistent map is obtained via a loop closure mechanism.

For self-driving cars to localize themselves in a global environment, accurate geometric maps are needed. Self-driving cars need to know not only which lane they are driving in but also how far they are from the lane boundary. The accuracy needs to be on the order of 5 cm.

Structure from Motion (SfM) and Simultaneous Localization and Mapping (SLAM)-based methods have been known for years for extracting a 3D environment from a series of images. However, these algorithms perform well on local…


How to convert an RGBD image to points in 3D space

This tutorial introduces the intrinsic matrix and walks you through how you can use it to convert an RGBD (red, green, blue, depth) image to 3D space. RGBD images can be obtained in many ways, for example from a system like Kinect that uses infrared-based time-of-flight detection. The iPhone 12 is also rumored to integrate a LiDAR in its camera system. Most importantly for self-driving cars: LiDAR data from a mobile unit on a car can be combined with a standard RGB camera to obtain the RGBD data…
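As a rough illustration of the idea in this teaser, here is a minimal sketch of back-projecting a depth image into 3D points with the standard pinhole camera model. It is not taken from the tutorial itself: the function name `depth_to_points` and the parameters `fx`, `fy`, `cx`, `cy` (focal lengths and principal point, i.e. the entries of the intrinsic matrix) are assumptions chosen for this example, and the sketch assumes depth is measured along the optical axis with no lens distortion.

```python
import numpy as np


def depth_to_points(depth, fx, fy, cx, cy):
    """Back-project an (H, W) depth image into an (H*W, 3) array of points.

    Hypothetical helper for illustration. Assumes a pinhole camera whose
    intrinsic matrix is [[fx, 0, cx], [0, fy, cy], [0, 0, 1]] and a depth
    value z measured along the optical axis for every pixel.
    """
    h, w = depth.shape
    # u is the column index and v is the row index of every pixel.
    u, v = np.meshgrid(np.arange(w), np.arange(h))
    z = depth
    # Invert the projection u = fx * x / z + cx, v = fy * y / z + cy.
    x = (u - cx) * z / fx
    y = (v - cy) * z / fy
    # One (x, y, z) row per pixel, in row-major pixel order.
    return np.stack([x, y, z], axis=-1).reshape(-1, 3)
```

The color of each point can be attached afterwards by reshaping the RGB channels with the same row-major pixel order. Points are expressed in the camera frame; moving them into a world frame would additionally need the camera pose, which this sketch leaves out.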


Self-supervised learning is all the rage these days in machine learning. Whether it's models like GPT-3 for natural language processing or data augmentation for computer vision, everyone is trying to get something for free from their data without paying for expensive things like labels.

In this article, we’ll go over one area of computer vision research focused on the very same goal. This area has gone relatively unnoticed by the broader machine learning community. …

yodayoda

A Map for Robots and a programmable world
