How do machines see our world?

Meghana Rao
FACE | Amrita Bangalore
5 min read · Aug 25, 2019

Have you ever wondered how Curiosity, the Mars rover, navigates the Martian terrain? Or how Tesla’s self-driving cars work? First, we need to understand how robots see our world! Well, that’s relatively simple: we could use a video camera to stream a continuous series of images, which the robot then processes. The first step is noise reduction and contrast enhancement. The next is feature recognition, identifying geometric features like lines, curves and edges. These features reduce millions of raw pixels to a much smaller description, which makes the image easier to process further.
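To make this concrete, here is a minimal sketch of that preprocessing pipeline using OpenCV (assumed to be installed as `opencv-python`). The file name `frame.jpg` is just a placeholder for one frame from the robot’s camera stream, and the Canny thresholds are illustrative values, not tuned settings.

```python
# Sketch: noise reduction -> contrast enhancement -> edge features with OpenCV.
import cv2

frame = cv2.imread("frame.jpg", cv2.IMREAD_GRAYSCALE)   # one camera frame

denoised = cv2.GaussianBlur(frame, (5, 5), 0)           # noise reduction
contrasted = cv2.equalizeHist(denoised)                 # improve contrast
edges = cv2.Canny(contrasted, threshold1=50,            # edges: millions of pixels
                  threshold2=150)                       # reduced to a sparse map

cv2.imwrite("edges.jpg", edges)
```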

These features within an image allow us to:

Detect objects: determining where an object is in the image and how it is oriented. This is what apps like Snapchat use to detect your face (a small sketch of this task follows the list below).

Classify objects: answering questions such as “Is this object a cat or a dog?” or “Does this image have a cat in it?”. This is what apps like Google Lens use to identify many species of animals, plants and flowers.

Recognize attributes: describing the visual properties or qualities of objects, i.e. choosing the adjectives that fit them, e.g. “Is the scene dry or wet?”. This supports applications like face unlock and face recognition, or systems that read license plates regardless of the font used.

Segment objects: determining which object each pixel in the image belongs to. This outlines each object’s profile and annotates every region separately. It is used in applications that automatically remove the background from a portrait image.
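As a small illustration of the first task, detecting a face, here is a sketch using the Haar-cascade face detector that ships with OpenCV. The input file name `portrait.jpg` is a placeholder, and the detector parameters are common defaults, not values from the original article.

```python
# Sketch: detect faces in an image with OpenCV's bundled Haar cascade.
import cv2

image = cv2.imread("portrait.jpg")
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

detector = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")
faces = detector.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:                      # one bounding box per detected face
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("faces.jpg", image)
```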

These features help the robot decipher each image and associate different features with different objects. The robot tries to match the features it finds against a library of features that its software is looking for. Using these methods, robots can identify many objects with ease; they can recognize and categorize thousands of object categories, different breeds of cats, for instance. However, predefined algorithms can sometimes fail to identify objects correctly.
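Here is a minimal sketch of that matching step: comparing the features found in a camera frame against a stored “library” image of a known object, using ORB features and a brute-force matcher in OpenCV. The file names are placeholders, and treating “many low-distance matches” as a detection is a simplification for illustration.

```python
# Sketch: match features in a scene against a known reference object.
import cv2

scene = cv2.imread("scene.jpg", cv2.IMREAD_GRAYSCALE)
template = cv2.imread("known_object.jpg", cv2.IMREAD_GRAYSCALE)

orb = cv2.ORB_create()
kp1, des1 = orb.detectAndCompute(template, None)   # features of the known object
kp2, des2 = orb.detectAndCompute(scene, None)      # features found in the frame

matcher = cv2.BFMatcher(cv2.NORM_HAMMING, crossCheck=True)
matches = sorted(matcher.match(des1, des2), key=lambda m: m.distance)

# Many low-distance matches suggest the known object appears in the scene.
if matches:
    print(f"{len(matches)} feature matches, best distance {matches[0].distance}")
```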

Let’s take an example:

How would you distinguish between a chihuahua and a muffin?

We as humans can tell that the one on the left is a dog and the image on the right is a muffin. It’s quite easy for a machine to misidentify them: they look similar, and it’s difficult to program an application that can flawlessly distinguish between them.

Now, which one of these is a real dog?

Again, with close observation, we can tell that number 7 is a real dog. However, can a machine figure this out? Not with a single, unchanging algorithm!

This is where machine learning and deep learning come into the picture! These systems are not programmed to see; instead, they learn how to see. It’s like creating intelligence for the robot and letting it think for itself. Machine learning enables robots to learn by themselves and improve from experience. In simple terms, we don’t have to give them explicit instructions; we only need to provide a large set of example data from which the machine learns new skills.
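A minimal sketch of “learning from examples” is shown below, using a tiny convolutional network in Keras. MNIST digits stand in here for whatever labelled example images the robot is given; the architecture and single training epoch are arbitrary choices for illustration. The point is that no rules are hand-coded: the network’s weights are fit to the data.

```python
# Sketch: a small CNN learns to classify images purely from labelled examples.
import tensorflow as tf

(x_train, y_train), (x_test, y_test) = tf.keras.datasets.mnist.load_data()
x_train = x_train[..., None] / 255.0   # scale pixels to [0, 1], add channel dim
x_test = x_test[..., None] / 255.0

model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(16, 3, activation="relu", input_shape=(28, 28, 1)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(10, activation="softmax"),   # one score per class
])
model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.fit(x_train, y_train, epochs=1, validation_data=(x_test, y_test))
```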

Furthermore, how do self-driving cars understand what they see? The best-known examples are Tesla’s vehicles running Autopilot, which carry sensors perceiving data in all directions. Autonomous cars need to learn to navigate our environment, and that requires a lot more than a good video camera. That’s why self-driving cars come equipped with ultrasound, radar, lidar and infrared sensors. These send out signals at regular intervals and study the reflections to make out the different objects in their surroundings, such as obstacles, pedestrians, cyclists and other motorists.
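The idea behind that “send a signal, study the reflection” step is time of flight: the sensor emits a pulse, times the echo, and converts that delay into a distance. The toy sketch below illustrates only this principle; the delay values and the single-obstacle assumption are invented for illustration.

```python
# Sketch: turning an echo delay into a distance, as ultrasound/radar/lidar do.
SPEED_OF_SOUND = 343.0     # m/s, ultrasound pulse in air at ~20 °C
SPEED_OF_LIGHT = 3.0e8     # m/s, radar and lidar pulses

def distance_from_echo(echo_delay_s: float, wave_speed: float) -> float:
    """The pulse travels to the obstacle and back, so halve the round trip."""
    return wave_speed * echo_delay_s / 2.0

print(distance_from_echo(0.012, SPEED_OF_SOUND))    # ~2.06 m, an ultrasound echo
print(distance_from_echo(2.0e-7, SPEED_OF_LIGHT))   # ~30 m, a lidar return
```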

So far we have discussed how a robot sees its surroundings, but a major problem remains: how does the robot understand the image, and what does it do after analyzing it? Assume that a robot correctly identifies the objects in an image and recognizes their properties. What does it do with this information?

Take the example of a plastic ball rolling into the road. Most human drivers would expect that a child might follow it and slow down accordingly, but when we see a plastic bag rolling across the road, we ignore it. Can a robot think in such a complex way? This is a monumental task, even with all of its sensors and algorithms. There are countless situations like this, and we cannot possibly write an algorithm for every one of them.

Take another example: the rovers Spirit and Opportunity. How do they explore Mars? A rover can take images, process them and determine the safest path to its destination. To drive the rover, there are two options: either NASA engineers send computer commands the night before it is set to carry them out, or the rover thinks for itself. For the latter, the rover uses several cameras, most importantly its stereo (depth) cameras, to get a 3D view of obstacles. The rovers use both options in varying degrees to get the job done.
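Below is a minimal sketch of the “pick the safest path” idea. The rover’s 3D view of obstacles is reduced to a hand-made traversability grid, where higher numbers mean rougher or riskier ground, and Dijkstra’s algorithm finds the lowest-cost route. The grid and costs are invented for illustration; the real onboard planners are far more involved.

```python
# Sketch: find the lowest-cost (safest) route across a traversability grid.
import heapq

terrain = [            # 1 = smooth ground, 9 = large rock / hazard
    [1, 1, 9, 1, 1],
    [1, 9, 9, 1, 9],
    [1, 1, 1, 1, 1],
    [9, 9, 1, 9, 1],
    [1, 1, 1, 1, 1],
]

def safest_path(grid, start, goal):
    rows, cols = len(grid), len(grid[0])
    frontier = [(0, start, [start])]          # (accumulated cost, cell, path so far)
    visited = set()
    while frontier:
        cost, (r, c), path = heapq.heappop(frontier)
        if (r, c) == goal:
            return cost, path
        if (r, c) in visited:
            continue
        visited.add((r, c))
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols and (nr, nc) not in visited:
                heapq.heappush(frontier,
                               (cost + grid[nr][nc], (nr, nc), path + [(nr, nc)]))
    return None

print(safest_path(terrain, (0, 0), (4, 4)))   # cost and list of grid cells to drive
```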

There are still many problems to solve before we can build a perfectly autonomous robot. Recent developments have shown impressive improvements in accuracy and speed; however, the field still needs a great deal of research.

This is just a brief explanation of a vast concept, so stay tuned for more blogs.

Team FACE,

In c<>de we trust.
