Exploring for the first time how Computer Vision and Machine Learning are implemented in Self-Driving Cars!

Joel Prabhod
5 min read · Dec 20, 2021


Self-Driving Cars will soon become an everyday thing, and we’re only at the beginning. If we are to understand how to create these futuristic machines, we must understand the most important technique that made it all possible: Computer Vision. Computer Vision is everywhere in self-driving cars, especially in the perception of the environment.
How do we perceive an environment with a camera? What is Computer Vision in Self-Driving Cars?

The image below shows four main steps in the operation of an autonomous vehicle.

  • Computer Vision and Sensor Fusion together are called Perception.
    Perception is about understanding the environment. Computer Vision uses a camera to identify cars, pedestrians, roads, and more. Sensor Fusion merges data from other sensors, such as radar and lidar, to complement what the camera captures.
    This makes it possible to estimate the positions and speeds of the objects identified by the camera.
  • Localization is the step that locates the car more precisely than a GPS would.
  • Path Planning implements the brain of an autonomous vehicle.
    A Path Planner uses the data from the first two steps to predict what surrounding vehicles, pedestrians, and objects will do, and to generate trajectories from point A to point B.
  • Control uses controllers to actuate the vehicle.

Computer Vision

Computer Vision is a discipline that allows a computer equipped with a camera to understand its environment.

Computer vision techniques are used in autonomous vehicles to detect pedestrians and other objects, but they can also be used to diagnose cancers by looking for abnormalities in medical images.

These techniques range from the classic detection of lines and colors all the way to artificial intelligence.

Computer vision started in the 1950s with early attempts to transcribe the shapes of objects. By the end of the century, techniques such as Canny edge detection had been developed, which detects sharp changes of intensity in an image.
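To make this concrete, here is a minimal sketch of Canny edge detection with OpenCV; the file name road.jpg and the two hysteresis thresholds are placeholder assumptions, not values from this article:

```python
# Canny edge detection, a classic non-AI computer vision technique.
import cv2

image = cv2.imread("road.jpg")                    # placeholder path
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Blur first to suppress noise, then keep pixels whose intensity
# gradient passes the two hysteresis thresholds.
blurred = cv2.GaussianBlur(gray, (5, 5), 0)
edges = cv2.Canny(blurred, 50, 150)

cv2.imwrite("road_edges.jpg", edges)
```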

In 2001, the Viola-Jones algorithm demonstrated that a computer could recognize a face.
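As a rough illustration, Viola-Jones-style face detection can be reproduced with the Haar cascade shipped with OpenCV; photo.jpg is a placeholder path and the detection parameters are common defaults, not tuned values:

```python
# Viola-Jones-style face detection with OpenCV's bundled Haar cascade.
import cv2

face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml"
)

image = cv2.imread("photo.jpg")                   # placeholder path
gray = cv2.cvtColor(image, cv2.COLOR_BGR2GRAY)

# Returns one (x, y, w, h) rectangle per detected face.
faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
for (x, y, w, h) in faces:
    cv2.rectangle(image, (x, y), (x + w, y + h), (0, 255, 0), 2)

cv2.imwrite("photo_faces.jpg", image)
```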

Histogram of Oriented Gradients

In the following years, Machine Learning became popular for object detection with the widespread use of Histograms of Oriented Gradients (HOG) and classifiers. The goal is to train a model to recognize the shape of an object from the orientations of its gradients. A histogram of oriented gradients records the gradient direction at each pixel, then averages these directions over wider cells of the image.
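A minimal sketch of HOG extraction with scikit-image; car.jpg is a placeholder path, and the 9-orientation, 8x8-cell, 2x2-block parameters are common defaults rather than values from this article:

```python
# HOG: bin gradient orientations per cell, normalize over blocks.
from skimage import color, io
from skimage.feature import hog

image = color.rgb2gray(io.imread("car.jpg"))      # placeholder path

features, hog_image = hog(
    image,
    orientations=9,          # number of gradient-direction bins
    pixels_per_cell=(8, 8),  # histogram computed per 8x8 cell
    cells_per_block=(2, 2),  # normalization over 2x2 blocks of cells
    visualize=True,          # also return an image of the gradients
)
print(features.shape)  # one long, flat feature vector
```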

Deep Learning then became very popular for its performance, thanks to the arrival of powerful GPUs (Graphics Processing Units, which run operations in parallel rather than one after another) and the accumulation of large datasets. Before GPUs, Deep Learning algorithms were impractical on ordinary machines.

Computer vision can follow three approaches:

  • Without artificial intelligence, by analyzing shapes and colors
  • With Machine Learning, by learning from hand-crafted features
  • With Deep Learning, by learning the features on its own.

Machine Learning

Machine Learning is a discipline used in Computer Vision to learn how to identify shapes.

There are two types of learning:

  1. Supervised learning creates rules automatically from a labeled training database.
    We distinguish two tasks (see the sketch after this list):

a. Classification: predict which class a sample belongs to (example: dog or cat).

b. Regression: predict a continuous value from other data (example: the price of a house from its size or zip code).

2. Unsupervised learning automatically groups similar data together, without labels.
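A minimal scikit-learn sketch contrasting the two supervised tasks; the toy feature values and prices below are made up purely for illustration:

```python
# Classification vs. regression, sketched with scikit-learn.
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVC

# Classification: predict a class label (1 = car, 0 = background).
X_cls = [[0.2, 0.9], [0.8, 0.1], [0.3, 0.8], [0.9, 0.2]]
y_cls = [1, 0, 1, 0]
clf = SVC().fit(X_cls, y_cls)
print(clf.predict([[0.25, 0.85]]))  # -> [1]

# Regression: predict a continuous value (house price from size).
X_reg = [[50], [80], [120]]          # sizes in square meters
y_reg = [150_000, 240_000, 360_000]  # prices
reg = LinearRegression().fit(X_reg, y_reg)
print(reg.predict([[100]]))          # -> [300000.]
```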

There are four steps in the process of supervised learning for car detection:

  1. The first step is the creation of a database of images of cars and roads. Supervised learning requires indicating which images contain a car and which show only the background. This is called labeling.

Dataset

2. To identify which features characterize a car, we try our images in different color spaces and capture shape with HOG features. Each image is transformed into a feature vector.

Extraction of HOG features and color features
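A minimal sketch of step 2, assuming the HLS color space, 32-bin histograms, and standard HOG parameters (all illustrative choices, not ones prescribed by this article):

```python
# Build one combined feature vector per image: color histograms
# plus HOG shape features, concatenated into a single flat vector.
import cv2
import numpy as np
from skimage.feature import hog

def extract_features(image_bgr):
    # Color features: a 32-bin histogram of each HLS channel.
    hls = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HLS)
    color_hist = np.concatenate(
        [np.histogram(hls[:, :, c], bins=32, range=(0, 256))[0]
         for c in range(3)]
    )

    # Shape features: HOG on the grayscale image.
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY)
    hog_features = hog(gray, orientations=9,
                       pixels_per_cell=(8, 8), cells_per_block=(2, 2))

    return np.concatenate([color_hist, hog_features])
```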

  3. These vectors are concatenated and used as the training set. In the graph below, shape and color features are plotted on the vertical and horizontal axes.
Our classes are the blue points (car) and the orange ones (non-car). We must also choose a classifier.

A machine learning algorithm is meant to draw a line separating the two classes according to their features. New points (the white cross) are then predicted according to their position relative to that line. The more training data we have, the more accurate the prediction will be.

Classification
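Step 3 might look like the following sketch, reusing the hypothetical extract_features function above; car_images and noncar_images stand in for the labeled dataset:

```python
# Train a linear SVM to separate car from non-car feature vectors.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import LinearSVC

X = np.array([extract_features(img)
              for img in car_images + noncar_images])
y = np.array([1] * len(car_images) + [0] * len(noncar_images))

# Scale features so no single feature dominates the decision line.
scaler = StandardScaler().fit(X)
clf = LinearSVC().fit(scaler.transform(X), y)
```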

4. The last step is prediction. An algorithm slides through the image, converting each region into a vector with the same features used for training. Each region is then passed to the classifier, which draws bounding boxes around the cars.

Prediction
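A minimal sliding-window sketch of the prediction step, reusing extract_features, scaler, and clf from the sketches above; the 64-pixel window and 16-pixel stride are assumptions, and each window must match the training image size:

```python
# Slide a window over the frame, classify each patch, and keep
# the windows the classifier labels as "car" (class 1).
def detect_cars(frame, window=64, stride=16):
    boxes = []
    height, width = frame.shape[:2]
    for y in range(0, height - window + 1, stride):
        for x in range(0, width - window + 1, stride):
            patch = frame[y:y + window, x:x + window]
            features = scaler.transform([extract_features(patch)])
            if clf.predict(features)[0] == 1:
                boxes.append((x, y, x + window, y + window))
    return boxes  # candidate bounding boxes around detected cars
```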

Machine Learning algorithms let us choose which features are used for training. Today, these algorithms are used more for data manipulation than for image recognition, owing to the arrival of Deep Learning and neural networks. Moreover, this kind of detection is slow and generates many false positives; to eliminate them, we need a large number of background images (roads, streets, …).

Please feel free to contact me on

GitHub
