Computer Vision in Autonomous Vehicles

Dang, everything looks so precise and accurate! Well, guess what: that's the point of computer vision in autonomous vehicles. Wanna see more? Let's get into how computer vision works in autonomous vehicles!
Before we get into it, we should know that computer vision is just ONE major part of an autonomous vehicle. There are other components to autonomous vehicles (e.g. sensor fusion, path planning), but for now, let's focus on computer vision.
Look at "computer vision" as two separate terms. "Computer" refers to some electronic device, and "vision" has to do with seeing. The term "computer vision" (or "perception") refers to how the computers on an autonomous vehicle understand their surroundings using visual data. This visual data comes from sources such as the cameras mounted on the car.

Computer vision is essential for key tasks such as identifying objects in the surroundings: animals, signs, cars, and pedestrians.
There are two components to identifying the surroundings around an AV:
- Determining the location of each object in the camera image (localization)
- Identifying each object correctly (classification)
These components need to run in real time so that the computer can pass its data to the other components of the autonomous vehicle, which then make important driving decisions based on it.
One very important thing to note: the camera DOESN'T do any of the identification. It just provides visual data so that the computers can do the job.
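To make those two components concrete, here's a minimal sketch of what a single detection might look like coming out of the perception system: a bounding box that says *where* the object is in the camera image, plus a label and confidence that say *what* it is. The names and numbers here are purely illustrative, not from any real AV stack:

```python
from dataclasses import dataclass

@dataclass
class Detection:
    """One object found in a single camera frame."""
    x_min: int         # bounding box corners, in pixel coordinates
    y_min: int
    x_max: int
    y_max: int
    label: str         # what the object was classified as
    confidence: float  # how sure the model is (0.0 to 1.0)

    def center(self) -> tuple:
        """Pixel location of the box center -- the 'where'."""
        return ((self.x_min + self.x_max) // 2,
                (self.y_min + self.y_max) // 2)

# Example: a pedestrian detected in a 1280x720 camera frame
det = Detection(x_min=600, y_min=300, x_max=680, y_max=520,
                label="pedestrian", confidence=0.92)
print(det.label, det.center())  # the 'what' and the 'where'
```

Other components (like path planning) would consume a list of these per frame, which is why both pieces of information have to arrive together and on time.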
So how is this approached? Well, there is one major way. Ya guessed it… deep neural networks!
Before we get into the "how," let's quickly introduce the "what" of deep neural networks.
A deep neural network is a type of machine learning model that effectively teaches itself from loads of training data. Over time, the algorithm recognizes common patterns in that data and builds a predictive model to identify what it is shown. Through lots of training data, computers gradually learn to identify objects such as signs, people, and cars.
In the case of AVs, to identify specific surroundings, the algorithm is trained to pick out key features of an image, such as gradients, edges, and other visual information.
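A full deep network is way too big to show here, but the core idea of "learn a pattern from examples" can be sketched with a single artificial neuron (a perceptron), the building block that deep networks stack by the millions. The data below is made up for illustration; the point is just the loop that nudges the weights whenever a prediction is wrong:

```python
# Toy training data: ((x, y) point, label). Label is 1 when x + y > 1.
data = [((0.0, 0.0), 0), ((1.0, 1.0), 1),
        ((0.2, 0.3), 0), ((0.9, 0.8), 1),
        ((0.1, 0.6), 0), ((1.2, 0.4), 1)]

w = [0.0, 0.0]   # weights: starts out knowing nothing
b = 0.0          # bias
lr = 0.1         # learning rate: how big each correction is

def predict(x):
    """Fire (1) if the weighted sum of inputs crosses the threshold."""
    return 1 if w[0] * x[0] + w[1] * x[1] + b > 0 else 0

# Training loop: every wrong prediction nudges the weights toward
# the correct answer. Repeated over many passes, the pattern sticks.
for epoch in range(50):
    for x, y in data:
        err = y - predict(x)
        w[0] += lr * err * x[0]
        w[1] += lr * err * x[1]
        b    += lr * err

# After training, the predictions match the labels it was shown.
print([predict(x) for x, _ in data])  # [0, 1, 0, 1, 0, 1]
```

A deep neural network is the same idea scaled up: many layers of these units, trained on millions of labeled images instead of six toy points.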
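What does "picking out gradients and edges" actually mean? A classic building block is the Sobel operator, which measures how sharply brightness changes from pixel to pixel. Here's a self-contained sketch on a tiny made-up "image" (a dark region next to a bright one, i.e. a vertical edge):

```python
# A tiny synthetic 5x6 grayscale image: dark (0) on the left,
# bright (255) on the right, so there's a vertical edge in the middle.
img = [[0, 0, 0, 255, 255, 255] for _ in range(5)]

def sobel_x(image, r, c):
    """Horizontal Sobel gradient at pixel (r, c): large where
    brightness changes sharply from left to right."""
    k = [[-1, 0, 1],
         [-2, 0, 2],
         [-1, 0, 1]]
    return sum(k[i][j] * image[r - 1 + i][c - 1 + j]
               for i in range(3) for j in range(3))

# Gradient along one row (interior pixels only, c = 1..4).
row = 2
grads = [sobel_x(img, row, c) for c in range(1, 5)]
print(grads)  # [0, 1020, 1020, 0] -- spikes exactly at the edge
```

Early layers of a trained network end up learning filters much like this one on their own; deeper layers combine those edges and gradients into shapes, and eventually into whole objects like signs and pedestrians.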

Cameras are quite cost-effective and reliable at the task they are supposed to accomplish. However, the problem with cameras is that they are not good at providing direct numerical measurements, like an object's distance or relative velocity.

This is where other sensors such as LIDAR and radar come into play, so those quantities can be determined. Trust me, LIDAR is another very cool and interesting topic to discuss (let's save LIDAR for another day).
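For contrast, here's the kind of numerical quantity a camera alone struggles with but a ranging sensor makes easy: relative velocity, estimated from two range readings taken a moment apart. The readings and timing below are made up for illustration:

```python
def relative_velocity(range_t1, range_t2, dt):
    """Approximate relative velocity (m/s) from two range readings
    taken dt seconds apart. Negative means the object is closing in."""
    return (range_t2 - range_t1) / dt

# Made-up radar-style readings: the lead car was 50.0 m away,
# then 48.5 m away half a second later.
v = relative_velocity(50.0, 48.5, 0.5)
print(v)  # -3.0 m/s: the gap is shrinking
```

A camera gives you a flat grid of pixels, so getting numbers like this out of it requires indirect tricks; a radar or LIDAR measures range directly, which is why AVs fuse multiple sensor types together.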
Key takeaways:
- Computer vision is the component of AVs in which the computer uses images provided by cameras to locate and classify objects
- Computer vision is powered by deep neural networks, models trained on large amounts of data
- The data produced by computer vision is quickly passed to other parts of the AV so that they can make decisions
- While cameras are very useful for identifying objects, they are not as strong at giving the computers specific numerical values
