Inside a Car with its own Mind
LiDAR vs Computer Vision explained
Imagine if you didn’t have eyes. 👀
Of course, you wouldn’t be able to see… at least not without help. You could use a guide dog or a white cane, but there would still be a high chance of getting hurt, and it’s the same with cars. A car is like you without eyes; it needs someone to be its eyes.
Right now, about 1.35 million people die each year in car crashes, largely because human drivers don’t always do a good job of being the car’s eyes. So people decided to fix that. They created self-driving cars, or autonomous vehicles, which are cars with eyes and a good brain (although, like humans, they still need to learn first).
A self-driving car does exactly what its name says: it drives by itself. But how? Just as we need our senses to survive, these cars need some way to detect their surroundings, so they use an array of sensors. There are many ways the car can see: it can use radar, cameras, LiDAR and ultrasound. Of course, no vehicle can just pop the sensors on and start moving; it needs some help from humans first.
I have made another article on Intro to self-driving cars. Here is a quick recap:
RECAP:
Five physical features make the vehicle autonomous:
- Computer Vision
- Sensor Fusion
- Localization
- Path Planning
- Control
Computer vision and sensor fusion work together to get an understanding of the environment. Localization uses guidance points to detect where it is and where to go. Path planning is what detects where it can drive and should be driving. Control is combining the information to control what it’s trying to do.
If you want more information on this topic, read my other article, which discusses the fundamentals of autonomous vehicles: (give link once published)
So you may be asking, “Where is the real explanation? I came here to learn about LiDAR and Computer Vision!” Well then, let’s drive right into it 🚗!
🖲 LiDAR
LiDAR stands for Light Detection And Ranging. You may have heard of LiDAR before, because autonomous vehicles are not its only use.
It can be used to create high-resolution maps or to survey certain areas. Before autonomous vehicles became its best-known application, LiDAR was used for archaeology and agriculture, since those fields needed models of the land.
A LiDAR sensor sits on top of the car and is always spinning, which gives it a 360° view. While spinning, it fires pulses of laser light (ultraviolet, visible or near-infrared) to detect objects in its surroundings. These pulses measure the distance between the car and each object. The sensor gets this accurate distance by measuring how long it takes for the light to bounce back to it. There is a specific formula:
D = (Et × c) / 2
For example, if the LiDAR sensor spots a traffic light in the middle of the road, it fires a laser pulse at it, waits for the reflection to come back, and records the elapsed time [Et]. It multiplies that time by the speed of light [c] and then divides by two (to account for the round trip), giving the total distance [D].
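To make the formula concrete, here is a minimal sketch in Python (the 400-nanosecond value is just an illustrative assumption, not a real sensor reading):

```python
# A minimal sketch of the ranging formula D = (Et * c) / 2, with made-up values.
SPEED_OF_LIGHT = 299_792_458  # metres per second

def lidar_distance(elapsed_time_s: float) -> float:
    """Distance to an object, given the round-trip time of one laser pulse."""
    # The pulse travels to the object and back, so we divide by two.
    return (elapsed_time_s * SPEED_OF_LIGHT) / 2

# Example: a pulse that returns after 400 nanoseconds hit something ~60 m away.
print(lidar_distance(400e-9))  # ≈ 59.96 metres
```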
This process can work up to 60 m away from the sensor, and it finds the distances of objects by firing hundreds of thousands of beams per second.
The point of all this is to detect anything the car should avoid, like pedestrians, road work or almost anything else on the road.
For example, if the car sees a pedestrian randomly on the road, it can get the distance and slow down just in time to stop the collision.
Since the LiDAR sensor sends off hundreds of thousands of beams of light every second, it effectively creates a 3D map of everything around it. This map is built by software that feeds information to the car about its surroundings, helping the vehicle get a full scope of the environment and allowing it to drive in almost any kind of condition.
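As a rough illustration of how a spinning sensor’s readings become a map, here is a toy 2D sketch (the angles and distances are made up; a real sensor also records elevation, giving full 3D points):

```python
import math

# Hypothetical single-revolution scan: (azimuth in degrees, measured distance in metres).
scan = [(0, 12.0), (90, 3.5), (180, 7.2), (270, 25.0)]

# Convert each polar measurement into an (x, y) point relative to the car.
point_map = [
    (d * math.cos(math.radians(angle)), d * math.sin(math.radians(angle)))
    for angle, d in scan
]
print(point_map)  # a tiny 2D slice of the map the car rebuilds on every rotation
```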
LiDAR needs to work in coordination with the cameras because it cannot process everything on its own. LiDAR data is processed in three steps:
- Clustering. The raw points LiDAR senses are grouped into rough shapes so that objects become recognizable (a small sketch of this step follows the list).
- Classification. The roughly scanned shapes are then identified and classified.
- Modelling. The identified objects’ possible movements are predicted.
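As an example of the clustering step, here is a minimal sketch using scikit-learn’s DBSCAN on a made-up 2D point cloud (real pipelines use dense 3D points and carefully tuned parameters):

```python
import numpy as np
from sklearn.cluster import DBSCAN  # assumes scikit-learn is installed

# Toy point cloud: two tight groups of points (two "objects") plus one stray return.
points = np.array([
    [10.1, 2.0], [10.3, 2.1], [10.2, 1.9],     # object A
    [25.0, -4.0], [25.2, -4.1], [24.9, -3.9],  # object B
    [40.0, 15.0],                               # stray / noise return
])

# Group points that sit close together; label -1 marks points that fit no cluster.
labels = DBSCAN(eps=0.5, min_samples=2).fit_predict(points)
print(labels)  # [0 0 0 1 1 1 -1] → two clustered objects, one noise point
```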
Now that you know what LiDAR is and how it works, it helps to know that many self-driving car companies use it. Some of the biggest and most popular companies building autonomous vehicles with LiDAR are:
- Waymo
- Argo AI
- Cruise
- Aurora
- Mobileye
These companies all began at different times, but Waymo is the most popular of them (so I’ll focus on Waymo’s recent developments with LiDAR). Waymo began as a self-driving car project in 2009, with Google as its parent. Waymo has also partnered with truck and car companies like Volvo in the past. Unlike Tesla (which I will discuss later), Waymo’s vehicles haven’t reached the roads nationally yet and are still in development. Waymo is currently at level 4 of automation, which is the last level before full automation.
If you didn’t know, there are 5 levels of autonomy:
- No Automation/Driver Assistance — you need a driver to actually drive, since there is little to no autonomy
- Partial automation — the car can handle some of the driving on its own, but the driver needs to stay engaged the whole time and be ready to take over
- Conditional automation — the vehicle can drive on its own under certain conditions, but the driver must be ready to take over when asked
- High automation — the car can be independent, but the driver still has to stay in the driver’s seat just in case
- Full automation — The driver can become a passenger
Waymo is still trying to reach full autonomy. Waymo’s cars mostly operate in Phoenix, Arizona. There are more than 25,000 self-driving cars across Austin, Texas; Mountain View, California; and Phoenix, Arizona. In total, Waymo has driven 5 billion miles.
💡TL;DR:
- LiDAR stands for Light Detection And Ranging
- LiDAR existed before self-driving cars, but was not used for this purpose
- Using the lasers, it creates a 3d map of its environment
- It sends hundreds of thousands of beams of light per second at objects to calculate their distance
- It uses Clustering, Classification and Modeling to detect its surroundings
- One of the most famous self-driving car companies is Waymo, which is at level 4 autonomy
📷 COMPUTER VISION/ CAMERAS
Imagine an annoying wedding photographer when you are trying to envision what a self-driving car with cameras looks like. Well, maybe multiply that photographer into tiny cameras all around the sides of the vehicle.
What I’m trying to get at is that this is different from LiDAR.
If you were paying attention while reading above, you know that LiDAR uses lasers to find its surroundings and their distances. Cameras, by contrast, are all visual and do not depend on laser ranging or detection. The cameras provide the images, and AI-powered software analyzes them with a high level of accuracy. To help you imagine it, the cameras are spread out all around the car to get a 360° view of the surroundings.
Unlike LiDAR, this approach uses visual data, which is then processed using computer vision, machine learning and artificial intelligence. To detect objects in an image, there are two main steps:
Object Classification + Object Localization = Object Detection
OBJECT CLASSIFICATION:
Image classification is the process of finding out what the specific objects in an image are. This gives a rough idea of what is around, like a pedestrian or a bike. Then, with the help of image localization, the exact location of each object is marked with bounding boxes.
Since the car is not a human and cannot automatically know that a vehicle is a car, it needs to be trained. To train the software to perform image classification, engineers train a convolutional neural network (CNN). Don’t worry, it’s not as hard to understand as it seems.
A CNN is a specific type of artificial neural network used for image recognition on pixel data (which is all a digital picture really is: pixels). A neural network is a software program modelled on the neurons in the human brain, replicating how we identify or evaluate things.
If you are still lost, let’s take it to a smaller scale. If I gave you an apple and a banana, how do you know which is which? You first list their characteristics; for example, an apple is red and round, and a banana is yellow and long. Although they are both fruits, you can still tell them apart. When you were younger, you probably saw an apple for the first time, and that’s when you learned that this edible red thing is an apple. As you grew older, you gained more knowledge of different varieties of apples and learned to identify other fruits.
Just like your brain, the CNN is trained to recognize different objects. The CNN passes the image through filters to sort out what the thing might be; this filtering step is called convolution.
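To make the idea concrete, here is a minimal sketch of a CNN classifier in PyTorch (toy layer sizes and made-up class names; a real network would be much deeper and trained on labelled images):

```python
import torch
import torch.nn as nn

# A minimal sketch of a CNN image classifier (toy sizes, untrained weights).
class TinyCNN(nn.Module):
    def __init__(self, num_classes: int = 3):  # e.g. pedestrian / car / cyclist
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, padding=1),  # learnable filters (convolution)
            nn.ReLU(),
            nn.MaxPool2d(2),                              # shrink the image, keep strong signals
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)

    def forward(self, x):
        x = self.features(x)       # extract visual features
        x = x.flatten(1)           # flatten for the final decision layer
        return self.classifier(x)  # one score per class

model = TinyCNN()
fake_image = torch.randn(1, 3, 64, 64)  # one 64x64 RGB "image"
print(model(fake_image).shape)          # torch.Size([1, 3]) → scores for 3 classes
```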
Sometimes, when an object could be anywhere in a large image, you use sliding boxes that scan the image in rows; this is called the Sliding Windows Algorithm. Each window is passed through the convolutional neural network, which identifies the main object inside it. If a window shows only the sky or the road, it is treated as an unimportant prediction; if it shows something useful like a pedestrian or a car, it is a prediction worth keeping.
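Here is a rough sketch of the sliding-window idea (the window and stride sizes are arbitrary assumptions); each crop would be passed to a classifier like the CNN above:

```python
import numpy as np

def sliding_windows(image, window=64, stride=32):
    """Yield (row, col, crop) for each window position, scanning across the frame."""
    height, width = image.shape[:2]
    for row in range(0, height - window + 1, stride):
        for col in range(0, width - window + 1, stride):
            yield row, col, image[row:row + window, col:col + window]

frame = np.zeros((256, 256, 3))       # stand-in for one camera frame
crops = list(sliding_windows(frame))  # every crop gets classified separately
# In a real pipeline, crops classified as just sky or road would be ignored.
print(len(crops), crops[0][2].shape)  # 49 (64, 64, 3)
```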
The sliding window algorithm is a very slow and time-consuming process, so there is another way to classify images of any size faster. This process is called YOLO ✌️. No, I am not talking about YOLO as in the texting term, and I have not misspelled yoyo. YOLO stands for You Only Look Once.
This means the image only goes through the CNN one time, so it gets processed quickly. YOLO divides the entire image into a grid and identifies the key areas. It uses a class probability map, which gives a rough identification of where the different objects might be in the picture. Using YOLO is more cost-effective and less time-consuming, yet it still gives good results.
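As an example of how little code a modern YOLO-style detector needs, here is a hedged sketch using the open-source Ultralytics package (the model file and image path are assumptions; other YOLO implementations have similar one-pass APIs):

```python
# Assumes `pip install ultralytics`; "street.jpg" is a placeholder image path.
from ultralytics import YOLO

model = YOLO("yolov8n.pt")     # a small pretrained YOLO model
results = model("street.jpg")  # the whole image goes through the network once

for box in results[0].boxes:   # each detected object: class label + bounding box
    print(model.names[int(box.cls)], box.xyxy.tolist())
```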
OBJECT LOCALIZATION:
Object localization is the second step of the process. This step figures out where exactly in the grid the objects we are trying to identify are.
There is a specific algorithm for this part of the process called non-max suppression. The network usually predicts several overlapping bounding boxes (rough outlines) for the same object; non-max suppression compares them, keeps the box the network is most confident about, and discards the overlapping duplicates. The overlap between two boxes is measured with a score called Intersection over Union (IoU).
After training the CNN to predict bounding boxes, you can test it on an image, which gives you multiple grid cells and multiple candidate boxes. The process above repeats, keeping the boxes that best match the objects. After running the image through the non-max suppression algorithm, only the best boxes remain.
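Here is a minimal sketch of non-max suppression with made-up boxes and confidence scores, using IoU to measure overlap:

```python
import numpy as np

def iou(a, b):
    """Intersection over Union of two boxes given as [x1, y1, x2, y2]."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, x2 - x1) * max(0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter)

def non_max_suppression(boxes, scores, iou_threshold=0.5):
    """Keep the highest-scoring box and drop any box overlapping it too much; repeat."""
    order = np.argsort(scores)[::-1]  # most confident predictions first
    keep = []
    while len(order) > 0:
        best = order[0]
        keep.append(best)
        # Drop every remaining box that overlaps the kept box more than the threshold.
        order = [i for i in order[1:] if iou(boxes[best], boxes[i]) < iou_threshold]
    return keep

boxes = [[10, 10, 50, 50], [12, 12, 52, 52], [100, 100, 140, 140]]  # two overlap, one apart
scores = [0.9, 0.8, 0.7]
print(non_max_suppression(boxes, scores))  # [0, 2] → the duplicate box is removed
```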
As for the result: combining object classification with localization gives us the final outcome, object detection.
You have probably heard of Tesla, a tech company that makes cars. Tesla also makes self-driving vehicles, and it specifically uses computer vision and cameras to detect its surroundings instead of LiDAR. Tesla first announced its self-driving capabilities in 2014. Since then, it has stayed at level 2 of autonomy, which is partial automation. You may be wondering why such a big company is still at this stage, but that’s not the point.
Tesla is trying to figure out cost-effective and efficient ways to produce self-driving cars. Unlike Waymo, they have way more cars on the road already and are growing rapidly. Tesla has said that reaching level 5 autonomy is “viewed as a distant goal” at the moment.
💡TL;DR:
- Object Classification + Object Localization = Object Detection
- Object Classification is finding out what the objects in the image are
- Object Localization is finding out exactly where those objects are in the image
- Object Detection combines the two to give the car all the information it needs about the objects around it
- Together, these components make up the computer vision that lets the car know its environment
🦾 Which is Better? LiDAR vs Computer vision
To decide our winner, we first need to look at both of their strengths and weaknesses.
PROS
For LiDAR, as you know, it creates a virtual map using light pulses in real time. This means the car is getting fresh results all the time, which keeps the ride smooth. This data helps the car maneuver and navigate safely by avoiding points of collision. LiDAR can also recognize different objects from a distance. One thing that boosts LiDAR’s positive side is its accuracy.
Waymo, a user of LiDAR technology, has said that its system can even read the hand signals of bicyclists to predict where they might go. LiDAR can pick up small details that even humans would sometimes miss.
Another significant advantage of LiDAR is that it produces a nearly 3D output. It can also see through shadows, glaring sunlight, and passing car headlights better than cameras or even human eyesight.
Although LiDAR is best known for its vision and accuracy, it also saves computing power: LiDAR measures an object’s distance and position directly. A camera system, by comparison, first needs several processing steps to analyze the image before it can estimate speed and distance.
Famous car manufacturers are starting to use LiDAR too. Audi, for example, has added a front-facing LiDAR in some of its cars, and Volvo has said it will use LiDAR in one of its new models coming in 2022. Beyond these consumer brands, though, LiDAR is still used more in transportation fleets than in consumer vehicles.
For cameras and computer vision, there is a powerful push from Elon Musk. He has said that cameras are the most reliable type of vision system, and that with the help of AI to identify different objects, they have an advantage over LiDAR.
When the cameras are paired with computer vision, they provide a continuous stream of images that is constantly being analyzed. One very important thing the cameras can do is read the text on road signs. This matters because when a self-driving car encounters a detour or construction, it can know where and when to stop.
Elon Musk has repeatedly said that he does not want to use LiDAR since he thinks cameras can do the job well enough.
“The whole road system is meant to be navigated with passive optical, or cameras, and so once you solve cameras or vision, then autonomy is solved. If you don’t solve vision, it’s not solved.” - Elon Musk
Tesla has said that instead of creating something completely new, it decided to roughly replicate human eyes, but make them almost superhuman. It has done this by placing eight cameras around the car to give it a full view, combined with radar and machine learning. This lets the car see and react far quicker than any human driver.
CONS
Some of the downsides of LiDAR are that it can be a bit funny at times. No, I don’t mean funny as in jokes; rather, not too reliable. In some weather conditions, LiDAR cannot see 100% clearly, which is an issue for safety. Even though it can still pick out objects and their distances, that does not change this negative fact.
LiDAR can also be affected by wavelength stability and detector sensitivity. The laser’s wavelength can drift with changes in temperature, and a poor signal-to-noise ratio (SNR) means the returning light is hard to separate from background noise, which can jumble the sensor readings and create false outputs.
When it comes to identifying the object itself, LiDAR is often not as efficient as cameras. It cannot read street signs or tell the colour of a traffic light. LiDAR also needs a lot more data processing in the software it uses to create its maps.
Another aspect of LiDAR, if you didn’t already know, is that it is a lot more expensive. An average LiDAR sensor can cost up to $1,000. And when the sensor is attached to the car, it makes the autonomous vehicle look a bit chunkier.
Some downsides of cameras are that, even though they are more reliable for recognizing objects, they do not have the range-detection ability that LiDAR has. Cameras are better at imaging, but they need help from other technology to cover all the mandatory needs. Tesla, for example, uses different types of sensors, including radar, to detect distance and range.
Like LiDAR, cameras reportedly cannot see too well in certain weather conditions, yet they need to do better than human drivers. The same applies to the lighting situation. Unlike LiDAR, cameras cannot capture exactly what is going on at longer distances, which is also why Tesla uses radar at the front of its vehicles (plus, radar is far cheaper than LiDAR).
Another aspect of cameras is the computer vision itself. Until recently, the neural network and machine learning systems couldn’t process the large amounts of data captured by the cameras. With recent developments, these programs have improved greatly and can now process real-world inputs even better than LiDAR.
Conclusion
The final winner is… no one! It’s a tie (sorry for tricking you). Right now, if safety is the top priority (which it should be), then sensor fusion is the best option. Sensor fusion mixes the best parts of the different sensors. Combining sensors like LiDAR and cameras gives the best safety. Since both LiDAR and cameras rely on AI software and neural networks to process data, they can work together.
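As a toy illustration of what “combining the good parts of both sensors” can mean in practice, here is a hedged sketch that matches camera labels with LiDAR distances (all values and field names are made up; real sensor fusion uses calibrated geometry and probabilistic filters):

```python
# The camera says *what* an object is, LiDAR says *how far away* it is,
# and fusion combines both into one picture of the scene.
camera_detections = [{"label": "pedestrian", "bearing_deg": 3.0},
                     {"label": "car", "bearing_deg": -20.0}]
lidar_returns = [{"bearing_deg": 2.6, "distance_m": 18.4},
                 {"bearing_deg": -19.5, "distance_m": 42.0}]

fused = []
for det in camera_detections:
    # Match each camera detection to the LiDAR return pointing in the closest direction.
    closest = min(lidar_returns, key=lambda r: abs(r["bearing_deg"] - det["bearing_deg"]))
    fused.append({"label": det["label"], "distance_m": closest["distance_m"]})

print(fused)  # [{'label': 'pedestrian', 'distance_m': 18.4}, {'label': 'car', 'distance_m': 42.0}]
```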
As technology gets developed and the different algorithms get better, the accuracy and safety will increase more and more.
This is not something that happens quickly, because this is a big topic with a vast amount of exploration still to be done. The possibilities are limitless, because machines do not have to make decisions or choices the same way humans do. Maybe in the next couple of years, something different will dominate this field. Until then, we can take the positives from both and make one whole.
As you can see, when you are trying to give a car a pair of eyes, you need to give it all of its other senses too. It cannot work with just one item, since they all play different parts. Many companies are still not 100% sure what to use, or whether a given technology is worth their time.
I hope this gave you a good understanding of what LiDAR and computer vision are. If you enjoyed this article, give it a round of applause, and make sure to check out some of my other articles.
:)