Opinion
Self-driving cars: How close are we to full autonomy?
We are in 2020 now. Where’s my robot taxi? Let’s look at the current state of the technology and make an educated guess
It’s unquestionable that full self-driving technology (or level 5 autonomy) will be part of our future. The question is: when will it arrive? Next decade? Next five years? Next year? Tomorrow? Let’s look at the current state of the art and make a prediction.
Spoiler: Probably no later than 2025.
Which company will get there first?
If anyone can solve full self-driving (level 5 autonomy) by 2025, it will be Tesla. You may have heard about Waymo robot taxis, but those operate in very restricted regions and require high-resolution mapping and preparation of the routes (level 4 autonomy). It’s a good attempt, but the current approach is not scalable.
The main reasons Tesla will be the first to solve full self-driving are:
- Data: They have the largest real-world dataset with billions of driven miles.
- Efficient hardware: A smart set of sensors and an in-house designed deep learning chip.
- Advanced software: The neural network driving Teslas tackles a very complex multi-task problem.
Over the next sections, I will expand on each of these three topics and then construct my prediction timeline. Let’s dive in.
Data
With more than a million cars collecting data, Tesla is orders of magnitude ahead of the competition. Furthermore, the size of the Tesla fleet will increase steeply over the coming years as production ramps up. Look at this chart showing the estimated number of autopilot-driven miles.
In Deep Learning applications, having high volumes of data is a game-changer, particularly for a problem as complex as self-driving, which requires analysing data from multiple sensors across multiple tasks.
In the real world, the distribution of possible scenarios usually has a very long tail, meaning that the number of rare situations the car may face is almost unlimited. Having a large fleet of cars collecting data makes it possible to sample the edge cases needed for gradual improvements. The goal is to make the model more and more robust over time.
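As a rough back-of-the-envelope illustration of why fleet size matters for the long tail, consider how often a scenario that occurs once every ten million miles would be observed. The mileage and fleet numbers in this sketch are made-up assumptions, not Tesla figures:

```python
# Back-of-the-envelope: why fleet size matters for long-tail events.
# All numbers below are illustrative assumptions, not Tesla figures.

def expected_rare_event_samples(miles_per_car_per_year: float,
                                fleet_size: int,
                                event_rate_per_mile: float) -> float:
    """Expected number of times a rare scenario is observed in one year."""
    return miles_per_car_per_year * fleet_size * event_rate_per_mile

# A scenario that occurs once every 10 million miles:
rate = 1 / 10_000_000

print(expected_rare_event_samples(12_000, 1_000, rate))      # small fleet: ~1 sample per year
print(expected_rare_event_samples(12_000, 1_000_000, rate))  # million-car fleet: ~1,200 samples per year
```

With a small fleet such a scenario is essentially invisible; with a million cars it shows up over a thousand times a year, which is what makes the gradual edge-case improvements possible.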
To collect real data, a set of sensors is needed. And that leads me to the next topic — the hardware.
The hardware
The set of hardware used for full self-driving usually consists of several cameras, LIDAR sensors, RADAR sensors and ultrasonic sensors.
- The cameras are used to collect images all around the vehicle;
- The LIDAR sends rapid laser pulses that are reflected by objects, allowing the system to build high-resolution 3-dimensional maps of the surroundings;
- RADAR is similar to LIDAR but works at radio wavelengths, which gives better performance in fog and dust at the cost of lower resolution;
- Ultrasonic sensors are used at close range to measure the distance to nearby objects very accurately.
Most players in this field rely heavily on LIDAR, complemented by cameras and ultrasonic sensors. Tesla, however, is approaching the problem in a different way, without using any LIDAR. This makes the problem harder for Tesla! Why are they following this approach?
We know for sure that a vision-based approach can navigate a car reliably: we do that every day when we drive using our two camera sensors (also known as eyes). Another important consideration is that LIDAR sensors are expensive. Prices may come down over time, but until then cars would be significantly more expensive. And if we can drive with our eyes, it is feasible that an AI can learn to do it even better, with more eyes all around the vehicle, working all the time, without distraction. The RADAR sensor matters mainly for enhanced forward vision even in fog and dust, and for double-checking the distance of objects ahead. This sensor is also cheaper than LIDAR.
The following video gives a clear explanation of Tesla’s approach to the sensor suite and why, in Elon Musk’s words, “Anyone relying on LIDAR [for full self-driving] is doomed”. I find particularly interesting the discussion of using RADAR to auto-label data for depth estimation with cameras. They also mention that depth estimation can be achieved without labels using self-supervision, a very promising technique that is gaining momentum in several Deep Learning applications.
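To make the auto-labelling idea more concrete, here is a minimal sketch of how sparse radar returns projected into the image could supervise a dense camera depth network. This is purely my illustration: the tiny network, the tensor shapes, and the single hand-placed radar point are placeholders, not anything from Tesla’s stack.

```python
import torch
import torch.nn as nn

# Illustrative sketch: sparse radar returns projected into the image plane
# act as free depth labels for a camera-based depth network.

class TinyDepthNet(nn.Module):
    """Placeholder dense depth predictor (a real network would be far larger)."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(),
            nn.Conv2d(16, 1, 3, padding=1), nn.Softplus(),  # depth must be positive
        )

    def forward(self, image):
        return self.net(image)  # (B, 1, H, W) depth in metres

def sparse_depth_loss(pred_depth, radar_depth, radar_mask):
    """L1 loss only at pixels where a projected radar return provides a range."""
    diff = torch.abs(pred_depth - radar_depth) * radar_mask
    return diff.sum() / radar_mask.sum().clamp(min=1)

# Toy batch: one 3x64x128 camera image with a single radar-labelled pixel.
image = torch.rand(1, 3, 64, 128)
radar_depth = torch.zeros(1, 1, 64, 128)
radar_mask = torch.zeros(1, 1, 64, 128)
radar_depth[0, 0, 40, 60], radar_mask[0, 0, 40, 60] = 35.0, 1.0  # a car 35 m ahead

model = TinyDepthNet()
loss = sparse_depth_loss(model(image), radar_depth, radar_mask)
loss.backward()
```

The point is that the labels come for free from another sensor already on the car, so the fleet can generate training data at scale without human annotation.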
Overall, we see that Tesla is solving a harder problem by skipping LIDAR and working directly on a vision-based approach. This is the right approach: it is scalable and results in lower-cost cars, which will ultimately be able to offer the lowest robotaxi prices and accelerate the transition to electric vehicles.
Another important aspect of the hardware is that Tesla developed a Deep Learning chip (Hardware 3.0) powerful enough to enable full self-driving in an energy-efficient way. Now it’s just a matter of getting the software up to par and deploying it through over-the-air updates.
Speaking of software, let’s move to the next topic and take a look at the neural network behind Tesla autopilot.
The neural network driving Tesla cars
Over the past few years, the autopilot software has improved gradually. The exciting news is that Tesla has been working on a major update to the neural network structure to make it work with 4D data (3D space plus time) instead of a combination of 2D images. As a result, the rate of progress will likely be faster over the coming months as the Tesla AI team further explores the potential of the system, which they aim to deploy at the end of 2020.
I highly recommend the video below of a presentation by Andrej Karpathy, the director of AI at Tesla, about the state of AI for Full Self Driving. I will now describe some of the main highlights.
The new neural network receives input from all sensors and combines the data in the network’s feature space (the Fusion layer in the image below), creating a real-time 3-dimensional representation of the surrounding environment. This representation is then fed to a Bird’s Eye View network (BEV Net), from which an extensive set of tasks is predicted.
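A minimal sketch of that overall structure might look like the code below: per-camera backbones, a fusion step, a bird’s-eye-view trunk, and several task heads. The layer sizes, head names, and the naive concatenation-based fusion are my own placeholders; the real network (and its geometry-aware fusion) is obviously far more sophisticated.

```python
import torch
import torch.nn as nn

# Rough sketch of the structure described above: per-camera feature extraction,
# fusion into a shared feature space, a bird's-eye-view trunk, and multiple
# task-specific heads. Shapes and heads are illustrative only.

class PerCameraBackbone(nn.Module):
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
        )

    def forward(self, x):
        return self.net(x)

class MultiCamBEVNet(nn.Module):
    def __init__(self, num_cameras=8):
        super().__init__()
        self.backbones = nn.ModuleList(PerCameraBackbone() for _ in range(num_cameras))
        # A real system would project features into a common BEV frame using
        # camera geometry; here we simply concatenate them for illustration.
        self.fusion = nn.Conv2d(64 * num_cameras, 128, 1)
        self.bev_trunk = nn.Sequential(nn.Conv2d(128, 128, 3, padding=1), nn.ReLU())
        self.heads = nn.ModuleDict({          # one head per prediction task
            "lanes": nn.Conv2d(128, 1, 1),
            "drivable_space": nn.Conv2d(128, 1, 1),
            "objects": nn.Conv2d(128, 8, 1),
        })

    def forward(self, images):                # images: (B, N_cam, 3, H, W)
        feats = [bb(images[:, i]) for i, bb in enumerate(self.backbones)]
        fused = self.fusion(torch.cat(feats, dim=1))
        bev = self.bev_trunk(fused)
        return {name: head(bev) for name, head in self.heads.items()}

outputs = MultiCamBEVNet()(torch.rand(1, 8, 3, 128, 256))
print({name: out.shape for name, out in outputs.items()})
```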
Another very interesting topic covered in the presentation is how they deal with edge cases. The image below shows a good example: stop signs. One would think that stop signs are very easy for the neural network to capture and learn. But what if they are partially occluded? Or what if they have a modifier, as in the example below, where the plate under the stop sign says “except right turns”? Autonomous vehicles are expected to work in all of those scenarios.
The process of training the huge neural network to perform well on edge cases like the ones above consists of running a small network in “shadow mode” that retrieves similar samples from the Tesla fleet. The samples obtained are used to improve the training dataset, and the big neural net is then retrained to achieve better accuracy. For the case of stop signs, the label “stop sign” needs modifiers to cover the edge cases.
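In rough, runnable pseudocode, I picture that data-engine loop like the sketch below. Every function here is a stand-in of mine; the real triggers, labelling pipeline, and training jobs are of course far more involved.

```python
# Schematic "data engine" loop as I understand it from the presentation.
# Every function below is a placeholder, not Tesla's actual pipeline.

def shadow_mode_trigger(frame) -> bool:
    """Fires when the small trigger network thinks this frame looks like a
    rare case of interest (e.g. an occluded or modified stop sign)."""
    return frame.get("looks_like_edge_case", False)

def label(frames):
    """Stand-in for the (partly human) labelling step, including modifiers
    such as 'stop sign' + 'except right turns'."""
    return [{**frame, "label": "stop_sign_with_modifier"} for frame in frames]

def retrain(dataset):
    print(f"retraining the big multi-task network on {len(dataset)} samples")

dataset = []
fleet_stream = [
    {"id": 1, "looks_like_edge_case": False},
    {"id": 2, "looks_like_edge_case": True},   # occluded stop sign
    {"id": 3, "looks_like_edge_case": True},   # stop sign with a modifier plate
]

flagged = [frame for frame in fleet_stream if shadow_mode_trigger(frame)]
dataset.extend(label(flagged))
retrain(dataset)   # iterate: deploy, collect more triggers, retrain again
```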
The validation for such a complex multi-task problem is also a very important topic that is covered in the presentation.
In a recent keynote at the CVPR 2020 conference, Karpathy gave a similar presentation with a few additional interesting examples, starting with a good set of edge cases. I guess no comment is needed for this one, other than that these are real images sampled from the Tesla fleet.
Another crazy example is the following image. Could you handle such a roundabout? Regarding this example, Karpathy makes the interesting point that they don’t need to handle every possible case. If the cases that can’t be handled are known, one option is to follow a different trajectory that avoids the specific situation. As a human, I would definitely avoid the roundabout in the image below.
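One way to picture the “plan around it” idea is as a routing problem in which road segments passing through known-hard situations get a large cost penalty, so the planner prefers a slightly longer but fully handled route. This toy sketch, with all roads and costs invented, is only my illustration of the concept:

```python
import heapq

# Toy illustration of "plan around it": give road segments through known-hard
# situations (like the monster roundabout) a large penalty so the planner
# prefers a slightly longer but fully handled route. All data is made up.

def shortest_path(graph, start, goal):
    """Plain Dijkstra over a dict-of-dicts adjacency map of travel costs."""
    queue, seen = [(0, start, [start])], set()
    while queue:
        cost, node, path = heapq.heappop(queue)
        if node == goal:
            return cost, path
        if node in seen:
            continue
        seen.add(node)
        for nxt, weight in graph[node].items():
            if nxt not in seen:
                heapq.heappush(queue, (cost + weight, nxt, path + [nxt]))
    return float("inf"), []

HARD_PENALTY = 1_000  # minutes added to segments the system cannot yet handle

roads = {
    "home":        {"roundabout": 5, "side_street": 8},
    "roundabout":  {"office": 3 + HARD_PENALTY},   # the "magic roundabout"
    "side_street": {"office": 6},
    "office":      {},
}

print(shortest_path(roads, "home", "office"))  # picks the roundabout-free route
```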
I believe the image above is also a hint that they are now working hard on solving roundabouts and intersections. That’s the logical next step after traffic lights and stop signs, and an important step towards a feature-complete autopilot.
My prediction timeline
In a tweet on 12 April this year, Elon Musk said of the schedule for the robotaxi release: “Functionality still looking good for this year. Regulatory approval is the big unknown”. However, Elon has been overly optimistic about these schedules before. Nevertheless, progress has been steady and we are getting closer and closer.
Now let me finally write down my timeline, based on all the public information about the current state of autopilot that I could find, aiming for an optimistic yet realistic forecast:
- Late 2020, or more confidently in 2021: Tesla autopilot will be feature complete. It will be able to navigate in most scenarios, but it will be far from perfect. Human supervision will be required at all times, and human intervention will be common in city environments.
- 2021/2022: improvements will be steady as more and more data is collected and used to improve the system. Human interventions will become less frequent, and trips without any intervention will gradually become more common.
- 2023: autopilot will be able to navigate perfectly in most situations. Human intervention will be sporadic, and most trips won’t require any. Robotaxi experiments in selected places may start by this year (level 4 autonomy).
- 2024: the software will reach a state where it is safe to travel without human supervision. Edge cases may still exist, but they will be solvable with voice-command feedback from passengers or by planning the journey to avoid specific problematic situations. Tesla will start the important task of showing regulators that the technology is safe to use in robotaxis without human supervision (level 5 autonomy).
- 2025/2026: Tesla will gradually get approval from regulators to operate in more and more regions. Approval will probably come first in the US (possibly as early as 2024 in some states) and gradually extend to other regions. For Europe, I would expect approval to lag the US by about a year.
To wrap up, by 2025 (or maybe 2026 in Europe and other regions) I expect to be able to schedule my robotaxi ride in the app and go to any destination I choose. I’m looking forward to that day!
Final remarks
This is just my vision for the future of full self-driving. No one knows what the future holds, and there are many uncertainties in the equation. My prediction is simply what I think is the most likely scenario. Let me know your thoughts in the comments.