ADAS from the inside: When will the robot take the wheel?

Andrey Chernogorov
Published in CognitivePilot
Nov 5, 2020

Today we give you an overview of the area where Cognitive Pilot originally started developing autonomous driving solutions: the automotive industry. This field presents driverless system developers with fascinating challenges, since public roads feature a much wider variety and far higher complexity of scenes than agricultural fields or the rail network, with multiple moving objects following unpredictable paths. To cope, self-driving car developers turn to deep learning, the most advanced neural networks, and vast datasets.

It is no secret, however, that legislators all over the world have yet to permit the industrial use of driverless vehicles on public roads, and we shouldn't expect this to become legal tomorrow. Market players still need to resolve a great many serious organizational, legal, technical, and other issues. This is why we decided to focus on already existing markets, agricultural machinery and rail transport, where our AI could function and bring value immediately. As a result, combine harvester operators no longer have to touch the steering wheel and can concentrate on managing harvesting parameters; locomotive drivers benefit from safer rail operations; and our autopilot operation modes do not require approval from public officials at any level.

Meanwhile, we keep developing autonomous driving systems for cars, retaining our leadership in the related technologies. To back this claim: we are delivering on a number of major long-term computer vision development contracts with car manufacturers and Tier 1 suppliers (we are bound by NDAs, so we cannot provide further details), including several renowned German companies and Hyundai Mobis. We will be ready to promote our AI solutions on the automotive market more actively once the existing legal and organizational challenges have been more or less addressed.

For now, let’s return to where we started.

The 6 levels of vehicle autonomy

Many consumers still mistake advanced cruise-control functionality for a full-fledged autopilot, which can lead to serious road accidents: a Tesla recently ran into an overturned heavy truck at full speed. As a reminder, ADAS experts distinguish six levels of vehicle autonomy, and every cognitive solution used in cars corresponds to a certain level. Each level supports specific features, such as notifying the driver about road hazards, lane keeping assistance, emergency braking, and so on.

An autopilot that can drive you from A to B belongs at Level 4 or 5. While the fourth level still allows for human intervention, the fifth doesn't even require a steering wheel in the vehicle. Levels 0 and 1 are for smart assistants: to put it simply, buzzers that warn the driver about dangerous road situations. At Level 2, those buzzers can also apply the brakes (an emergency braking system), and Level 3 covers advanced cruise control. Even Tesla's solutions have yet to surpass Level 3. They can help you stay in your lane and avoid running into a wall, stop the car if a pedestrian jumps in front of you, and even find the exit from a parking lot without your help, but they still require constant human supervision on the road. If the robot in such a system mistakes an overturned white truck for a two-dimensional drawing on the tarmac, it is the driver's responsibility to hit the brakes in time.

A word about the market

Autonomous driving systems for the automotive industry are currently one of the hottest talking points in global tech. Beyond the real advances being made by market players, the topic is clouded by speculation, fiction, and assumptions. Several events that coincided in late August 2020 illustrate the status quo in the ADAS market well. The first was yet another accident involving a Tesla: this time, Elon Musk's creation rammed a police car belonging to the Public Safety and Law Enforcement Department in California. Around the same time, a Bavarian state court in Germany ruled that Tesla was misleading customers by using the term "autopilot" in its advertisements.

Whenever a dispute arises, Tesla's representatives usually point to the fine-print disclaimer in the company's documents stating that the vehicle is not autonomous and requires continuous supervision by the driver.

It's no secret that there have been a number of cases in which consumers have accused Tesla of grossly exaggerating the abilities of its autopilot. Elon Musk rushed to clear his company's name on Twitter, insisting that it uses the term in the same sense as aviation does.

Meanwhile, in contrast to Tesla's marketing policy, Intel's Mobileye announced a collaboration with Ford on global supplies of its driver assistance system. You might ask whether they are offering a fully autonomous car with a Level 4 or Level 5 ADAS. Not quite. This is an ordinary driver assistance system corresponding to Levels 0 through 2.

There are two camps to consider here: on one side, those bent on driving up ratings, stock value, and overall hype; on the other, professionals with a clear, honest position who do their job without rushing. These are the results we should weigh to understand the current state of the ADAS market.

To continue: early in 2020, the California Department of Motor Vehicles (DMV) released its regular report on driverless system developers testing their autopilots in the state. According to the DMV, systems produced by 60 companies holding autonomous driving licenses covered around 2.88 million miles in autonomous mode on California's public roads in 2019, an increase of 800,000 miles year on year. The top developers in the report were Waymo, Cruise Automation, Apple, Uber, Lyft, Aurora, Nuro, Pony.ai, Baidu, Zoox, and Tesla. The assessment methodology focuses on mileage and hours of driving without human intervention, and the professional community is becoming more and more vocal in its dissatisfaction with this approach. "If we drive 100 million miles in a flat, dry area where there are no other vehicles or people, and few intersections, is our 'disengagement rate' really comparable to driving 100 miles in a busy and complex city like Pittsburgh?" Aurora's CEO Chris Urmson wrote in one article. Such methodologies do not account for how the autopilot performs on real-life roads. We could add that this metric says little about driving outside city limits in Russia, or anywhere else for that matter. What if it's winter? What if it's rainy or foggy? Experts unanimously agree that a new autopilot quality assessment metric is overdue.

Let’s return to our solutions.

Last year, Cognitive Pilot won a prestigious award at Tech.AD Berlin, being named among the world's top ADAS developers in the Most Innovative ADAS Technology category. In an open vote, top managers of well-known car manufacturers and industry experts placed us third, after the German vendor BrighterAI (the event host) and Velodyne, one of the world's leading automotive lidar makers.

The Berlin-based Tech.AD professional automotive association’s award for Most Innovative ADAS Technology, awarded to Cognitive Pilot.

Now it’s time to shed light on how we are forging our leadership.

From R&D to real-life vehicles

Our R&D activities go back to 2008. A few years later, our efforts yielded a working prototype: a mobile robot that processed a video stream in real time, recognized its surroundings, detected objects, and performed a control action in line with its task, which was playing football. If the ball was within camera coverage, the robot detected it, accelerated, and pushed it with its bumper. Once the ball went out of sight, the robot started looking for it.

We took our efforts to the next level by working on a Nissan X-Trail. It is hard to overestimate the importance of training in real-life field conditions for the success of AI systems.

The first engineering prototype of Cognitive Pilot’s first autonomous car

From the very start, we intended to develop an autopilot that performs well in any weather and on any road surface, so we compiled datasets in the most challenging conditions. We piloted our basic technologies in rain and snow, in fog and drizzle, in ditches and on rural backroads. We evaluated the performance of computer vision, decision-making algorithms, and localization.

An example of road scene object recognition in challenging weather conditions
An example of X-Trail test runs in Skolkovo

A word about datasets

Unlike many market players who mostly use ready-made public datasets and target only the automotive market, we've accumulated considerable expertise in training neural networks on our own datasets in agriculture and rail transport. Our experience in these areas is unparalleled on a global scale. We have essentially created a dataset development infrastructure and actively use it to build our ADAS models. We have a strong data annotation team and in-house tools that speed up labeling and quality control. Our specialists do not need to set up an infrastructure for neural networks from scratch: we have deployed all of our networks as internal services with easy thin-client access, so annotation engineers can use them for pre-labeling (an initial approximation) or for selecting relevant data.

We optimize our annotation tools for the number of clicks. For instance, if you're annotating a road scene, you need to outline each of its elements, which takes a lot of time and mouse clicks. By exploiting road scene geometry, we developed a tool that can annotate road elements in a single click. The image is split into separate objects automatically (or semi-automatically), requiring only minor corrections, which increases each annotator's output and cuts the cost of a labeled frame.
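To make the one-click idea concrete, here is a minimal, hypothetical sketch (not our production tool): a single annotator click seeds a region-growing segmentation, which the annotator then touches up. OpenCV's flood fill stands in for the real geometry-aware logic, and the tolerance value is arbitrary.

```python
import cv2
import numpy as np

def segment_from_click(image: np.ndarray, seed: tuple[int, int],
                       tolerance: int = 8) -> np.ndarray:
    """Grow a segmentation mask from a single annotator click.

    A simplified stand-in for a one-click annotation tool: flood-fill the
    region around the clicked pixel, accepting neighbors whose color
    differs by at most `tolerance`.
    """
    h, w = image.shape[:2]
    # floodFill requires a mask 2 pixels larger than the image.
    mask = np.zeros((h + 2, w + 2), dtype=np.uint8)
    flags = 4 | cv2.FLOODFILL_MASK_ONLY | (255 << 8)
    cv2.floodFill(image, mask, seedPoint=seed, newVal=0,
                  loDiff=(tolerance,) * 3, upDiff=(tolerance,) * 3,
                  flags=flags)
    return mask[1:-1, 1:-1]  # crop the mask back to the image size

# Usage: click at pixel (x, y) on a road-surface region.
# mask = segment_from_click(cv2.imread("frame.png"), seed=(320, 400))
```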

We've also streamlined the selection of informative samples that are instrumental in further network retraining. Dataset imbalance is a widespread issue: about 90% of the data you collect is homogeneous, describing situations the neural network already handles well. What you need, however, is data on situations where the network makes mistakes, and we try to identify such cases in our datasets.
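As a rough sketch of the selection step, assuming a detector that returns per-detection confidence scores (the `detector` interface below is hypothetical), one can keep the frames where the model is least certain, since those are the ones most worth labeling and retraining on.

```python
from typing import Callable, Iterable

def select_hard_frames(frames: Iterable[str],
                       detector: Callable[[str], list[float]],
                       confidence_threshold: float = 0.5,
                       keep_ratio: float = 0.1) -> list[str]:
    """Pick the most 'informative' frames for labeling and retraining.

    A frame is scored by how uncertain the detector is on it: detections
    hovering around the decision threshold suggest the network is close
    to a mistake there. Keeping only the top `keep_ratio` share of frames
    counters the ~90% of routine, well-handled data.
    """
    scored = []
    for path in frames:
        confidences = detector(path)  # hypothetical: per-detection scores
        if not confidences:
            uncertainty = 1.0          # nothing found: possibly a missed object
        else:
            # distance from the decision threshold, averaged over detections
            uncertainty = sum(1.0 - abs(c - confidence_threshold) * 2
                              for c in confidences) / len(confidences)
        scored.append((uncertainty, path))
    scored.sort(reverse=True)
    n_keep = max(1, int(len(scored) * keep_ratio))
    return [path for _, path in scored[:n_keep]]
```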

How the neural network sees the road scene

Importantly, we train only on real-life data. We resort to simulation solely for debugging, for instance when we choose to work at the office instead of driving in terrible weather.

Computer vision anatomy

By training our networks on varied roads and in any weather, we have developed a range of unique technologies for highly accurate detection of road scene objects and, therefore, a higher safety level for road users. When it comes to AI, we advocate an anthropomorphic approach to modeling basic cognitive processes, because this is one area where no one has yet outdone evolution.

We can say with certainty that we’re among the world’s best developers when it comes to the accuracy of detecting real-life objects on the road. Back at CES 2018, we compared our metrics against those of other leaders and clearly saw our competitive edge. The Americans nicknamed us “snowy AI” for safe driving in any weather.

Virtual tunnel

The so-called "virtual tunnel" is one of our earliest computer vision technologies. It recognizes the roadway reliably even in the absence of markings or other infrastructure elements, and it performs well in any conditions, regardless of the season, snow cover, whether the road is paved, and so on.

The virtual tunnel method is based on the principle that road scenes are inherently similar. We learned to identify the most essential, fundamental properties of the roadway, be it a highway, a rural backroad, or a dirt track. This way, the system recognizes the roadway with high accuracy, and the computer vision algorithms built on this technology perform stably across road configurations and conditions: turns in either direction, upward and downward slopes, nighttime, winter, and other unfavorable weather.

The virtual tunnel technology got its name because this is what a sequence of rectangular interest zones looks like in perspective.
To date, it has remained one of our fundamental innovations, enabling the recognition of any road.
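The geometry is easy to picture with a toy sketch: a sequence of rectangular regions of interest shrinking toward the vanishing point, which is roughly what the "tunnel" looks like when projected into the image. The numbers below are purely illustrative, not the values our system uses.

```python
def virtual_tunnel_rois(image_width: int, image_height: int,
                        vanishing_point: tuple[int, int],
                        n_zones: int = 8) -> list[tuple[int, int, int, int]]:
    """Build a sequence of rectangular interest zones receding in perspective.

    Each zone is interpolated between the full image bottom (nearest to the
    vehicle) and a small rectangle near the vanishing point (farthest away).
    Returns (x_left, y_top, x_right, y_bottom) boxes ordered near to far.
    """
    vx, vy = vanishing_point
    rois = []
    for i in range(n_zones):
        t = i / n_zones                      # 0 = nearest zone, -> 1 = farthest
        half_width = (1.0 - t) * image_width / 2 + t * image_width * 0.05
        y_bottom = int((1.0 - t) * image_height + t * (vy + image_height * 0.05))
        y_top = int((1.0 - (i + 1) / n_zones) * image_height
                    + ((i + 1) / n_zones) * vy)
        rois.append((int(vx - half_width), y_top,
                     int(vx + half_width), y_bottom))
    return rois

# Usage: zones for a 1280x720 frame with the vanishing point near the horizon.
# rois = virtual_tunnel_rois(1280, 720, vanishing_point=(640, 300))
```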

The “virtual tunnel” in action

Object recognition

Apart from road recognition, another crucial task is detecting moving objects. There are four basic categories: cars, motorbikes with riders, bicycles with cyclists (whose dynamics differ from the previous category), and pedestrians. The latter three groups are harder to recognize because their shapes vary widely.

Examples of scooter rider and bike rider recognition

The next level is traffic sign recognition (TSR). This subsystem is country-specific, even though some of the signs have a universal meaning. Traditionally, there are two systems — European and American. Russia uses the European system with minor variations.

To recognize speed limit signs, we first localize the sign and then recognize the text on it. This approach isn't the conventional one: most developers rely on classification alone, which yields poorer recognition results. We have a dedicated subsystem for speed limit recognition that uses OCR methods, and we apply a similar approach to text signs.
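A minimal two-stage sketch of this approach: first localize the sign, then run OCR on the cropped patch. The `sign_detector` here is a stand-in for whatever detection network is used, and Tesseract (via pytesseract) is used purely as an example OCR backend; neither is necessarily what our system relies on.

```python
import cv2
import pytesseract  # example OCR backend, assumed here for illustration

def read_speed_limit(frame, sign_detector):
    """Two-stage speed-limit recognition: localize the sign, then OCR its text.

    `sign_detector(frame)` is assumed to return (x, y, w, h) boxes for
    candidate speed-limit signs; this interface is hypothetical.
    """
    limits = []
    for (x, y, w, h) in sign_detector(frame):
        patch = frame[y:y + h, x:x + w]
        gray = cv2.cvtColor(patch, cv2.COLOR_BGR2GRAY)
        # Binarize so the digits stand out for the OCR engine.
        _, binary = cv2.threshold(gray, 0, 255,
                                  cv2.THRESH_BINARY + cv2.THRESH_OTSU)
        text = pytesseract.image_to_string(
            binary, config="--psm 7 -c tessedit_char_whitelist=0123456789")
        digits = "".join(ch for ch in text if ch.isdigit())
        if digits:
            limits.append(int(digits))
    return limits
```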

Recognizing traffic signs

Partially occluded objects

In cases of occlusion, when an object is only partially visible, our annotation strategy is to outline the presumed full shape of the object during training. For instance, a half-visible human silhouette in a dataset is treated as a fully visible human: our tool outlines the entire figure, and the network learns to treat it as a regular pedestrian.
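One way to picture the annotation format is the simplified, hypothetical schema below (not our actual data format): each object carries both its visible box and the full "amodal" extent that the annotator projects, and training targets use the full box.

```python
from dataclasses import dataclass

@dataclass
class AmodalAnnotation:
    """One labeled object with both its visible and presumed full extent.

    For a pedestrian hidden waist-down behind a parked car, `visible_box`
    covers only the upper body, while `full_box` outlines where the whole
    figure is presumed to be. Training on `full_box` teaches the network
    to treat a half-visible person as a regular pedestrian.
    """
    label: str                               # e.g. "pedestrian", "car", "traffic_sign"
    visible_box: tuple[int, int, int, int]   # (x, y, w, h) of visible pixels
    full_box: tuple[int, int, int, int]      # presumed full extent
    occluded: bool = False

# Example: a pedestrian whose lower half is hidden behind a parked car.
example = AmodalAnnotation(
    label="pedestrian",
    visible_box=(410, 220, 60, 90),
    full_box=(410, 220, 60, 180),
    occluded=True,
)
```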

When it comes to signs, the main challenge is spotting them at all. It's important to select images containing traffic signs that are half-hidden, behind trees for instance. We train the system up to a certain level and then use it to detect such signs.

Data selection is always key. If the data is varied (not only images with perfect visibility but also barely legible signs), the network will perform well.

Annotation of partially occluded objects such as cars, traffic signs, and pedestrians (original image on the left, annotation examples on the right)

Another issue to consider is a random obstacle on the road. It could be a cargo item that has fallen off a car roof or, as in the latest Tesla incident, an overturned heavy truck. Even without knowing the object's type, the network needs to detect it as a 3D object, realize it is not part of the flat road markings, and either drive around it or perform emergency braking.
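One way to make the 3D-versus-flat distinction concrete is sketched below, under the assumption (not stated in the article) that some depth or radar-derived 3D points are available for the detected region: check whether the region actually rises above the road plane before treating it as an obstacle.

```python
import numpy as np

def is_raised_obstacle(region_points: np.ndarray,
                       road_plane: tuple[np.ndarray, float],
                       min_height_m: float = 0.15) -> bool:
    """Distinguish a real 3D obstacle from a flat marking on the road.

    `region_points` is an (N, 3) array of 3D points (e.g. from radar or
    stereo depth, which is an assumption here) inside the detected region;
    `road_plane` is (unit normal n, offset d), with road-surface points
    satisfying n . p + d = 0. If enough points sit well above the plane,
    the region is a physical object rather than paint on the tarmac.
    """
    normal, d = road_plane
    heights = region_points @ normal + d          # signed height above the road
    return float(np.mean(heights > min_height_m)) > 0.3   # 30% of points raised
```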

Examples of road scene object detection

Now on to detecting traffic lights. In the simplest case, a smart system has to alert the driver if the light has turned green but the car hasn't moved. A more advanced task is to read arbitrary traffic lights. Traffic signs and lights apply to a particular zone of the road, so the system needs to understand the meaning of a specific traffic light or sign within the context of the other recognized road scene objects. Understanding that context is critical when, for instance, the system sees two traffic lights at once, one green and one red.
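For the simplest case described above, the logic is almost trivial; the sketch below uses arbitrary timing thresholds. The genuinely hard part, deciding which detected light is the relevant one in the context of the road scene, is not shown.

```python
def should_nudge_driver(light_state: str,
                        seconds_since_green: float,
                        ego_speed_mps: float,
                        patience_s: float = 3.0) -> bool:
    """Alert the driver if the light turned green but the car hasn't moved.

    `light_state` is the state of the traffic light already judged relevant
    to our lane; resolving that relevance from the full road scene is the
    hard problem and is outside this sketch.
    """
    return (light_state == "green"
            and seconds_since_green >= patience_s
            and ego_speed_mps < 0.5)
```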

Behavior prediction

When interpreting a road scene, predicting the behavior of road users is vital. A couple of months ago, Tesla published a post claiming that its autopilot had prevented a side collision with a car approaching at full speed from the left.

Obviously, it was not the Tesla autopilot's behavior prediction capabilities that prevented the accident, but rather the side camera's characteristics and conventional engineering solutions.

A neural network above a neural network

At this point, I'll briefly cover our approaches to predicting how road situations develop. Let us start with pedestrians, who are among the riskiest objects on the road and are often involved in accidents. When a car approaches an intersection, the autopilot needs to assess the behavior of the people at the edge of the road, and it is difficult to judge the intentions of a crowd without splitting it into individuals. So we start with a general-purpose neural network that runs continuously and detects all of the relevant road scene objects: pedestrians, cars, traffic signs, and so on. Once a human is detected in a frame, an additional neural network is launched, a network on top of the network. Using key points corresponding to human body parts, a virtual skeleton of sorts, it estimates the person's posture. Pose estimation relies on specialized models, the most popular of which estimate the positions of 17 points on the human body (head rotation, say, is determined by the positions of the eyes, the nose, and the ears). The system also estimates the orientation of the shoulders, eyes, knees, and so on. Once the camera locks onto a target pedestrian, the system keeps tracking them. If the pipeline runs at 10 frames per second, we can recognize the figure 10 times in one second, obtaining 10 "skeletons". From ten successive changes in skeleton positions, we can estimate the pedestrian's motion.

If someone is standing with their back to the road, they hardly pose a threat, and if their shoulders and head are facing the road, they can see us. Whenever someone’s head is facing the road, we treat them as a risk. Then the system analyzes stochastic trajectory models and makes a decision.

By gathering large volumes of motion statistics, we can construct reliable pedestrian motion hypotheses.

Some networks can both detect "skeletons" and predict their motion, but they are bulky. Our solution, by contrast, is computationally cheap and engages only when needed.
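To make the cascade concrete, here is a toy sketch under stated assumptions: a COCO-style 17-keypoint model (the article does not name the model used) yields skeletons at 10 fps, from which we derive a crude "facing the road" check and a rough velocity over the last second. Thresholds and the orientation heuristic are illustrative only.

```python
import numpy as np

# COCO-style keypoint indices, assumed here for illustration.
NOSE, L_EYE, R_EYE, L_SHOULDER, R_SHOULDER = 0, 1, 2, 5, 6

def facing_road(skeleton: np.ndarray, road_direction_x: float) -> bool:
    """Rough check whether head and shoulders are turned toward the road.

    `skeleton` is a (17, 2) array of keypoints in image coordinates;
    `road_direction_x` is +1 if the road lies to the person's right in the
    image, -1 if to the left. Eyes lying on the road-facing side of the
    nose serve as a crude proxy for head orientation.
    """
    nose_x = skeleton[NOSE, 0]
    eyes_x = skeleton[[L_EYE, R_EYE], 0].mean()
    shoulder_span = abs(skeleton[L_SHOULDER, 0] - skeleton[R_SHOULDER, 0])
    head_turned = (eyes_x - nose_x) * road_direction_x > 0
    # A small apparent shoulder span means the torso is seen edge-on,
    # i.e. roughly facing toward or away from the road.
    return head_turned and shoulder_span < 0.35 * np.ptp(skeleton[:, 0])

def estimate_motion(skeleton_history: list[np.ndarray], fps: float = 10.0):
    """Estimate pixel velocity from ~1 second of tracked skeletons."""
    centers = np.array([s.mean(axis=0) for s in skeleton_history])
    if len(centers) < 2:
        return np.zeros(2)
    return (centers[-1] - centers[0]) * fps / (len(centers) - 1)
```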

Predicting pedestrians’ behavior

To determine the direction of other cars' movement, we use so-called "cuboids" (3D oriented bounding boxes) and real-time tracking. To predict another car's travel path, the system also uses a stochastic model and checks for possible intersections with our own trajectory before making a decision.
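A stripped-down sketch of the idea follows; a constant-velocity model stands in for the stochastic one, which the article does not detail, and all parameters are assumptions. It propagates the tracked cuboid's center forward in time and checks how close it comes to our own planned path.

```python
import numpy as np

def will_paths_intersect(other_position: np.ndarray,
                         other_velocity: np.ndarray,
                         ego_path: np.ndarray,
                         ego_speed: float,
                         horizon_s: float = 3.0,
                         step_s: float = 0.1,
                         safety_radius_m: float = 2.0) -> bool:
    """Check whether a tracked vehicle's predicted path crosses ours.

    The other car is propagated with a constant-velocity model from its
    cuboid center; `ego_path` is a polyline of (x, y) waypoints (at least
    two) in the same ground coordinate frame as the other car.
    """
    ego_cumdist = np.concatenate(
        [[0.0], np.cumsum(np.linalg.norm(np.diff(ego_path, axis=0), axis=1))])
    for t in np.arange(0.0, horizon_s, step_s):
        other_at_t = other_position + other_velocity * t
        # Ego position at time t, found by arc length along the path.
        ego_dist = ego_speed * t
        idx = min(max(int(np.searchsorted(ego_cumdist, ego_dist)), 1),
                  len(ego_path) - 1)
        seg_len = ego_cumdist[idx] - ego_cumdist[idx - 1]
        frac = 0.0 if seg_len == 0 else (ego_dist - ego_cumdist[idx - 1]) / seg_len
        frac = min(max(frac, 0.0), 1.0)
        ego_at_t = ego_path[idx - 1] + frac * (ego_path[idx] - ego_path[idx - 1])
        if np.linalg.norm(other_at_t - ego_at_t) < safety_radius_m:
            return True
    return False
```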

ADAS systems cover a dozen or so typical driver behavior scenarios. Maneuvers are divided into classes. What does the driver want? Perhaps nothing: they are moving ahead in their chosen lane. Or they could be switching lanes to the right or to the left, turning right or left, or slowing down to pull over. Driver behavior is predictable, much like the way we recognize people's gestures. With accurate behavior prediction, an ADAS can prevent an accident: we model an object's behavior, estimate travel paths, and assess the risk and possible collision timing. If the object's maneuver may lead to a crash, we select a specific scenario and follow it.
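A compact sketch of how a predicted maneuver class can be turned into a response: estimate the time to a potential collision under the predicted maneuver and pick a scenario. The class names and thresholds below are illustrative, not our actual parameters.

```python
def pick_response(predicted_maneuver: str,
                  distance_to_conflict_m: float,
                  closing_speed_mps: float) -> str:
    """Map a predicted driver maneuver to a response scenario.

    `predicted_maneuver` is one of a small set of behavior classes
    (keep_lane, change_left, change_right, turn, pull_over); the names
    and time-to-collision thresholds here are illustrative.
    """
    conflicting = {"change_left", "change_right", "turn"}
    if predicted_maneuver not in conflicting or closing_speed_mps <= 0:
        return "keep_driving"
    time_to_collision_s = distance_to_conflict_m / closing_speed_mps
    if time_to_collision_s < 1.5:
        return "emergency_brake"
    if time_to_collision_s < 3.0:
        return "slow_down_and_increase_gap"
    return "monitor"
```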

You can find typical road situations with varying degrees of complexity at euroncap.com. For instance, we’re maintaining our distance from the car ahead. The driver of that car sees an obstacle (another car with hazard lights on) and drives around it. We need to do the same.

The principal difference between a full-fledged autopilot and various assistance systems is that the autopilot must make a decision within a limited time, even in a confusing situation; handing the problem back to the driver is not an option.

Artificial intuition

By modeling certain elements of human intuition, the Cognitive Artificial Intuition (CAI) technology can accurately predict how the road situation will develop for all participants of the road scene and work out safe driving scenarios for any situation on the road, including critical ones.

Intuition is often the key decision-making factor for human drivers. As you may know, cognitive psychology describes intuitive thinking as the human ability to process complex information involuntarily and unconsciously. In the course of intuitive cognition, people are not aware of all the factors that shaped their conclusion; it is only the conclusion itself that they perceive with complete clarity.

Our experts have learned to identify meaningful cues while researching the behavior of road scene objects. A driver may notice, in their peripheral vision, a tiny shift of the next car's side mirror, the knee of a pedestrian approaching the road, the handlebar of a bicycle on the right, and so on. These details aren't in their central field of vision, and the driver's rational mind does not process them consciously. Intuitively, however, the driver can anticipate a potentially dangerous maneuver and change lanes or slow down. The CAI technology takes stock of such nuances.

The technology employs a range of solutions, including the detection and dynamic analysis of small-scale elements of the road scene and objects on the road. For instance, it is possible to predict the course of the car ahead by observing the changes in the position of its tail lights or side mirror.
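A toy illustration of the kind of cue involved (my own simplified rendering, not the actual CAI technology): track a small part, say the lead car's side mirror, relative to the car's bounding box and flag a sustained lateral drift as a possible maneuver cue, before the whole body visibly moves.

```python
import numpy as np

def maneuver_cue_from_part(part_offsets: list[float],
                           drift_threshold: float = 0.02,
                           min_frames: int = 5) -> bool:
    """Flag a possible maneuver from the drift of a small car detail.

    `part_offsets` holds, per frame, the horizontal offset of a detected
    part (e.g. a side mirror) normalized by the parent car's bounding-box
    width. A consistent drift over the last few frames hints that the car
    ahead is starting to turn or change lanes.
    """
    if len(part_offsets) < min_frames:
        return False
    recent = np.diff(np.asarray(part_offsets[-min_frames:]))
    same_direction = np.all(recent > 0) or np.all(recent < 0)
    return bool(same_direction and np.abs(recent).mean() > drift_threshold)
```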

An example of CAI detecting small details of cars, such as side mirrors, wheels, and license plates

CAI considerably improves the safety of autonomous driving. We haven’t been able to find any mass-produced equivalents in the market.

Data fusion

Another problem that has become a stumbling block for many self-driving AI developers is the integration of data from different sensors (cameras, radar, etc.) in the computing unit. In the experience of many teams, attempts to fuse data at a high level (feeding the output of each sensor into the computing unit and only then combining it) have often lowered the overall quality of road scene object recognition, because the errors of each channel accumulate.

By contrast, we fuse data at the sensor level. One of our strengths, the Cognitive Low-Level Data Fusion (CLLDF) technology, allows for efficient data combination in a computer vision model. As with most of our approaches, its working principle is anthropomorphic. First, the information retrieved from each sensor is synchronized and aligned to a single coordinate system. Then we feed the raw data into the computing unit, where the different types of data are processed as a whole, mutually "enriching" one another.

This holistic use of data also consolidates all the information on speed, coordinates, distance to an object, its type, relative position, the presence of other objects in the immediate vicinity, and their physical properties.

This approach also serves a compensatory function: when one of the senses fails us or becomes limited, the perception of the remaining senses becomes more acute. Similarly, the architecture of Cognitive Low-Level Data Fusion allows detailed information about the road scene to be received from several sensors. So if, for example, a radar signal detects an obstacle ahead but the video camera cannot see it clearly due to sun dazzle, the AI categorizes the situation as problematic and either requests more detailed information from the camera or makes a decision based on radar data.
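A heavily simplified sketch of the low-level idea follows; the actual CLLDF architecture is not public, so the projection matrix, the channel layout, and the radar point format here are all assumptions. Radar points are time-synchronized, projected into the camera frame, and stacked with the raw image as extra channels so that a single network sees both modalities at once.

```python
import numpy as np

def fuse_camera_radar(image: np.ndarray,
                      radar_points: np.ndarray,
                      projection: np.ndarray) -> np.ndarray:
    """Stack raw radar measurements onto the camera image as extra channels.

    `image` is (H, W, 3); `radar_points` is (N, 4) with columns
    (x, y, z, radial_speed) in the vehicle frame, already synchronized to
    the image timestamp; `projection` is a 3x4 camera projection matrix.
    Returns an (H, W, 5) tensor (RGB + range + radial speed) that a single
    network can consume, so the modalities "enrich" each other at a low level.
    """
    h, w = image.shape[:2]
    fused = np.zeros((h, w, 5), dtype=np.float32)
    fused[..., :3] = image.astype(np.float32) / 255.0

    xyz1 = np.hstack([radar_points[:, :3],
                      np.ones((len(radar_points), 1))])      # homogeneous coords
    uvw = (projection @ xyz1.T).T                             # project to image
    in_front = uvw[:, 2] > 0
    u = (uvw[in_front, 0] / uvw[in_front, 2]).astype(int)
    v = (uvw[in_front, 1] / uvw[in_front, 2]).astype(int)
    rng = np.linalg.norm(radar_points[in_front, :3], axis=1)
    speed = radar_points[in_front, 3]

    valid = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    fused[v[valid], u[valid], 3] = rng[valid]
    fused[v[valid], u[valid], 4] = speed[valid]
    return fused
```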

CLLDF considerably improves the quality of road scene object recognition, elevating it to industrial-grade even in challenging road conditions, weather, and climate (which is of paramount importance). Experts have estimated that the CLLDF technology decreases an autonomous car’s collision rate by 20–25%.

Radar vs. lidar

To gather relevant data about a road scene, many car manufacturers use lidar units as core sensors alongside cameras. Lidar measures distances and determines the shape of objects by illuminating the target with laser light and scanning the reflection with a sensor. However, lidar performance degrades considerably in rain, snow, or dust clouds; the devices are highly susceptible to contamination and wear out quickly, and they are often comparable in price to the car itself. All of this rules out their industrial use at this point.

Our set of sensors for computer vision purposes includes cameras and radar units — an optimal combination for mass production in terms of both technology and cost. The professional community confirmed this state of affairs at the latest offline Tech.AD events in Detroit and Berlin.

Now that we have introduced our industrial-grade Cognitive Imaging 4D Radar, which determines not only the speed and coordinates of objects but also their shape, and does so in snow, rain, dust clouds, and other low-visibility conditions at a cost of just a couple of hundred dollars, experts believe the debate between radar and lidar "advocates" will simply lose relevance.

A video comparing the capabilities of 4D radar and lidar

Will the robot take the wheel?

In conclusion, to answer the question posed in the headline: driverless cars on public roads will almost certainly become a reality, but we shouldn't expect production models in the next few years. Industry experts at community events keep pushing this milestone further out in their forecasts, currently to 10 years away, if not more. Consequently, cars marketed as autonomous still require mandatory supervision from the driver. Many prototypes claiming Level 4 or even Level 5 are nothing more than pilot samples whose sensor and hardware kits cost so much that mass production is out of the question. We need to come down to earth: machines will get smarter gradually. In the near future, we can expect mass production of Level 3 autopilots that do the driving in specific modes, for instance on highways and in traffic jams. But then again, legislators and executive authorities will need to authorize and regulate such use of robotic cars on public roads. These measures are not required within restricted areas, where autonomous loaders, transporters, and other vehicles are already in use. Admittedly, not all autonomous vehicles are AI-based; many operate on GPS navigation and high-precision maps.

In any case, we need to keep in mind that the market is still in the making. The first company to develop an industrial-grade autopilot could hit the jackpot, and every automotive market player is a contender.
