Why Tesla Is Never* Going to Make a Full-Fledged Autopilot

Sergey Kurinov | Comexp
5 min read · Jun 29, 2022


AI/ML models have been making headlines for cute tricks like graphic mashups and witty chat answers. This has given idealistic spectators an optimistic notion of how far we have progressed in machine perception. But the truth is, this state-of-the-art technology has already hit its limits. There is no way the automotive industry can master fully autonomous self-driving with what it has. The only way forward is to adopt a whole new approach to how machines perceive information (read on).

Automotive leaders are backing away from their promises to launch full self-driving.

Elon Musk hoped to roll out full self-driving (FSD) in 2020. By the end of 2022, Tesla had dropped the promise and any clear deadline, announcing instead a wider expansion of the FSD beta (which is not fully driverless), and it has even faced lawsuits from angry customers accusing it of fraudulent claims.

Apple, which has also been working toward an autonomous self-driving car for years, has announced that the product won't hit the market until 2026 and will still require a driver at least some of the time.

These developments leave us with a question: is AI/ML magic simply not enough, if even with the enormous resources of the tech majors and years upon years of engineering we are still not there?

The problem is that traditional methods of image processing, including capsule networks, have already reached their limit. And they reached it on tasks far simpler than full self-driving, such as video copyright monitoring.

To solve even such basic tasks, traditional methods require enormous resources. That means a fully comprehensive, autonomous autopilot is unlikely to arrive in the near future unless a completely new approach to information perception is adopted.

The Terminator case, or why AI/ML is not enough

The current state of computer vision allows confident self-driving only in 'lab-like' conditions.

Imagine James Cameron’s Terminator equipped with a modern computer vision system like Tesla Vision, which is positioned as a big step toward the bright robot future. How much do you think this robot would see and recognize while walking down the street? It would be more like a blind puppy than an intimidating walking weapon. Tesla FSD would not let it take even a few steps down a busy street; it can only drive it along a highway with as few distracting objects as possible. Let's say it plainly: next to the human brain's visual perception apparatus, Tesla Vision is a joke.

Why is that, considering all the AI/ML magic at our disposal? To recognize anything, a neural network has to be trained separately on a massive array of data for every single class of objects the vehicle may face on the road: one network for reading license plates, another for traffic lights, one more for pedestrians, then for people in wheelchairs, people on bikes, dogs, elk, squirrels, ducks, and every other living creature that happens to end up on the road. Why not just train the machine on all those objects one by one? It is too expensive and takes too much time; in practice it is impossible. The revised plans of Tesla and others show they cannot do it. The toy sketch below illustrates the scale of that per-class training burden.
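To make that burden concrete, here is a minimal, purely illustrative sketch in PyTorch. Everything in it (the class list, the toy network, the random placeholder data) is an assumption for illustration, not Tesla's pipeline; the point is only that each object class needs its own labelled data and its own round of training.

```python
# Illustrative sketch of the per-class training burden described above.
# The network, data, and class list are placeholders, not a real pipeline.
import torch
import torch.nn as nn

CLASSES = ["license_plate", "traffic_light", "pedestrian", "wheelchair_user",
           "cyclist", "dog", "elk", "squirrel", "duck"]  # the list keeps growing

def make_classifier() -> nn.Module:
    # Toy stand-in for a real detector backbone; the point is the loop, not the net.
    return nn.Sequential(nn.Flatten(), nn.Linear(3 * 64 * 64, 128),
                         nn.ReLU(), nn.Linear(128, 2))  # object present / absent

models = {}
for cls in CLASSES:
    model = make_classifier()
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.CrossEntropyLoss()
    # Random placeholder batch standing in for a large labelled dataset for this class.
    images = torch.randn(32, 3, 64, 64)
    labels = torch.randint(0, 2, (32,))
    for _ in range(10):  # in reality: many epochs over millions of annotated frames
        optimizer.zero_grad()
        loss = loss_fn(model(images), labels)
        loss.backward()
        optimizer.step()
    models[cls] = model
# Every new class added later means collecting, labelling, and training all over again.
```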

The solution exists: the Theory of Active Perception (TAPe)

The Terminator in the movie could apparently see (classify and recognize objects and solve other standard vision tasks) not like Tesla FSD, but more like a human. We believe this is already possible today, with the help of TAPe-based computer vision developed by Comexp. We have taken the first steps to prove the concept behind this bold claim, and we have obtained impressive results.

We have developed a video-comparison technology that is currently used to search for and recognize, in real time, hundreds of thousands of specific video clips across thousands of channels, movie libraries, and video hosting services.
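The engine itself is not public, so the sketch below does not show TAPe. It uses a generic average-hash fingerprint (an unrelated, widely known technique) purely to illustrate the shape of the task: turning frames into compact signatures and matching a live stream against a reference library. All names, sizes, and thresholds here are assumptions.

```python
# Generic fingerprint-matching sketch; NOT the Comexp/TAPe engine.
import numpy as np

def frame_fingerprint(frame: np.ndarray, size: int = 8) -> np.ndarray:
    """Downscale a grayscale frame into an 8x8 grid and threshold against its mean."""
    h, w = frame.shape
    ys = np.linspace(0, h, size + 1).astype(int)
    xs = np.linspace(0, w, size + 1).astype(int)
    small = np.array([[frame[ys[i]:ys[i+1], xs[j]:xs[j+1]].mean()
                       for j in range(size)] for i in range(size)])
    return (small > small.mean()).astype(np.uint8).ravel()  # 64-bit-style signature

def clip_signature(frames: list[np.ndarray]) -> np.ndarray:
    return np.array([frame_fingerprint(f) for f in frames])

def hamming_distance(a: np.ndarray, b: np.ndarray) -> float:
    return float(np.mean(a != b))

# A monitored stream is matched against stored reference signatures;
# a small distance means the channel is airing a known clip.
reference = clip_signature([np.random.rand(360, 640) for _ in range(5)])
live = clip_signature([np.random.rand(360, 640) for _ in range(5)])
print("match" if hamming_distance(reference, live) < 0.25 else "no match")
```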

How TAPe does it: recognition without convolution

One of the key ingredients of TAPe-powered computer vision is its ability to do the job without convolution, a standard CV operation that consumes a significant share of computing resources in such pipelines. The human brain performs no convolution, and neither does the TAPe-based video engine, which processes an image as a whole, just as humans do, and returns accurate results despite any interference present.
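As a rough illustration of why skipping convolution matters for the compute budget (the layer shapes below are assumed, not taken from Tesla Vision or Comexp), here is a back-of-the-envelope count of multiply-accumulate operations for a single standard convolution layer.

```python
# Back-of-the-envelope cost of one 2D convolution layer (assumed shapes).
def conv_macs(h_out: int, w_out: int, c_in: int, c_out: int, k: int) -> int:
    """Multiply-accumulate operations: every output pixel of every output channel
    pays for a full k*k*c_in kernel's worth of multiply-adds."""
    return h_out * w_out * c_out * (k * k * c_in)

# A single modest layer on a 1280x960 camera frame:
macs = conv_macs(h_out=1280, w_out=960, c_in=64, c_out=128, k=3)
print(f"{macs / 1e9:.1f} GMACs for one layer")  # ~90 GMACs, and real nets stack dozens of layers
```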

How TAPe does it: simultaneous reading of key features

The second reason the technology is so efficient is that it can obtain a map of any image's key features at any level of detail simultaneously. 'Simultaneously' means the features are read all together, and the number of those key features is the minimum sufficient to solve any computer vision task. Why is this important?

Let's look at how ML developers train neural networks to 'see': first, they mark up all the images in a dataset with several dozen features, creating a feature map. The more features the map contains, the more accurate the machine recognition becomes. Some developers may need, say, 100 features to recognize faces, while others will use only 80, or as many as 150. The toy annotation record below sketches what this markup looks like.
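To make that markup step concrete, here is a hypothetical annotation record. The file path, feature names, and coordinates are invented for illustration; it only shows the kind of hand-labelled feature map described above, not any specific team's format.

```python
# Hypothetical annotation record: every training image gets hand-labelled feature
# points, and the chosen feature count (80, 100, 150...) is a design decision.
from dataclasses import dataclass

@dataclass
class Keypoint:
    name: str
    x: float  # normalised image coordinates in [0, 1]
    y: float

@dataclass
class AnnotatedImage:
    path: str
    keypoints: list[Keypoint]

sample = AnnotatedImage(
    path="faces/img_000123.jpg",  # hypothetical path
    keypoints=[
        Keypoint("left_eye_outer", 0.31, 0.42),
        Keypoint("left_eye_inner", 0.38, 0.42),
        Keypoint("nose_tip",       0.50, 0.55),
        # ...dozens more, repeated by hand for every image in the dataset
    ],
)
```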

TAPe allows us to get rid of this painstaking process. By mimicking the way the human brain perceives visual information, a TAPe-based algorithm reads just enough features to recognize an image all at once.

According to the theory, an image (in the broadest sense of the word) read by the human visual analyzer is 'automatically' broken down by the brain into a finite number of key features (not 'pixels', but the essential, meaningful elements of what constitutes an image), and those key features are universal to all images, irrespective of the task.

According to TAPe, this is how our brain recognizes information of any kind. Unlike standard neural networks, TAPe-based technologies do not need prior learning to find and attribute key features in pixel arrays. Furthermore, TAPe-based tech doesn't even need to analyze each image in detail (breaking it down to the primary-level key elements); it can recognize an image from just an excerpt that contains the minimum number of key features.

Imagine AI/ML that could really see

To launch a full-fledged self-driving vehicle, automotive companies need to adopt a whole new approach to information processing, and TAPe is one such approach.

We believe that the Theory of Active Perception, if used in developing computer vision technology, will result in an essentially different architecture for neural networks and other similar algorithms within the so-called AI scope. This will allow the automotive industry to overcome the challenge it is stuck on in an instant: letting FSD see everything it needs to see with a fraction of the resources used today just to get it down the highway.

*Curious to learn more about the Theory of Active Perception? Let's talk.
