From Perceptrons to Tesla Vision
Part 3. Partial Completion and Dynamics
This is the third in my series of posts about attempts to apply artificial neural networks (ANNs) to machine vision.
Part 1 summarized some of the early history of attempts to use Formal Neurons to build models called Perceptrons.
Part 2 discussed issues related to the architecture that is needed to make an ANN operate in ways that are somewhat analogous to biological neural networks, and also gave a thumbnail sketch of the current state-of-the-art implementation of a machine vision ANN, Tesla Vision (formerly Full Self Driving).
I explained in my first two posts that ANNs can, in principle, solve any problem that can be solved by rule-based models. In this post I will describe two advantages ANNs have over rule-based approaches for solving problems in machine vision: Partial Completion and Dynamics.
PARTIAL COMPLETION
I will illustrate how partial completion works with a “tinker toy” example of how one might go about trying to solve a problem relevant to machine vision: “When should the brakes be applied?” Consider the case illustrated in cartoon fashion in the following figure, in which a car is approaching a person running down the middle of the road:
In this situation we would want whatever process is controlling the vehicle to apply the brakes. Here is an example of a (tinker toy) model that illustrates how an ANN might be constructed to try to accomplish that objective:
This architecture mimics that of a biological hypercolumn in the brain. It processes information coming from a camera pointing in the direction the car is heading. The camera provides an image of a person running in the path of the vehicle. Three small micro ANNs, each fully interconnected and having a single output, operate in this hypercolumn: one analyzes the image for evidence of a ‘head’, one a ‘torso’, and one ‘legs’. Each of these three ANNs sends a single output (‘detect’ or ‘do not detect’) to a formal neuron that makes the decision about whether to send an output to apply the brakes. Since the image of the person contains a head, a torso, and legs, the formal neuron receives the vector code:
‘1 1 1’
And let’s suppose the parameters of the formal neuron are set in such a manner that this ‘complete’ input results in an output to apply the brakes.
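The decision stage just described can be sketched in a few lines of code. This is my own minimal illustration, with hypothetical weights and threshold, of a classic formal neuron that fires only when the complete ‘1 1 1’ code arrives:

```python
def formal_neuron(inputs, weights, threshold):
    """Classic formal neuron: output 1 if the weighted sum of the
    inputs reaches the threshold, otherwise output 0."""
    h = sum(w * x for w, x in zip(weights, inputs))
    return 1 if h >= threshold else 0

# Hypothetical parameters: equal weights and a threshold of 3, so all
# three feature detectors (head, torso, legs) must report 'detect'.
weights = [1, 1, 1]
print(formal_neuron([1, 1, 1], weights, threshold=3))  # complete input: 1 (apply brakes)
print(formal_neuron([0, 0, 0], weights, threshold=3))  # open road: 0 (do not brake)
```

With a threshold this strict, any partial code such as ‘1 1 0’ produces no braking output, which is exactly the limitation the rest of this section addresses.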
Next consider the case where only an empty roadway is detected in front of the vehicle. In that case the vector code will be
‘0 0 0’
And we will assume that the formal neuron has been set in a manner such that in the case of this total lack of input regarding the presence of a human the brakes will not be applied.
But what decision should be made when any of the other 6 potential vector codes (partial input) is received? For example, a code of
‘1 1 0’
might be received if something is blocking the view of the person’s legs. Or a code of
‘0 1 0’
if both the head and legs are blocked from view allowing only the torso to be seen.
A rule-based model could be applied, with a separate rule specified for each possible vector code that could potentially be received. That would work in this “tinker toy” example, in which there are only 8 possible vector codes to consider. However, in an actual “real world” situation there might be hundreds, thousands, or perhaps even millions of possible combinations of partial features that should be interpreted as situations calling for a decision to apply the brakes. Beyond some number, it stops being feasible to list all the rules that would have to be applied in order to solve the generalized problem of “when should the brakes be applied?”
This is a situation where ANNs can potentially provide a more manageable solution to the problem. A properly constructed ANN can apply automatic partial completion of the input signals to evaluate how close the current input is to a situation in which complete input would warrant applying the brakes.
We can illustrate this in our “tinker toy” model that has only three dimensions, thus 8 possible vector codes.
Simple calculations (relying on nothing more than the Pythagorean Theorem in multidimensional space) demonstrate that location 1,1,1 (head, torso, legs) is the furthest distance from 0,0,0 (open road) in this three dimensional space: a distance of √3. The vectors that have only one feature point to locations that are the next farthest distance from 1,1,1 (a distance of √2), and those with two features are the closest (a distance of 1). Thus, depending on how conservative or risk accepting one wants to be, it is possible to set a simple criterion for partial completion. That criterion can be specified simply in terms of the distance (in multidimensional space) between where a partial vector points and where the complete vector points. That criterion (a single number) could be set conservatively, to allow partial completion of any input with even a single feature present (head, torso, or legs), or more risk tolerantly, to allow partial completion only when 2 features are present.
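Here is a minimal sketch of that distance criterion, using my own hypothetical threshold values (1.5 for the conservative setting, 1.2 for the risk-tolerant one, chosen to fall on either side of √2 ≈ 1.41):

```python
import math

COMPLETE = (1, 1, 1)  # head, torso, legs all detected

def distance(a, b):
    """Euclidean distance between two vector codes (the Pythagorean
    Theorem applied in multidimensional space)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def should_brake(code, criterion):
    """Partial completion: brake whenever the partial code lies within
    `criterion` of the complete pattern."""
    return distance(code, COMPLETE) <= criterion

# Distances from (1,1,1): two features -> 1.0, one feature -> ~1.41,
# open road -> ~1.73.
print(should_brake((0, 1, 0), criterion=1.5))  # conservative: one feature is enough -> True
print(should_brake((0, 1, 0), criterion=1.2))  # risk tolerant: one feature not enough -> False
print(should_brake((1, 1, 0), criterion=1.2))  # two features brake either way -> True
print(should_brake((0, 0, 0), criterion=1.5))  # open road never brakes -> False
```

Note that the whole decision policy lives in that one number, `criterion`, rather than in a table of per-code rules.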
In this simple 3-dimensional case, the advantage of a single criterion over a set of 8 rules might seem underwhelming. However, consider the HUGE multidimensional space that might be required to hold all the partial vectors that have been identified as being relevant to a decision about whether the car should apply the brakes. Instead of specifying what should be done with a correspondingly HUGE number of rules, as a rule-based model would require, the same effect can be achieved with a single number in a properly constructed ANN.
DYNAMICS
In the examples I have used up until this point (including in my previous two posts), the formal neurons were static, meaning that they sat quiet until an input arrived, responded with a single output, and went quiet again. Biological neurons do not operate that way. They exhibit spontaneous activity, meaning that they fire electrical signals down their axons all the time, even in the absence of new input.
A simple change to the function 𝞧[h] (described in my Part 1 post) can convert a static formal neuron into a dynamic one. The following figure illustrates a 𝞧[h] function for a formal neuron that has a spontaneous activity of 5 outputs per second in the absence of h input. With positive h input the rate rises above the spontaneous level, and with negative h input it falls below it.
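A function with this shape might be sketched as follows (the gain value is my own hypothetical choice; the figure's actual curve may differ):

```python
def firing_rate(h, spontaneous=5.0, gain=2.0):
    """Dynamic output function: ~5 outputs/second of spontaneous
    activity at h = 0, rising with positive h and falling (but never
    below zero) with negative h. The gain of 2.0 is hypothetical."""
    return max(0.0, spontaneous + gain * h)

print(firing_rate(0))   # 5.0 -- spontaneous rate with no input
print(firing_rate(2))   # 9.0 -- excitatory input raises the rate
print(firing_rate(-4))  # 0.0 -- strong inhibition silences the neuron
```

The key difference from a static formal neuron is that the output is now a rate that is present at all times, not a single response to a single input.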
In a static ANN, a single input would have produced just one of the 8 possible states (vector codes) as output. In a dynamic ANN, however, the vector code being output changes over time. For example, it might look like this:
A dynamic ANN can be designed to have attractor states that create a tendency to fall into a repeating pattern, as in this example:
These repeating patterns might be analogous in some ways to slow waves that occur during sleep in biological neural circuits in the brain.
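A repeating pattern of this kind can be produced by even a very small recurrent network. In this toy sketch of my own (not any particular biological circuit), three units are connected in a ring so that each unit fires one time step after its neighbor, and the output vector code cycles forever:

```python
import numpy as np

# Hypothetical ring connectivity: unit i fires at time t+1
# iff unit i-1 fired at time t.
W = np.array([[0, 0, 1],
              [1, 0, 0],
              [0, 1, 0]])

state = np.array([1, 0, 0])
for t in range(6):
    print("".join(map(str, state)))       # cycles 100 -> 010 -> 001 -> 100 ...
    state = (W @ state > 0).astype(int)   # synchronous threshold update
```

No input is needed to sustain the pattern once it starts; the dynamics alone keep the network cycling through its attractor states.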
An ANN can also be designed to have a fixed state as an attractor. Any time the input causes the state of the ANN to come near (in multidimensional space) the attractor state, the dynamics will cause the ANN state to approach the attractor. This is another way partial completion can be accomplished.
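A Hopfield-style network is one standard way to build such a fixed-point attractor, and it connects back to the braking example: store the complete ‘person’ pattern as the attractor, and a partial input settles into it. This is my own illustrative sketch, using ±1 coding and Hebbian weights, not a description of any production system:

```python
import numpy as np

# Store the complete pattern (head, torso, legs) in +/-1 coding.
pattern = np.array([1, 1, 1])

# Hebbian weights: each pair of features reinforces the other;
# no self-connections.
W = np.outer(pattern, pattern) - np.eye(3, dtype=int)

def settle(state, steps=10):
    """Asynchronously update units until the state stops changing;
    the stored pattern acts as a fixed-point attractor."""
    s = state.copy()
    for _ in range(steps):
        for i in range(len(s)):
            h = W[i] @ s
            if h != 0:
                s[i] = 1 if h > 0 else -1
    return s

# Legs blocked from view (-1): the dynamics complete the pattern.
print(settle(np.array([1, 1, -1])))  # -> [1 1 1]
```

Starting near the stored pattern, the network state is pulled onto it, which is partial completion carried out by the dynamics themselves rather than by an explicit distance criterion.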
The upcoming Tesla Vision Version 12 is reportedly based entirely on ANN models rather than on a rule-based system. Since it is a proprietary system I do not know the details of its design, but I would surmise that it must be a dynamic ANN system that receives input from the car’s video cameras in real time, analyzes the input images, and uses that information to let the ANN output dynamically fall into attractor states that cause the car to accelerate or decelerate and to turn the steering wheel left or right.
One last caveat. I have repeatedly used phrases above such as “a properly designed ANN.” One might worry that the problem of designing HUGE lists of rules in a rule-based system has simply been transferred, in an ANN, to the problem of coming up with a proper design. However, there is a trick that gets us out of this situation: ANNs can learn how to design themselves! They simply have to be trained.
In my Part 4 post I take up the topic of ANN training methods and how they relate to neural plasticity in biological brains.
Ronald Boothe psyrgb@emory.edu