From Perceptrons to Tesla Vision

Part 2. Architecture

Ronald Boothe
ILLUMINATION
6 min read · Nov 14, 2023


In Part 1 of this series of posts, I summarized some of the early history of attempts to apply artificial neural networks (ANNs) to machine vision. One of the earliest models, called the Perceptron, was proposed by Rosenblatt in the 1950s.¹ The architecture of a simple basic Perceptron is illustrated in the following figure.

Architecture of a simple basic Perceptron. This and all subsequent figures adapted from Boothe.²

This architecture can be characterized as:

· Two layered: an input layer that receives signals from the environment via sensors, and an output layer that conveys some information about a percept.

· Feed forward: signals travel from the input layer to the output layer, but not in reverse.

· Fully connected: every formal neuron (defined in my Part 1 post) in the input layer connects to every formal neuron in the output layer.

· Binary output: the output is limited to a 1 (a decision that some feature is present in the environment) or a 0 (a decision that the feature is not present).
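
To make these properties concrete, here is a minimal sketch in Python of a single output neuron of such a Perceptron. The four-sensor input, the weights, and the threshold are illustrative assumptions on my part, not values from Rosenblatt's model:

    import numpy as np

    def perceptron(x, w, threshold=0.5):
        # One formal neuron: weighted sum of sensor inputs, then a hard threshold.
        return 1 if np.dot(w, x) >= threshold else 0

    # Hypothetical 4-sensor input layer; the weights are hand-picked, not learned.
    w = np.array([0.4, 0.4, -0.2, -0.2])
    x = np.array([1, 1, 0, 0])   # signals from the environment
    print(perceptron(x, w))      # -> 1: a decision that the feature is present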

A Perceptron with a slightly more complicated architecture is shown in the next figure.

Architecture with vector coding.

This is also a two layered, feed forward, fully connected architecture, but now the output layer contains more than one formal neuron, allowing an output in the form of a vector rather than a single binary value.

This vector code could be used to signal up to eight combinations of features in the environment. For example, suppose formal neuron 1 signals the presence or absence of a square, neuron 2 a circle, and neuron 3 a triangle. In that case Feature 3 (output vector 0,1,0) would indicate that only a circle is present, Feature 6 (1,0,1) that there are both a square and a triangle, etc.
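
Assuming the eight output patterns are numbered in binary order with neuron 1 as the leading digit (my reading of the figure), the full code table can be enumerated in a few lines of Python:

    from itertools import product

    shapes = ("square", "circle", "triangle")  # formal neurons 1, 2, 3

    # List all eight output vectors and the feature number each one encodes.
    for i, bits in enumerate(product((0, 1), repeat=3), start=1):
        present = [s for s, b in zip(shapes, bits) if b] or ["nothing"]
        print(f"Feature {i}: output {bits} -> {', '.join(present)}")

Feature 3 comes out as (0, 1, 0), circle only, and Feature 6 as (1, 0, 1), square and triangle, matching the examples above.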

In 1969 Minsky and Papert³ demonstrated that Perceptrons with these architectures had a fatal flaw: they could not solve a simple logical exclusive OR (XOR) problem, such as identifying the presence in a scene of a circle OR a square but NOT both. ANNs with these architectures can solve only linearly separable problems, and XOR is not linearly separable: no single weighted sum of the inputs can be thresholded to separate the two cases. Thus, Perceptrons did not seem powerful enough to serve as models of visual perception, and as a result most cognitive visual scientists stopped working on ANN models in the 1970s, refocusing their efforts on developing rule-based theories of perception instead.

This pause lasted until the 1980s, when Rumelhart and colleagues⁴ demonstrated that a minor addition to the architecture of the Perceptron could allow it to solve nonlinear problems, including XOR. All that was required was the addition of one or more hidden layers: layers of formal neurons that do not receive direct input from the environment and do not project directly to the output.

An architecture that includes a hidden layer.
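
A hand-weighted sketch shows why a single hidden layer is enough for XOR. The weights below are a standard textbook construction, not ones learned by Rumelhart's procedure: one hidden unit computes OR, the other computes AND, and the output unit fires for OR-but-not-AND:

    def step(x):
        return 1 if x >= 0 else 0

    def xor_net(x1, x2):
        h1 = step(x1 + x2 - 0.5)    # hidden unit 1: fires for OR
        h2 = step(x1 + x2 - 1.5)    # hidden unit 2: fires for AND
        return step(h1 - h2 - 0.5)  # output: OR but not AND, i.e. XOR

    for a, b in [(0, 0), (0, 1), (1, 0), (1, 1)]:
        print(a, b, "->", xor_net(a, b))  # 0, 1, 1, 0

The hidden units recode the input into a linearly separable form, which is exactly the step a two layered Perceptron cannot perform.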

Cognitive scientists soon recognized that, in principle, any problem that could be solved using a rule-based model could also be solved by an ANN, so long as its architecture included hidden layers.⁵ And ANNs had an intuitive advantage over rule-based models: in their organization, they seem more akin to biological neural networks.

The earlier goal of visual scientists to create machine vision using ANNs, a project that had been mostly kept in mothballs during the 1970s, was now revived. In the last two decades of the 20th century there was a huge proliferation of attempts to understand and create ANNs that operate, in some fundamental ways, analogously to biological neural circuits.

A sampling from a prototypical bookshelf of a Cognitive Neuroscientist in the last decades of the 20th century. Photo by author.

Human brains have on the order of 100 billion neurons, and these are highly interconnected, with each neuron making synaptic connections with a few hundred to a few thousand other neurons. The interconnections are organized in a somewhat hierarchical fashion. Small groups of nearby neurons form micronetworks that are highly interconnected. In parts of the brain that process visual signals, groups of micronetworks are organized into hypercolumns, small slabs of neural tissue that are each responsible for processing signals coming from one small portion of the visual scene.

Schematic of a hypercolumn containing micronetworks.

Micronetworks can be modeled in ANNs as a small number of formal neurons that are fully interconnected, meaning that every formal neuron in the micronetwork receives input from, and projects to, every other formal neuron.

In addition to being fully interconnected, each micronetwork has a smaller number of inputs and outputs that allow it to communicate with other micronetworks within a hypercolumn. For example, within a single hypercolumn some micronetworks might process signals regarding shape, others about motion, etc. Neighboring hypercolumns perform the same operations for nearby regions of the scene.
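
As a rough sketch of this idea (my own illustration, not one of the article's figures), a micronetwork can be modeled as a small weight matrix with a zeroed diagonal, applied repeatedly so that every formal neuron feeds every other:

    import numpy as np

    rng = np.random.default_rng(0)
    n = 5  # formal neurons in one micronetwork

    # Full interconnection: every neuron projects to every other, not to itself.
    W = rng.normal(0.0, 0.5, size=(n, n))
    np.fill_diagonal(W, 0.0)

    state = rng.random(n)    # current activity of the micronetwork
    external = np.zeros(n)
    external[0] = 1.0        # input arriving from a neighboring micronetwork

    for _ in range(10):      # let activity reverberate through the network
        state = np.tanh(W @ state + external)

    print(state)             # resulting activity pattern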

Hypercolumns are in turn grouped into larger anatomical structures called nuclei or cortical areas.

This schematic illustrates some of the anterograde and retrograde axonal fiber pathways that connect nuclei and cortical areas responsible for processing signals from the retina in the human brain.

ANNs that operate in a hierarchical manner similar to biological brains will therefore need architectures that include recurrent (feedback) connections as well as feed forward ones, as the next schematic illustrates.

Architecture of an ANN that includes both feed forward and recurrent connections.
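
In code, the difference from a purely feed forward network is a single extra term: the activity of a lower layer depends on feedback from the layer above as well as on input from below. A minimal sketch, with layer sizes and weights that are arbitrary placeholders:

    import numpy as np

    rng = np.random.default_rng(1)
    W_ff = rng.normal(size=(4, 3))    # feed forward: input layer -> hidden layer
    W_fb = rng.normal(size=(4, 2))    # recurrent: output layer -> hidden layer
    W_out = rng.normal(size=(2, 4))   # hidden layer -> output layer

    x = rng.random(3)                 # signals from the sensors
    hidden = np.zeros(4)
    output = np.zeros(2)

    for t in range(5):                # activity settles over several time steps
        hidden = np.tanh(W_ff @ x + W_fb @ output)
        output = np.tanh(W_out @ hidden)

    print(output)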

Human brains serve as an existence proof that neural networks of sufficient complexity can form percepts that allow navigation in the world. The ultimate goal for researchers working on machine vision will be to design ANNs that can do so in a similar (or ideally better) manner.

The current state of the art of this endeavor for motor vehicles is Tesla Vision, formerly labeled “Full Self Driving.” Tesla Vision Version 12 is being developed as an ANN simulation on a digital supercomputer called Dojo that went into production in July 2023.

Tesla Dojo Architecture. CC BY-SA 4.0 via Wikimedia Commons

Dojo uses a hierarchical architecture. According to Wikipedia, Dojo contains:

  • 354 computing cores per D1 chip
  • 25 D1 chips per Training Tile (8,850 cores)
  • 6 Training Tiles per System Tray (53,100 cores, along with host interface hardware)
  • 2 System Trays per Cabinet (106,200 cores, 300 D1 chips)
  • 10 Cabinets per ExaPOD (1,062,000 cores, 3,000 D1 chips)
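
The hierarchy multiplies out consistently; a quick check of the arithmetic:

    cores = 354      # computing cores per D1 chip
    cores *= 25      # chips per Training Tile
    print(cores)     # 8,850 cores per tile
    cores *= 6       # tiles per System Tray
    print(cores)     # 53,100 cores per tray
    cores *= 2       # trays per Cabinet
    print(cores)     # 106,200 cores (300 D1 chips) per cabinet
    cores *= 10      # cabinets per ExaPOD
    print(cores)     # 1,062,000 cores (3,000 D1 chips) per ExaPOD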

Tesla Vision, up through the Version 11 beta currently available for sale on Tesla vehicles, apparently runs on a model that is partially rule-based and partially implemented via an ANN. Tesla says that Version 12 will be based entirely on a trained ANN.

I address additional features of ANNs, including partial pattern completion, in my Part 3 post.

I discuss what it means to train an ANN in my Part 4 post.

Ronald Boothe, psyrgb@emory.edu

NOTES:

  1. Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6), 386–408.
  2. Boothe, R. G. (2002). Perception of the Visual Environment. Springer-Verlag, New York.
  3. Minsky, M. and Papert, S. (1969). Perceptrons: An Introduction to Computational Geometry. MIT Press, Cambridge, MA.
  4. Rumelhart, D. E., Hinton, G. E., and Williams, R. J. (1986). Learning internal representations by error propagation. In Rumelhart, D. E. and McClelland, J. L., editors, Parallel Distributed Processing: Explorations in the Microstructure of Cognition, Volume 1: Foundations. MIT Press, Cambridge, MA.
  5. Stating this more generally, ANNs with hidden layers can, in principle, solve any problem that can be solved with a Universal Turing Machine.

Ronald Boothe

Professor Emeritus, Emory University, Atlanta, GA, USA