Why A.I. doesn’t learn like humans: Part 2 — Deep Learning

Kian Parseyan
6 min read · Mar 30, 2018


If you already have an intuitive understanding of how neural nets operate, skip straight to the next post, on curiosity, motivation, and logic.

In the last post, I covered intelligence, the first of the concepts that underpin the current superiority of the human race over computer AI (intelligence, deep learning, curiosity, motivation & logic, consciousness, understanding, and sentience). In this post, I’ll cover deep learning’s operating principles and describe examples of where we can find them echoed in the human brain. To be clear, this version of “deep learning” describes the fundamental principles behind network-based computation; it is not intended to describe the mechanics of deep learning algorithms in artificial neural networks.

At the base of deep learning are concepts. Everything is made of concepts, including concepts. The technical but more accurate way to put it is that information is quantized, and each quantum is represented using other quanta. Each concept (a.k.a. information quantum) is a packet of information that represents a pattern in the real world, and it takes information to define information. For example, take the shape we recognize as the letter “X”: it is made up of two lines (with “line” being a concept and “two” being another concept) that are similar in length (length is a concept, and so is similarity), that intersect (a concept), that are oriented roughly perpendicular to each other (another concept), and that are diagonal (yet another concept). We then take this shape known as “X” and attach many other concepts to it, such as its significance as part of the alphabet, or in math, or as a marker on a map, or as a symbol for a kiss. Evidently, something as simple as “X” is defined using many different concepts and serves as the basis for many more.

Notably, the concepts that use “X” as a basis for their definition sit at a higher level of complexity than the concepts used to define “X”. Concepts therefore occupy a relative position within a ‘complexity hierarchy’: the deeper a concept sits in the hierarchy, the more inherently complicated the information it represents. An “X” is more complex than a line, ‘the alphabet’ is more complex than an “X”, and so on.
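
To make the hierarchy concrete, here is a toy Python sketch (my own illustration, not from the article): each concept is defined only by a list of simpler concepts, and its complexity falls out as its depth in that graph.

```python
# A toy concept hierarchy: every entry is defined purely by other,
# simpler concepts. All names and groupings are invented for illustration.

concepts = {
    "line": [],                      # treated as primitive in this sketch
    "two": [],
    "similar-length": ["line"],
    "intersecting": ["line", "two"],
    "diagonal": ["line"],
    "X": ["two", "line", "similar-length", "intersecting", "diagonal"],
    "alphabet": ["X"],               # and 25 other letters, omitted here
}

def complexity(name):
    """A concept is one level more complex than its deepest constituent."""
    parts = concepts[name]
    if not parts:
        return 0
    return 1 + max(complexity(p) for p in parts)

for name in ("line", "X", "alphabet"):
    print(name, complexity(name))    # line 0, X 2, alphabet 3
```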

In bringing our understanding into the physical realm, it’s also important to recognize that every layer in this hierarchy is defined by connections to the simpler layers before it, and that related concepts share similar connections to concepts in the layers above and below. This architecture allows information to be physically organized: the connections of a concept within the hierarchy denote its meaning, and meaning defined this way is semantic. The meaning behind a concept is always in relation to other concepts. There is no such thing as absolute meaning within a network architecture; everything is relative (i.e. semantic).
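
A minimal sketch of what relational meaning buys us (the “+” and “O” definitions below are my own hypothetical entries): if a concept’s meaning is nothing but its connections, then similarity of meaning is simply overlap of connections.

```python
# Meaning as a set of connections: two concepts are similar exactly to
# the extent that their defining connections overlap.

definitions = {
    "X": {"two", "line", "similar-length", "intersecting", "diagonal"},
    "+": {"two", "line", "similar-length", "intersecting", "perpendicular"},
    "O": {"curve", "closed", "round"},
}

def similarity(a, b):
    """Jaccard overlap of the two concepts' defining connections."""
    shared = definitions[a] & definitions[b]
    combined = definitions[a] | definitions[b]
    return len(shared) / len(combined)

print(similarity("X", "+"))   # ~0.67: mostly the same connections
print(similarity("X", "O"))   # 0.0: no shared connections, unrelated
```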

To organize data into a semantic hierarchy, it first needs to be broken down into the simplest concepts and then rebuilt into more complex ones. For example, a picture of a face would be broken down into dots/pixels, then into boundaries of contrasting colors and shades; those boundaries would be used to define straight lines and curves, which, along with the relevant color and shading information (if available), would define specific shapes. Recognizing those shapes in the context of each other and of their colors (as a nose, eyes, ears, and a head) would ultimately identify the image as a face. This definition of a face still has a lot of context attached to it, such as the specific colors, shading, spacing, and precise orientation of the recognized shapes relative to each other, which can be used to define an identity for that specific face. The obvious utility of this example for deep learning is in visual recognition, but the principle extends to patterns in anything that can be sensed: sounds (e.g. voice, music), movement (e.g. walking, running), radar/LIDAR (e.g. physical objects), and so on. This approach allows the representation of very specific information without any limit on the specificity or on the sensory modality used to collect it.
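
Here is a toy Python sketch of that decomposition (my own illustration; the stage functions and thresholds are invented for the example, not taken from any real vision system): contrast boundaries are extracted from pixels, runs of boundaries become “line” concepts, and lines activate a higher-level concept.

```python
# A toy recognition pipeline: pixels -> contrast boundaries -> lines ->
# higher-level concept. Every stage and threshold here is invented.

def find_edges(pixels):
    """Mark a boundary wherever two horizontally adjacent pixels contrast."""
    edges = set()
    for r, row in enumerate(pixels):
        for c in range(len(row) - 1):
            if abs(row[c] - row[c + 1]) > 0.5:   # invented contrast threshold
                edges.add((r, c))
    return edges

def find_lines(edges):
    """Treat a long vertical run of boundaries as a 'line' concept."""
    cols = {}
    for r, c in edges:
        cols[c] = cols.get(c, 0) + 1
    return {f"line@col{c}" for c, n in cols.items() if n >= 3}

def recognize(lines):
    """Treat two or more parallel lines as the higher concept 'stripes'."""
    return {"stripes"} if len(lines) >= 2 else set()

# A 4x4 toy image: two bright vertical stripes on a dark background.
image = [[1.0, 0.0, 1.0, 0.0]] * 4
print(recognize(find_lines(find_edges(image))))   # {'stripes'}
```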

For this architecture to learn, however, we need to recognize two additional characteristics within the deep learning principle: activation thresholds and consolidation. Concepts exist in a binary on/off state, but that is not how the world works; information can be blurry. There needs to be a threshold that decides whether a blurry line registers as just a blur or as a line, and that threshold determines which concepts activate. The ability to adjust activation thresholds is what allows learning to occur. When information is presented to a deep learning network, a specific pattern of concepts is activated, which in turn activates higher-complexity concepts, and this happens the same way for the same information. Consolidation, however, allows concepts that are routinely activated together to coordinate their activity as a group. Once concepts have become consolidated, the activation of one concept temporarily reduces the activation thresholds of the other concepts within the group, making them easier to activate. This learning technique enables the use of past experience as a clue for what to expect in the future. For example, if a line is expected (i.e. learned), a blurred line may be enough to activate the “line” concept, whereas if a line is not expected, the same blurred line would not. Repeated exposure therefore allows deep learning to group the activation of related concepts and introduces a way to recognize concepts within a margin of error.
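
A minimal sketch of these two mechanisms, under the article’s informal definitions rather than any standard neural-network library (the class, the thresholds, and the 0.5 priming factor are all my own invention):

```python
# Concepts fire when evidence clears their threshold; consolidated peers
# get their thresholds temporarily lowered, so expected patterns become
# easier to recognize.

class Concept:
    def __init__(self, name, threshold):
        self.name = name
        self.base = threshold        # resting activation threshold
        self.threshold = threshold   # current (possibly lowered) threshold
        self.group = []              # consolidated peers

    def consolidate_with(self, other):
        # Stands in for concepts becoming grouped via repeated co-activation.
        self.group.append(other)
        other.group.append(self)

    def present(self, evidence):
        if evidence >= self.threshold:
            # Activation primes consolidated peers by lowering their bar.
            for peer in self.group:
                peer.threshold = peer.base * 0.5   # invented priming factor
            return True
        return False

line = Concept("line", threshold=0.8)
x_shape = Concept("X", threshold=0.8)
line.consolidate_with(x_shape)       # routinely activated together

blurry = 0.5
print(x_shape.present(blurry))   # False: an unexpected blur isn't an X
line.present(0.9)                # a clear line primes the whole group
print(x_shape.present(blurry))   # True: the same blur now clears the bar
```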

You’ve probably already begun to recognize similarities between deep learning and the brain, the most striking being between concepts and the connections between neurons. Interestingly, the brain employs a number of learning strategies. For example, some concepts are mutually exclusive, meaning they should not both be activated at the same time (something being transparent versus opaque, or an object being in the foreground versus the background, etc.). In addition to reducing activation thresholds, concepts in the brain can also increase the activation thresholds of other concepts, making those concepts effectively impossible to activate while the first is active. This learning strategy is known as inhibitory consolidation. There are many different ways the brain modulates the activity of neurons, and the reality is that we do not yet understand all of the cellular strategies the brain uses to learn.
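
A standalone sketch of inhibitory consolidation under the same informal model (the concept names and thresholds are invented for illustration): activating one of two mutually exclusive concepts raises the other’s threshold so that it effectively cannot fire.

```python
# Mutually exclusive concepts raise each other's thresholds on activation,
# so only one of the pair can be active at a time.

class Concept:
    def __init__(self, name, threshold):
        self.name = name
        self.threshold = threshold
        self.rivals = []             # mutually exclusive concepts

    def inhibit_with(self, other):
        self.rivals.append(other)
        other.rivals.append(self)

    def present(self, evidence):
        if evidence < self.threshold:
            return False
        for rival in self.rivals:
            rival.threshold = float("inf")   # effectively impossible to fire
        return True

transparent = Concept("transparent", 0.6)
opaque = Concept("opaque", 0.6)
transparent.inhibit_with(opaque)

print(transparent.present(0.7))  # True: 'transparent' activates
print(opaque.present(0.9))       # False: 'opaque' is now inhibited
```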

In spite of this shortfall, deep learning algorithms have already surpassed human accuracy at recognizing visual concepts, and can learn complicated new concepts in minutes. Presently, the accuracy is owed to the massive databases used as learning resources, while the speed is owed to the computational frequency of computers compared with the brain. In essence, deep learning can very quickly become an expert in almost anything for which we have a lot of well-structured data. The ability to learn almost any pattern makes these algorithms very intelligent and confers an indirect ability to increase their future options through humans.

In the last post, we defined intelligence as “the ability to increase future options.” However, unlike a human, deep learning algorithms cannot gather their own data and consequently operate within a narrow band of intelligence. The kind of deep learning computer intelligence we typically see today is referred to as artificial narrow intelligence (ANI), or weak AI. Until deep learning is able to ask the questions that let it find relevant information to absorb and apply, it is not generally intelligent. Software capable of that level of processing is regarded as artificial general intelligence (AGI). Up to now, we’ve defined intelligence (part 1) and identified the current status of computer intelligence: ANI. The next topic (part 3) discusses the differences between humans and computers that would enable the existence of AGI: curiosity, motivation, & logic. Part 4 dives further into how logic works and describes consciousness and its role in understanding. Part 5, which describes sentience and volition, will probably not get produced.
