A Basic Model for Machine Learning — an overview

Part 1 of 2

The most common forms of ‘machine learning’ in software today are algorithms that can learn from and make predictions on data. In 1959, Arthur Samuel coined the term ‘machine learning’, describing it as giving “computers the ability to learn without being explicitly programmed.”

Today, with substantial increases in computing power and available data, machine learning algorithms are used productively for narrow tasks such as ‘recognizing’ images and spoken and written words, among many other things.

All of this is very useful. However, this approach does not lead to an understanding of things, and it has fundamental limitations in the way it learns:

  • often a very large quantity of data is needed to achieve useful results
  • the algorithms tend to be ‘black box’, their work is not introspectable
  • new learning often forces an entire reconstruction of the model

This bears no resemblance to the way young children (and animals) learn:

  • learning from a single piece of information (‘one-shot learning’)
  • associations and decisions are explainable (introspectable)
  • new learning can be incremental and can arise from mistakes

A recent research paper from MIT described the idea of ‘one-shot learning’ succinctly:

“Although machine learning has tackled some of the same classification and recognition problems that people solve so effortlessly, the standard algorithms require hundreds or thousands of examples to reach good performance. While the standard MNIST benchmark dataset for digit recognition has 6000 training examples per class, people can classify new images of a foreign handwritten character from just one example.”

In addition to one-shot learning, the concept of learning from mistakes is also elemental. A neural network uses the gradient of a loss function in back-propagation, but this is part of a large-scale, brute-force iteration in building a model, not a natural part of applied learning.

Introspecting the reasons why a decision was made, insofar as this arises from conscious ‘slow-thinking’, is inseparable from a productive learning loop. A child is asked ‘does this thing belong?’, she says yes and is able to explain why. She tells a story about why. The teacher explains the answer and in the process the student forms new understanding, or reinforces existing concepts.

‘Why?’, a learner’s most powerful question, is answered with a story.

My objective here is to outline a basic model for learning, recognizing that machines and animal brains are very different but that the elemental concepts and process of learning need not be.

An “infinite loop” for learning

The essential learning loop in people and animals can be summarized as follows, given the roles of learner and teacher.

a. Pattern acquisition:

a.1 learner acquires pattern of symbols (images, shapes, …)

a.2 the pattern is attributed with some thing (a name)

b. Pattern understanding:

b.1 learner is asked whether or not a pattern ‘belongs’ to a thing

b.2 learner provides best answer given the patterns learned

b.3 teacher provides reinforcement or a correction, resulting in acquiring more information
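This loop is simple enough to sketch in code. Below is a minimal Python rendering; the learner and teacher objects and their methods (examples, acquire, questions, belongs, check, correct) are hypothetical names chosen to mirror the steps above, not an established API.

    def learning_loop(learner, teacher):
        # a. Pattern acquisition: acquire each pattern (a.1) and its name (a.2)
        for pattern, name in teacher.examples():
            learner.acquire(pattern, name)
        # b. Pattern understanding
        for pattern, name in teacher.questions():
            answer = learner.belongs(pattern, name)  # b.1/b.2: best answer so far
            truth = teacher.check(pattern, name)     # b.3: reinforcement or correction
            learner.correct(pattern, name, truth)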

That a pattern does not belong to some thing is an element of learning.

[Image: Cookie Monster on Sesame Street, ‘a thing that does not belong’]

Children learn from seeing examples that ‘do not belong’.


Let’s begin by imagining a basic learning model that recognizes simple patterns, learns to identify distinguishing features for groups of patterns, and determines whether or not a pattern is attributable to a thing.

Symbols

Let’s use letters as symbols and sequence them together to represent relatively simple patterns. This is the most basic starting point.

A symbol is a simple string.

symbol: a

and another

symbol: x

A symbol can have other symbols within it, forming a lower level of abstraction.

symbol: x, (x1)

And deeper still…

symbol: x, (x1, (x1a, (x1a!)))

We can sequence a set of symbols together into a symbolic structure (an ordered list of symbols):

symbolic structure: a b c

Now let’s imagine another:

symbolic structure: x y z i j k

What emerges is a fundamental programmatic class and related data structures useful in machine learning. A symbolic structure is a list of symbols, each symbol capable of having a list of symbols within it, and so a structure can contain other instances of its own kind.
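A minimal sketch of such a class in Python; the names Symbol and children are assumptions chosen for illustration, not an established library:

    class Symbol:
        """A symbol is a simple string, optionally containing other
        symbols at a lower level of abstraction."""
        def __init__(self, name, children=None):
            self.name = name
            self.children = children or []

        def __repr__(self):
            if not self.children:
                return self.name
            inner = ", ".join(repr(c) for c in self.children)
            return f"{self.name}, ({inner})"

    # symbol: x, (x1, (x1a, (x1a!)))
    x = Symbol("x", [Symbol("x1", [Symbol("x1a", [Symbol("x1a!")])])])

    # a symbolic structure is an ordered list of symbols: a b c
    structure = [Symbol("a"), Symbol("b"), Symbol("c")]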

Things can be thought of as embodying other things.

Note: the alphanumerics used in these examples could be shapes, colored blocks, whatever — they are symbolic. We’re not actually referring to the letter a, it is serving as a convenient symbol.

Pattern matching

Patterns can be matched to produce a ‘mask’ of their similarities, if any. We can use techniques from mathematics (set theory) to find similarity.

symbolic structure: a b c and symbolic structure: x y c

share the symbol: c, in position 2 (the last position)

symbolic structure: 1 1 9 and symbolic structure: 9 8 7 6 5

share the symbol: 9

symbolic structure: 1 0 1 and symbolic structure: 1 1 1

share: 1 (in position 0) and 1 (in position 2). Notice positions are zero-based.

Because pattern-matching masks are themselves lists of symbolic structures, they can be pattern-matched with other masks to reduce a list of patterns to common features; in set-theoretic terms, this is an intersection taken across all of the patterns.
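A minimal sketch of positional masking and set intersection in Python, using plain lists and sets (the function names are assumptions for illustration):

    def match_mask(a, b):
        """Positional mask: the symbol where two structures agree, else None."""
        return [s if s == t else None for s, t in zip(a, b)]

    def shared_symbols(a, b):
        """Set intersection: symbols shared regardless of position."""
        return set(a) & set(b)

    print(match_mask(list("abc"), list("xyc")))        # [None, None, 'c']
    print(shared_symbols(list("119"), list("98765")))  # {'9'}
    print(match_mask(list("101"), list("111")))        # ['1', None, '1']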


Pattern attribution

Now let’s imagine that a symbolic pattern can be given an attribute, that is to say we associate a pattern with some ‘thing’ or not. The nature of the attribution, for now, will be “is”, but could obviously be otherwise.

Note: we now represent the symbolic pattern as an ordered [list of symbols].

symbolic structure: [ a b c ] is a ‘foo’
symbolic structure: [ x y z i j k ] is not a ‘foo’
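One way to hold these attributions in code, a minimal sketch using plain Python structures (the layout is an assumption for illustration):

    # positive and negative attributions for the thing 'foo'
    attributions = {
        "foo": {
            "is":     [["a", "b", "c"]],
            "is_not": [["x", "y", "z", "i", "j", "k"]],
        },
    }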

This is a common and basic method for learning by young children. They are shown a picture of a bunny and hear ‘bunny’, they are shown a picture of another bunny and again hear the word. A picture of a kangaroo does not get the ‘bunny’ sound. The child is learning, incrementally, from small data.

Of course an image of a rabbit is a far more complex symbol than a single alphabetical symbol, but they are both symbolic nonetheless.


It is with children that we have the best chance of studying the development of logical knowledge, mathematical knowledge, physical knowledge, and so forth. — Jean Piaget

What happens next in our learning model is recognition of patterns, and inherent in that is the identification of distinguishing features in patterns.

Patterns and distinguishing features

Suppose you were shown the following symbolic patterns, again using alphabetic symbols for convenience (ignore the meaning of the letters).

symbolic structure: [ a b c ]
symbolic structure: [ x y c ]
symbolic structure: [ n o c ]

and you are informed that these patterns are representative of the thing ‘foo’, i.e. each was suffixed by “is a ‘foo’ ”.

What is the distinguishing feature of ‘foo’?

Answer: the ‘c’ symbol in the last position.

Again: ‘foo’ is:

symbolic structure: [ x b c c a ]
symbolic structure: [ x c c y ]
symbolic structure: [ y n c c x ]

What is the distinguishing feature of ‘foo’?

Answer: the existence of the ‘c c’ symbols together. This combination of symbols is itself symbolic. You may also have noticed that all of the patterns contain an ‘x’.

Now if you were subsequently shown another pattern:

symbolic structure: [ x p q n c c o p ]

You might assert that, to your understanding, this is also a ‘foo’. You might be right, or not, but the basis for your assertion is the existence of what was noted as the distinguishing feature(s) of the patterns given for the attribute.

Symbolic patterns inherent in instances of a thing become the basis for pattern matching.

Why do you think that is a ‘foo’?

Because it exhibits both the ‘c c’ pattern and the ‘x’ pattern seen in all others identified as ‘foo’.

When you scanned the patterns above and understood the question, what your brain did consciously was to look for common patterns. You scanned the first two patterns and noticed they both started with the ‘x’ symbol; you then noticed that the third pattern did not start with ‘x’ but still contained the ‘x’ symbol (an AND construct). A quick re-scan revealed that all three patterns contain ‘c c’, and you likely perceived this pair as an aggregate symbol (no longer necessarily two separate symbols).

Do this again and be conscious of your mental processes as you go along. It’s not difficult to introspect your own thought-processes in reasoning, or so-called ‘slow thinking’.

Now you are given yet another pattern that is ‘foo’

symbolic structure: [ i t c c o v ]

What now is the distinguishing feature of ‘foo’?

Answer: the existence of the ‘c c’ symbolic structure is the distinguishing feature. The symbol ‘x’ is no longer a distinguishing feature.

You have learned more about ‘foo’. What was previously held as a distinguishing feature was jettisoned instantly. Of course it is possible that the symbol ‘x’ still plays some role in the pattern-matching for the thing ‘foo’, but let’s leave that aside for now and stick to basic principles.

It’s possible for the distinguishing features to be expressed with AND/OR logic. This would be the case if an attribute’s symbolic structures were placed into sub-groups, each sub-group sustaining its own common intersection (AND within a group, OR across groups).
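A minimal sketch of this feature-finding step in Python, treating a feature as a run of symbols shared by every positive example (the function names are assumptions for illustration):

    from functools import reduce

    def ngrams(structure, n):
        """All contiguous runs of n symbols in a structure."""
        return {tuple(structure[i:i + n]) for i in range(len(structure) - n + 1)}

    def distinguishing_features(examples, max_n=2):
        """Runs of 1..max_n symbols present in every example."""
        features = set()
        for n in range(1, max_n + 1):
            features |= reduce(set.__and__, (ngrams(e, n) for e in examples))
        return features

    foo = [list("xbcca"), list("xccy"), list("ynccx")]
    print(distinguishing_features(foo))
    # {('x',), ('c',), ('c', 'c')}: both 'x' and the 'c c' run survive

    foo.append(list("itccov"))
    print(distinguishing_features(foo))
    # {('c',), ('c', 'c')}: 'x' is no longer a distinguishing feature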

Learning from negative examples

So far we’ve learned that ‘foo’ is represented by several symbolic patterns:

is ‘foo’:

symbolic structure: [ x b c c a ]
symbolic structure: [ x c c y ]
symbolic structure: [ y n c c x ]
symbolic structure: [ x p q n c c o p ]
symbolic structure: [ i t c c o v ]

Now suppose you are informed, in the above progression, that symbolic structure: [ x p q n c c o p ] is NOT ‘foo’, and you are given a few other patterns representing things that are not ‘foo’.

is not ‘foo’:

symbolic structure: [ x p q n c c o p ]
symbolic structure: [ o p a t g ]
symbolic structure: [ w q o p i ]

The distinguishing feature of not being ‘foo’ appears to be the ‘o p’ symbol pair.

Now we have a stronger sense of whether a symbolic pattern is or is not the ‘foo’ thing.
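Putting the positive and negative features together, a minimal sketch of the resulting decision in Python (again, the names are assumptions for illustration):

    def ngrams(structure, n):
        """All contiguous runs of n symbols in a structure."""
        return {tuple(structure[i:i + n]) for i in range(len(structure) - n + 1)}

    def features(structure, max_n=2):
        """All runs of 1..max_n symbols present in a structure."""
        return set().union(*(ngrams(structure, n) for n in range(1, max_n + 1)))

    def belongs(candidate, positive, negative):
        """True if the candidate exhibits every positive feature and no negative one."""
        found = features(candidate)
        return positive <= found and not (negative & found)

    positive = {("c", "c")}  # distinguishing feature of 'foo'
    negative = {("o", "p")}  # distinguishing feature of not-'foo'
    print(belongs(list("itccov"), positive, negative))    # True
    print(belongs(list("xpqnccop"), positive, negative))  # False: contains 'o p'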

Abstractions

It’s common for symbols to have abstractions and for pattern matching to operate across levels of abstraction.

symbol: terrier, (dog)
symbol: trout, (fish)
symbol: crow, (bird)
symbol: robin, (bird)
symbolic structure: [ (terrier, (dog)) (trout, (fish)) (crow, (bird)) ]

The symbol ‘robin’ (a type of bird) should pattern-match, at the abstraction level of animal type, with the symbolic structure containing ‘crow’ (another type of bird).

There can be several layers of abstraction and multiple symbols in each layer. New underlying data can be established by transforming the patterns, yielding additional pattern-matching opportunities.
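A minimal sketch of matching at an abstraction level in Python, extending the earlier Symbol sketch with an abstraction field (an assumption for illustration):

    class Symbol:
        def __init__(self, name, abstraction=None):
            self.name = name
            self.abstraction = abstraction  # e.g. 'bird' for 'robin'

    def match_at_abstraction(a, b):
        """Match two symbols by their abstraction rather than their surface name."""
        return a.abstraction is not None and a.abstraction == b.abstraction

    robin = Symbol("robin", "bird")
    crow = Symbol("crow", "bird")
    trout = Symbol("trout", "fish")
    print(match_at_abstraction(robin, crow))   # True: both are birds
    print(match_at_abstraction(robin, trout))  # False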

Machine Learning algorithms and cognitive systems

Commonly used ML algorithms lack these elemental learning qualities: one-shot learning, introspection, applied learning loops. And commonly used math libraries, including those with set-theory support, lack the functions and data structures necessary to achieve what we’re describing here.

A recurrent LSTM neural network used to encode and decode text cannot tell you why it produced a given term, nor can it absorb a correction when it gets one wrong. Its model must be rebuilt from scratch: hundreds of thousands of iterations through the data to re-tune its many synaptic weights.

The random-forest algorithm or convolutional neural network used in recognizing hand-written digits suffers from the same deficiencies. It cannot tell the story of why some thing is an ‘8’ rather than a ‘7’. It determined that a thing is more likely an ‘8’ than another digit by seeing thousands of digits and tuning itself over 10⁶ iterations for this one specific purpose.


The model of learning outlined here is a potential starting point, among others. It proposes a structure that recognizes patterns, surfaces distinguishing features, and subjects them to scrutiny: a software system that learns through exercises and questions and is able to determine whether a set of patterns is attributable to a thing, or not; a class and data structures useful in understanding a thing.

The German philosopher Martin Heidegger wrote, “The interpretations of the thingness of the thing which, predominant in the course of Western thought, have long become self-evident and are now in everyday use…”

The thingness of a thing — a thing is that “around which the properties have assembled.” — Martin Heidegger

If we have examples of a thing (a bunny, a handwritten letter, a shape, etc.) then we form patterns of what the thing is.

  • What is this thing? What patterns are attributed to it?
  • What is not this thing?
  • What about this thing makes it like or unlike other things?
  • What about what we understand about this thing makes us confident that another thing is like or unlike it?
  • What story can we tell about our understanding of this thing?

Most current algorithmic ML approaches cannot lead to machine cognition, to an understanding of things. The elemental properties of learning are fundamental to cognitive systems. For ‘machine learning’ to extend and evolve into cognition, we must first handle the essential processes of learning.

My hope and intent is for this writing to spark constructive conversations about these elemental aspects of machine learning and surface implementations to further this work.

Part 2: an implementation.

https://tinyurl.com/yaam8ep6