In our previous post we explored how Imitating Machines are good at sense-and-respond loops and simple OODA loops. But Imitating Machines lack the higher-level abilities to build behavioral models of the real world, make complex human-like decisions, or demonstrate self-directed intent. In this post we’ll combine that thread of logic with our observations on overfit models and finish our thoughts on what it will take to create true machine sapience.
Sapience drives progress, but historically progress has been neither frequent nor consistent. There is an expression: “normal people do not change history”. Science and the arts are driven by curiosity and explorations of “what if” and “watch this”, not by belief in, or scope constrained to, commonly accepted facts. Because exceptional, non-normal people often ignore cultural norms of belief, faith, and dogma, they are often labeled heretics in their own time. Current mainstream intelligence and personality tests do not focus on finding exceptional, non-normal people; testing focuses on intelligence, not sapience. If we cannot reliably measure sapience in ourselves, how can we measure it in machines (or in other animal species, space aliens, etc.)? How will humanity know when an imaginative, curious, sapient machine becomes our equal, or better? And what impact will that have on the few quintessentially “human” jobs that machine intelligence cannot participate in today?
The defining characteristic of sapience may be the generalized ability to create a map of potential outcomes with which to make decisions. However, humanity’s definition of intelligence seems to be pattern recognition based:
- What do you know?
- What can you extrapolate or interpolate?
But this definition does not get to the heart of what makes people creative. We believe that creativity is a combination of sapience and imagination, and that sapience and imagination are based on overfit models. And we believe that the concept of free will forms the basis for both sapience and imagination.
We won’t dive into the physics of the universe and the arguments for and against determinism; there are plenty of philosophical blogs for that. We define free will as the ability to choose an action that will affect the future. We also set aside the concept of sentience, or self-awareness, as sentience cannot be defined in a measurable, practical sense. Philosophers disagree on whether an entity even requires an independent point of view to demonstrate free will.
Free will is based on envisioning at least two plausible futures using the same evidence (given sensory input and knowledge at-hand) and the belief that actions can determine which future will occur.
“If you choose not to decide you still have made a choice”
-Rush, “Free Will”, Permanent Waves, 1980
We believe that overfit is useful to prune the tree of future possibilities used when considering the implications of a choice. People use their internal models of how the world works to build a branching tree of future possibilities. These same internal models then inform the approaches people use to select the most useful and beneficial branches. In math or gardening this kind of guidance is called “pruning” because entire branches of possible solutions are cut off or dismissed to guide the search. Overfit plays an important role in this process, as many of these models will be built with limited data — data that is biased in favor of risk mitigation or perhaps only includes extreme events.
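As a rough sketch of this pruning process, consider a toy internal world model that scores sequences of candidate actions and cuts off any branch that falls below a threshold. The actions, scores, and threshold here are all invented for illustration; a real internal model would be vastly richer.

```python
# Hypothetical sketch: enumerate branching futures from a current state,
# pruning whole branches with an internal "world model" heuristic.
# ACTIONS, world_model, and PRUNE_THRESHOLD are all illustrative names.

ACTIONS = ["wait", "advance", "retreat"]
PRUNE_THRESHOLD = 0.3  # branches scoring below this are dismissed

def world_model(path):
    """A stand-in internal model: score how promising a sequence of
    actions looks. Here, a toy heuristic that favors 'advance'."""
    score = 1.0
    for action in path:
        score *= {"wait": 0.9, "advance": 0.8, "retreat": 0.4}[action]
    return score

def expand_futures(depth):
    """Build the tree of action sequences, pruning as we go."""
    frontier = [()]
    for _ in range(depth):
        next_frontier = []
        for path in frontier:
            for action in ACTIONS:
                candidate = path + (action,)
                # Pruning: cut off an entire branch if the model says
                # it is unlikely to lead anywhere useful.
                if world_model(candidate) >= PRUNE_THRESHOLD:
                    next_frontier.append(candidate)
        frontier = next_frontier
    return frontier

surviving = expand_futures(3)
print(len(surviving), "of", 3 ** 3, "possible futures survive pruning")
```

The point of the sketch is only that a cheap, even biased, scoring model keeps the tree of considered futures small enough to reason about.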
Overfit modeling can help quickly filter out sensory input and previous learning that doesn’t make sense in the context of a situation or won’t contribute to finding a solution to it. Mathematicians would say overfit modeling serves an important role in escaping local maxima when searching for optimization solutions. Overfit modeling allows higher-order animals to quickly assess a situation, orient themselves, make a decision, and act. Even if the selected action is not the best possible course of action, it is probably not the worst, either.
Overfit models can also help avoid “ruts” in a similar manner to the random seeds found in genetic algorithms. Random searches in very large search spaces, such as in an animal’s genome or in a complex physical environment, can take a very long time (think centuries, millennia, or maybe even geological ages) even with the aid of machine intelligence. Overfit models can help direct searches to areas where there is a higher likelihood of a good or excellent match. But if the data used to generate an overfit model was extremely biased, then it may direct searches to areas where there are only tertiary matches, if any.
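The contrast between a well-placed and a badly biased model can be sketched in a toy one-dimensional search space. Every name and number below is invented for illustration:

```python
# Illustrative sketch: a large search space in which a (possibly overfit)
# model narrows the search to a single promising region.

SPACE_SIZE = 1_000_000
TARGET = 123_456  # the "excellent match" hidden somewhere in the space

def directed_search(region_start, region_size):
    """Scan only the region that the model points at."""
    for attempt, guess in enumerate(
            range(region_start, region_start + region_size), start=1):
        if guess == TARGET:
            return attempt
    return None  # the model's bias pointed at the wrong region entirely

# A usefully biased model: its region happens to contain the target.
good = directed_search(100_000, 50_000)
# An extremely biased model: its region excludes the target.
bad = directed_search(900_000, 50_000)

print("well-directed search found the target in", good, "tries")
print("badly-directed search found:", bad)
# An undirected scan of the full space would average SPACE_SIZE / 2 tries.
```

The directed search pays off enormously when the bias is roughly right, and fails completely when it is not, which is the trade-off the paragraph above describes.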
In these ways, overfit modeling might offer humans important benefits over random chance. An overfit model doesn’t need to predict theoretical best-fit solutions to gain acceptance in a human population; it just needs to predict solutions that are barely acceptable and will not fail too badly.
Predictive overfit models can point at the past or at the future to fuel imagination. Overfit models can…
- Regress backward to postulate root causes of observations; to imagine a cause in the past that was not directly observed by extrapolating observations backward in time to create a model for why the observed things or behaviors occurred.
- Predict forward from imagined root causes or direct observations to extrapolate future symptoms and effects; to imagine the branching possibilities that emerge from a behavioral model, as that model is run forward in time.
Frequently, humans take only the second direction, extrapolating raw observations and effects forward rather than building a causal model to drive their predictions.
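Both directions can be sketched with a simple linear model fitted to a handful of observations (an overfit-prone situation by the standards above), then run backward to an unobserved origin and forward to an unseen effect. The data here is invented:

```python
# Hedged sketch of the two directions of a predictive model.
# 'observed' maps time -> measurement; values are invented and happen
# to lie exactly on y = 2t + 1.

observed = {3: 7.0, 4: 9.0, 5: 11.0}

# Fit y = slope * t + intercept by ordinary least squares.
n = len(observed)
mean_t = sum(observed) / n
mean_y = sum(observed.values()) / n
slope = (sum((t - mean_t) * (y - mean_y) for t, y in observed.items())
         / sum((t - mean_t) ** 2 for t in observed))
intercept = mean_y - slope * mean_t

def model(t):
    return slope * t + intercept

# Regress backward: postulate the unobserved state at t = 0.
print("postulated root cause at t=0:", model(0))
# Predict forward: extrapolate an effect not yet seen.
print("anticipated effect at t=10:", model(10))
```

Running the same fitted model in both directions is what distinguishes causal postulation from simple forward extrapolation of raw observations.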
A simple example of the process of overfit in the human learning process might look like:
- If an event occurs once, it is attributed to chance.
- If the event occurs twice, it might not be chance. The events are noticed, and if the events are large or noteworthy, overfit may start to engage.
- At a third occurrence, learning kicks in. A sapient being will use the available but limited data to build regression models and anticipatory predictions for the series of events. These models will by definition be overfit, because the available data is biased or nonrepresentative.
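The third-occurrence step can be made concrete with a toy example: with only three data points, a quadratic passes through every observation exactly (zero training error) yet extrapolates poorly. The data is invented, and the underlying process is roughly linear (y ≈ 2t) with a little noise:

```python
# Sketch of an overfit model built from exactly three occurrences.

events = [(1, 2.0), (2, 4.1), (3, 5.9)]  # (occurrence, magnitude)

def lagrange(points, t):
    """Interpolating polynomial through the given points (exact fit)."""
    total = 0.0
    for i, (ti, yi) in enumerate(points):
        term = yi
        for j, (tj, _) in enumerate(points):
            if i != j:
                term *= (t - tj) / (ti - tj)
        total += term
    return total

# The model reproduces every observation perfectly...
for t, y in events:
    assert abs(lagrange(events, t) - y) < 1e-9

# ...but its anticipatory prediction drifts far from the ~2t trend.
print("prediction at occurrence 10:", lagrange(events, 10))  # ~10.1, not ~20
```

Perfect fit on the three observed events plus poor extrapolation is exactly the signature of an overfit model built from biased, limited data.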
False positives are often the result of overfit models. Humans have a lot of phrases that describe various aspects of overfit: “Fool me once, shame on you; fool me twice, shame on me” and in the world of start-up companies simply, “Fail fast!”
Mammals and avians are very good at quickly generating overfit models. Humans consider the advanced examples of animal overfit to be “clever”. “Play” is also recognized to be a factor in learning for mammals and avians. We believe play is used to generate observations about the results of actions (from random to intentional) on the physical world and social situations in a non-threatening environment. These observations then help a sapient animal build better overfit models. We believe that this gathering of additional data points to refine overfitted models during play and learning should be testable, and we plan to circle back to play in future posts.
Why does all of this matter for machine sapience?
Intent — the drive to do something, including gather more observations — is currently missing from discussions on machine intelligence. We use the word “curiosity” in the context of play to describe an intent to gather observations. Humans will have to build the intent to be curious into machines in order to arrive at machine sapience and the ability to automate any human job.
Given identical data, humans are capable of generating widely divergent overfit models and opinions about causal interactions in the physical world around them, as well as about the future outcomes of those events and actions. Divergent opinions arise from both nature and nurture: individuals’ differing intellectual capabilities (determined by genetics and mutation) plus the different life experiences, knowledge bases, and risk profiles they bring to creating overfit models.
Machine sapience may be more homogeneous than sapience across a population of humans (or other animals), depending on how much data and how many models the machines can effectively share compared to human abilities to communicate and share complex ideas. We believe it is reasonable to expect that two machines with the same architecture, the same experience and knowledge base, and the same sensory data will arrive at the same opinions and actions. It is impossible to tell at this early date whether identical machines might train differently enough over their experiential uptime to disagree over the implications of an identical data set.
Humanity does not have widely implemented tests for sapience. Tests for intelligence and general problem-solving skills are used to assess and direct education and to assess specific job skills. But educators and employers do not test for people’s ability to create accurate and complex mental models of reality, or their ability to create models of future or fantasy worlds.
How will humanity recognize when it has created a sapient machine that is our equal, or better? It is a curious question…one that might be empirically answered if machine sapience conquers some very specific job categories.
Now that we have a set of consistent terms and definitions, in the next posts we’ll discuss the current limits of automation and human job displacement in much more concrete terms.
Originally published at www.imitatingmachines.com.