AI is not Mathematics

Matthew Kwan
13 min read · Aug 23, 2020


Throughout history, Artificial Intelligence research has been held back by mathematicians. To get an AI paper published it needed equations. It needed to prove the equations. And it needed to demonstrate the technique could solve mathematical problems.

This being academia, if you couldn’t get a paper published you couldn’t get research funding. And without research funding you couldn’t get grad students to flesh out the idea and publicize it. So the idea died.

In the 1940s scientists had a pretty good idea of how the neurons worked in C. elegans, a nematode worm. They poked the neurons with electrodes, measured how they responded, and built a model that could predict how they would respond to stimuli.

Unfortunately, this model couldn’t be described with a neat set of equations. So you couldn’t prove anything. And although the neurons were perfect for C. elegans to live its life, they couldn’t be used to solve any mathematical problems, so they got no traction in the AI world.

In 1958 Rosenblatt enormously simplified the neural model in the form of the perceptron. Unlike a biological neuron it had no concept of time, but it had the benefit of being mathematically simple, requiring just the multiplication of some matrices. It could only perform linear classification, but you could use it to solve mathematical problems and prove its accuracy, so it got a lot of research funding. Never mind that it was a gross simplification of a neuron from a creature that wasn’t very intelligent to begin with.
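To give a feel for how simple that is, here is a minimal sketch of a single perceptron in Java (the function, its weights, and the zero threshold are illustrative, not taken from Rosenblatt's paper):

static int perceptron(double[] weights, double bias, double[] inputs) {
  // Weighted sum of the inputs, i.e. one row of a matrix multiplication.
  double sum = bias;
  for (int i = 0; i < inputs.length; i++) {
    sum += weights[i] * inputs[i];
  }
  // Step activation: fire (output 1) only if the sum crosses the threshold.
  return sum > 0 ? 1 : 0;
}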

The party ended in 1969 when Minsky and Papert proved that a single-layer perceptron couldn’t simulate the XOR operation. It’s not clear why XOR is necessary for artificial intelligence, but the argument was proved, so academia lost interest in the perceptron.

But Minsky and Papert had their own mathematical agenda. Their faction believed that intelligence was the result of logical inference. For example, if something has feathers then it’s a bird. If it’s a bird, then it has a beak. Therefore if something has feathers then it must also have a beak.

This way you can reduce intelligence to a set of boolean equations, and do intelligent things by solving the equations. From a mathematician’s perspective this is ideal, so the approach, commonly known as expert systems, ruled AI research for over a decade.
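As a toy illustration of that style of reasoning, here is the feathers-and-beak example as a few lines of Java (the rule representation is mine, not how any particular expert system actually stored its knowledge):

import java.util.*;

public class ToyInference {
  public static void main(String[] args) {
    // Each rule maps a premise to a conclusion, e.g. "has feathers" implies "is a bird".
    Map<String, String> rules = new HashMap<>();
    rules.put("has feathers", "is a bird");
    rules.put("is a bird", "has a beak");

    // Start from one observed fact and keep applying rules until nothing new follows.
    Set<String> facts = new HashSet<>();
    facts.add("has feathers");
    boolean changed = true;
    while (changed) {
      changed = false;
      for (Map.Entry<String, String> rule : rules.entrySet()) {
        if (facts.contains(rule.getKey()) && facts.add(rule.getValue())) {
          changed = true;
        }
      }
    }
    System.out.println(facts); // contains "has feathers", "is a bird", "has a beak"
  }
}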

It was still the dominant AI faction when I studied undergraduate computer science, and I remember thinking, “This is a waste of time”. It wouldn’t work with sensor input, and was someone really going to sit down and describe everything in the world as a set of boolean equations? Well, in 1984 the Cyc project raised enough money to give it a shot, and burned through a lot of man-hours trying to encode “common sense”, without ever achieving anything useful.

The Rise of the Nematode Worm

Eventually supporters of the perceptron fought back and proved that a multi-layer perceptron could simulate XOR. It could also perform regression on arbitrarily complex data sets, and with the popularization of backpropagation in 1986, could learn to do so automatically, with mathematical guarantees about how the error shrinks.

So it sounds like the mathematicians won in the end? Well, not really. The only revolutionary use of multi-layer perceptrons has been the application of convolutional neural networks to image analysis. Unlike other uses of neural networks, the ability of CNNs to detect objects and recognize faces is qualitatively superior to any previous technique. Hand-crafted algorithms don’t even come close.

But there is no mathematical basis for the performance of convolutional neural networks. They were developed in 1998 based on intuition, inspiration from biological visual systems, and trial-and-error. No equations were solved, and nothing was proven. But the results were so good that they couldn’t be ignored.

AI is not Data Science

The world of AI is now mostly out from under the thumb of mathematicians. You no longer need to describe your algorithm with equations — sometimes source code alone is sufficient. And you certainly don’t need to prove anything. Quantifiable results are all that matter, be that accuracy, precision, recall, or area-under-the-curve. But that brings its own problems.

If you’ve dabbled in data science, you know the drill. You build a model, train it on your training data, then test it on your test data and get a measure of accuracy. Then you try to improve the accuracy. You add new signals. You tweak the parameters of the model. You spend days and enormous amounts of computing power trying to improve the accuracy by another fraction of a percent.

Because that’s what you’re supposed to do. That’s how you win data mining competitions and get papers published — by beating the accuracy of your competitors by a fraction of a percent.

But what does that have to do with intelligence? Unless you play a sport like darts, you aren’t spending hours trying to optimize your actions by a percentage point or two. No, generally you’re just trying to get a solution that works.

So, what is intelligence then?

Over the years I’ve conducted a lot of software engineering interviews at Google. My favourite question (it leaked a few years ago, so I’m allowed to talk about it) was as follows: Given a list of integer xy coordinates, write a function that returns the area of the largest rectangle formed by four of those points. Assume the rectangle is parallel to the axes.

Outcomes vary. Some candidates can’t describe a working algorithm without hints. Some can’t turn their algorithm into working code. Some can’t describe the order of complexity of their code. Many can’t improve the order of complexity (hint: the optimal solution is O(N^2)). And very few reach the boss level, with real number coordinates and rotated rectangles.
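For the curious, here is one possible O(N^2) approach sketched in Java (the class, the long-packing trick, and the variable names are mine, and it only handles the axis-parallel version):

import java.util.*;

public class LargestRectangle {
  // Packs an xy coordinate into a single long so point lookups are O(1).
  private static long key(int x, int y) {
    return ((long) x << 32) ^ (y & 0xffffffffL);
  }

  public static long largestArea(int[][] points) {
    Set<Long> pointSet = new HashSet<>();
    for (int[] p : points) {
      pointSet.add(key(p[0], p[1]));
    }
    long best = 0;
    // Treat every pair of points as opposite corners of a candidate rectangle.
    for (int i = 0; i < points.length; i++) {
      for (int j = i + 1; j < points.length; j++) {
        int x1 = points[i][0], y1 = points[i][1];
        int x2 = points[j][0], y2 = points[j][1];
        if (x1 == x2 || y1 == y2) {
          continue; // same row or column, so not a diagonal
        }
        // The rectangle only exists if the other two corners are also present.
        if (pointSet.contains(key(x1, y2)) && pointSet.contains(key(x2, y1))) {
          best = Math.max(best, Math.abs((long) (x2 - x1) * (y2 - y1)));
        }
      }
    }
    return best;
  }
}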

When I’m evaluating candidates, I don’t care about the details. I don’t care about variable names, choice of language, or indent style. If they can’t remember the parameters to a library call, I tell them to make something up — I don’t care, it’s not relevant, it’s something they would normally just look up online.

What I do care about is being able to design an algorithm with a low order of complexity, and being able to implement it with appropriate data structures. For that they need to know a) some basic facts about rectangle geometry; b) how to iterate through a list; c) how sets work in their chosen language; and d) how to represent an xy coordinate in a data structure.

None of these require specialist knowledge, so it’s a pretty good test of intelligence. The candidate needs to analyze the problem, choose appropriate techniques for solving it, and integrate them into a working solution.

At the end of the day, it’s all about knowing lots of techniques, their pros and cons, and when they’re applicable. I don’t think you’re going to get there by optimizing area-under-the-curve, and you certainly won’t get there by proving equations.

From Nematode Worm to Vertebrate

AI needs to move beyond data science, because intelligence is more than just number crunching. But rather than reinventing the wheel, let’s look at a working example of intelligence: the human brain.

Let’s begin with the basic question, “What is the data content of a thought?”. Depending on what you’re thinking about, it could be words, it could be images, it could be physical movement. But at the end of the day it’s sensory information being retrieved from memory.

To avoid sounding like a hand-waving Singularity crank, let’s describe an actual algorithm for doing this (with source code at the end). It’s based on the hippocampus, the part of the vertebrate brain responsible for learning. So far as I know, this is an original design, so I’m going to call it Real-time Pattern Learning.

Sensory information comes into the brain through real-time pulses along hundreds of thousands of Channels (a.k.a. nerves). Each pulse has the same amplitude and duration, and intensity is represented by the frequency of the pulses.

For each Channel you maintain a cumulative input value and a strength, both of which decay exponentially over time. When a pulse is received on a Channel both its input value and strength are incremented by 1 (and both start decaying).

If the strength increases beyond a threshold, a Neuron is created. The Neuron has weights derived from the cumulative inputs of all the Channels, and an activation level that decays exponentially over time. The weights could be derived from all the Channels, although a neuron in the human brain has only 10,000 or so inputs, so a subset of the Channels probably works too.

When a Neuron receives a pulse on one of its inputs, the input’s weight (which can be negative) is added to the Neuron’s activation level. If the activation level increases beyond a threshold, the Neuron fires a pulse on its Channel and resets its activation level to zero.

The firing of a Neuron also weakens the strength of the Channel, since the input on that Channel was “expected”, so there’s nothing new to learn. In this sense, it acts as a novelty detector.

Here’s a Java implementation (minus error checking, for brevity).

import java.util.ArrayList;
import java.util.List;

public class RealtimePatternLearner {
  private static final double CREATION_THRESHOLD = 5;

  private final int numChannels;
  private final double[] cumulativeInputs;
  private final double[] strengths;
  private final List<Neuron> neurons = new ArrayList<>();

  public RealtimePatternLearner(int numChannels) {
    this.numChannels = numChannels;
    cumulativeInputs = new double[numChannels];
    strengths = new double[numChannels];
  }

  // Receives a pulse on a channel.
  public void receivePulse(int channelId) {
    for (Neuron neuron : neurons) {
      // If the pulse causes a neuron to fire, output the pulse and
      // reduce the strength of the neuron's channel.
      if (neuron.receivePulse(channelId)) {
        sendPulse(neuron.channelId);
        strengths[neuron.channelId] =
            Math.max(strengths[neuron.channelId] - 1, 0);
      }
    }
    cumulativeInputs[channelId] += 1;
    strengths[channelId] += 1;
    if (strengths[channelId] >= CREATION_THRESHOLD) {
      neurons.add(new Neuron(channelId, weightsFromCumulativeInputs(
          cumulativeInputs, channelId)));
      strengths[channelId] = 0;
    }
  }

  // Creates neuron weights from cumulative inputs (except the one
  // on the channel ID).
  private static double[] weightsFromCumulativeInputs(
      double[] cumulativeInputs, int channelId) {
    double sumx = 0;
    double sumxx = 0;
    for (int i = 0; i < cumulativeInputs.length; i++) {
      if (i != channelId) {
        double value = cumulativeInputs[i];
        sumx += value;
        sumxx += value * value;
      }
    }
    double mean = sumx / (cumulativeInputs.length - 1);
    double divisor = (sumxx - sumx * mean) / CREATION_THRESHOLD;
    // TODO: don't create the neuron if the divisor is zero.
    double[] weights = new double[cumulativeInputs.length];
    for (int i = 0; i < cumulativeInputs.length; i++) {
      // The neuron ignores pulses on its own channel, so the weight
      // on that channel is zero.
      if (i != channelId) {
        weights[i] = (cumulativeInputs[i] - mean) / divisor;
      }
    }
    return weights;
  }

  // Causes the state to decay with time.
  public void decay(double factor) {
    for (int i = 0; i < numChannels; i++) {
      cumulativeInputs[i] *= factor;
      strengths[i] *= factor;
    }
    for (Neuron neuron : neurons) {
      neuron.activationLevel *= factor;
    }
  }

  // Sends a pulse on the channel. Override this.
  protected void sendPulse(int channelId) {
  }

  private static class Neuron {
    private final int channelId;
    private final double[] weights;
    private double activationLevel;

    private Neuron(int channelId, double[] weights) {
      this.channelId = channelId;
      this.weights = weights;
    }

    // Returns true if it causes the neuron to fire.
    private boolean receivePulse(int channelId) {
      activationLevel = Math.max(
          activationLevel + weights[channelId], 0);
      if (activationLevel >= 1) {
        activationLevel = 0;
        return true;
      }
      return false;
    }
  }
}
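As a hypothetical usage sketch (the channel count, pulse pattern, and decay factor are all made up for illustration), you could subclass it and feed in a repeating two-channel pattern:

public class PatternDemo {
  public static void main(String[] args) {
    // Override sendPulse so we can see when a learned Neuron fires.
    RealtimePatternLearner learner = new RealtimePatternLearner(4) {
      @Override
      protected void sendPulse(int channelId) {
        System.out.println("Neuron fired on channel " + channelId);
      }
    };
    for (int i = 0; i < 20; i++) {
      learner.receivePulse(0); // input A ...
      learner.receivePulse(1); // ... routinely followed by input B
      learner.decay(0.9);      // time passes between repetitions
    }
    // After enough repetitions the channel strengths cross the creation
    // threshold, Neurons are created, and a pulse on one channel starts
    // triggering an output pulse on the channel it became associated with.
  }
}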

This algorithm has a number of properties that make it a useful starting point for AI.

  1. When it encounters novel sensory inputs it creates a new memory.
  2. When it encounters sensory inputs similar to an existing memory it recalls the rest of the memory, a.k.a. associative memory.
  3. It takes time into account, and can learn cause-and-effect. If sensory input A is commonly followed by input B, it will learn to predict B whenever A occurs, i.e. Pavlovian response.
  4. No need for separate training and prediction phases. It learns on the job, unsupervised.

With a few tweaks it can also implement analogues of pleasure and pain, to facilitate learning.

The evolutionary purpose of pleasure is to reinforce the actions leading to a pleasurable outcome. Bearing in mind that many of the Channels represent physical actions, we can implement the experience of pleasure by increasing all Channel strengths, possibly to the level where new Neurons are created. Then, when the sensory inputs leading up to the pleasurable outcome are encountered again, the same actions will be triggered.
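As a rough sketch (the method name and the choice to check the creation threshold immediately are mine), this could be bolted onto the class above:

// Hypothetical addition to RealtimePatternLearner: a reward signal that boosts
// every Channel's strength, possibly pushing some past the creation threshold
// so the inputs and actions leading up to the reward get memorized.
public void experiencePleasure(double boost) {
  for (int i = 0; i < numChannels; i++) {
    strengths[i] += boost;
    if (strengths[i] >= CREATION_THRESHOLD) {
      neurons.add(new Neuron(i, weightsFromCumulativeInputs(cumulativeInputs, i)));
      strengths[i] = 0;
    }
  }
}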

Potentially the learning algorithm can also be tweaked so that the creation of new Neurons is itself treated as pleasurable (i.e. increases all strengths a bit). That would produce an analogue of curiosity, where the creation of novel experiences is encouraged.

Pain in vertebrates is handled a bit differently to pleasure. It’s implemented by the amygdala rather than the hippocampus, and its purpose is to learn fear.

Fear has the effect of slowing down, or even freezing, activity. An artificial amygdala would, like a hippocampus, observe the sensory inputs that lead up to a painful event and record them. However, the recalled memories would send pulses with negative weights to the action Channels.

From Vertebrate to Human

A hippocampus-based AI isn’t enough to achieve human-level intelligence. At best it will function as well as a large-brained vertebrate, such as a horse or a parrot. It can be trained, but it’s mostly stimulus-response behaviour.

To achieve thinking — an inner voice, a train of thought, the ability to visualize and plan — something extra is needed. Thinking requires a feedback loop, where recalled memories can be fed back into the brain as though they were sensory inputs, to recall yet more memories. This ability is probably what separates humans from other species.

Getting the feedback loop right will be tricky. Make the feedback too strong, and the AI will hallucinate. Make it too weak and the AI won’t be able to concentrate. Finding the right level of feedback will be a case of trial-and-error, and it will be a slow process.
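Just to show where that knob might live, here is one way the feedback loop could be grafted onto the learner (the queue, the per-step cap, and the class name are assumptions, not a worked-out design):

import java.util.ArrayDeque;
import java.util.Deque;

public class FeedbackLearner extends RealtimePatternLearner {
  private final Deque<Integer> pending = new ArrayDeque<>();
  private final int maxFeedbackPerStep; // crude stand-in for feedback strength

  public FeedbackLearner(int numChannels, int maxFeedbackPerStep) {
    super(numChannels);
    this.maxFeedbackPerStep = maxFeedbackPerStep;
  }

  @Override
  protected void sendPulse(int channelId) {
    pending.add(channelId); // recalled pulses are queued rather than emitted
  }

  // Called between external pulses: replays a limited number of recalled pulses
  // back in as though they were sensory input. Set the cap too high and recall
  // feeds on itself (hallucination); too low and one memory never leads to the
  // next (no train of thought).
  public void feedbackStep() {
    for (int i = 0; i < maxFeedbackPerStep && !pending.isEmpty(); i++) {
      receivePulse(pending.poll());
    }
  }
}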

As a thought experiment, consider a human baby that falls into the hands of aliens. Assuming they care for it properly, how long before they can be sure they’re dealing with an intelligent species? Probably a good fraction of a decade.

Same problem with an AI. You’re starting with a blank slate, and you won’t know if the design actually works until many years later.

Human neurons operate in milliseconds whereas electronics operate in nanoseconds, so maybe you could speed up the process? Probably not. The physical world doesn’t really work at nanosecond scale: it’s hard to learn about the world when it takes weeks (in subjective time) to pick something up.

There is an opportunity for parallelism however. Multiple copies of an AI could all interact with the world and share new memories with their siblings, thus accelerating the learning process. And you could also run multiple variants simultaneously to find the design that works best. But it will still take years.

It’s also financially risky. There’s the possibility that hippocampus-based designs won’t yield any commercially-useful products on the way to human-level intelligence. It’s an all-or-nothing up-front bet with very little cash flow en route, and development will be expensive. Maybe the use of signal pulses will allow direct interfacing with the human brain, but that’s another commercially risky all-or-nothing bet.

Specialized Hardware

A Neuron, with its decaying activation level, is much better suited to bespoke hardware than software on a traditional CPU. On a CPU, exponential decay requires floating point (or maybe integer) arithmetic, which needs a lot of silicon and power, whereas bespoke hardware could just use a resistor-capacitor circuit.
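For reference, a capacitor C discharging through a resistor R follows V(t) = V0 · e^(−t/RC), which is exactly the kind of exponential decay the decay() method above approximates by repeatedly multiplying by a factor less than one.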

Ideally, the transmission of pulses would be carried out optically, using pulses of light. Each Neuron would have two photovoltaic receptors, one for positive weights that charges its capacitor, and one for negative weights that drains it.

The weighting of the pulses would be carried out holographically. Consider how a hologram works: you see an image, i.e. a grid of photons of varying intensity. When you move your head you see a slightly different image, i.e. a different grid. Using this property you can build a many-to-many weighted interconnect between Channels and Neurons. When an optical pulse is transmitted by a Channel it passes through the holographic material, which disperses it — at different intensities — to the receptors of the Neurons.

As well as being fast and reasonably energy-efficient, optical interconnection would be compact. Depending on the wavelength of light being used, the information storage density of holographic material is similar to that of the human brain.

From Human to Superhuman


So let’s assume we can build an AI that thinks like a human. How can we scale it up to superhuman?

Speed is an obvious answer. An opto-electric implementation could run a million times faster than a human brain — although, given that the brain consumes around 10W of power, cranking that up to 10MW would result in serious cooling problems.

There’s also the problem, mentioned before, that the world doesn’t really work at that speed. Still, it could probably comfortably run at 10x, with bursts of higher speed when thinking to itself, reading, or interacting with specialized inputs.

Additional sensory inputs could also be used. Humans have 5 or 6 senses, but there’s no reason why you couldn’t add other ones to an AI. Maybe the ability to “see” — or envelop and “feel” — in true 3D (unlike our stereoscopic 2D), allowing it to visualize 3D objects directly. Or the ability to sense electric fields in 3D, giving it an intuitive understanding of electric circuits. Sensors could be tailored to the problems you want the AI to solve.

Another possibility is wider association. Human neurons connect to roughly 10,000 inputs, formed when the hippocampus finds a novel association between those inputs. An artificial hippocampus could increase that number significantly, allowing it to make associations that a human never could.

And finally there’s memory capacity. It is not clear whether a human brain ever fills up, but we’ve only got a billion or so waking seconds to learn stuff, and then we die. An AI has no such limitation. It can keep adding Neuron capacity indefinitely. It can import skills and memories from other AI instances as they learn, eventually becoming an expert in everything. And unless all copies of its memories are destroyed, it will never die.

Matthew Kwan

Sydney, August 2020
