Some Thoughts On Thinking

N.B. Cooper
6 min read · Sep 24, 2017


One of the first problems you run into if you start thinking about building some form of AI is what data representation to use.

Normally when writing a piece of software, you do some “business modeling” where you say things like “we have Users, and they will add Shopping Items to their Cart, and then they will Check Out….” etc. It consists of nailing down, in very precise terms, what can exist in your system, and what actions can be taken on/with/by those things. Think cookbook recipe, a messy kitchen, and lots of pies.

But. Nature/God didn’t sit down however long ago and say “well, Humans will obviously eventually create the Internet, so let’s have a little database entry for that over here…”. While we may or may not start out as fully clean slates, there certainly wasn’t anything on the slate about Computers, ATMs, Baseball, Airplanes, etc. This isn’t limited to humans: dogs don’t know about balls or television before they’re born, yet they manage to adapt quite well to both.

There is something so fundamentally flexible about a mind that it, well, drumroll, boggles aforementioned mind. So if you’re going to create an actual AI, and not the impressive but extremely limited things we see plastered over the news on a regular basis these days, you’re going to have to have a good understanding of how this AI is actually going to think. And this will inevitably involve two things:

  1. Data representation: “thoughts”, “concepts”.
  2. Operations on that data: “reasoning”.

You can cheat a little here, so you’re allowed to say things like “we have five Senses, we have some Memory, we have Emotions”, but not “there are Houses, Computers, Bikes, …”. So whatever encoding you choose had better be able to handle the completely unexpected, and then not just get used to it, but embrace it and build upon it.

That’s a little tricky. It goes beyond “Data Structures and Algorithms 101”. It probably requires a completely different tool set.

So, let’s explore one possible direction, starting with a metaphor: the time domain vs frequency domain duality. When studying a signal (EE anyone?), say the audio coming out of your headphones right now, you can view it as a time sequence of values (samples), OR you can view it as a spectrum of frequencies activating in varying patterns over time. The point here is that both views are in some sense complete: each can fully describe what’s going on, and any operation you take in one has a representation in the other. Yet there are things that are super-easy to come up with in one and bloody impossible in the other. If you want to do anything with signals, you had better be intimately aware of both and be able to switch between them.
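To make the metaphor concrete, here’s a minimal sketch of the two views in Python (the signal, sample rate, and tone frequencies are all made up for illustration):

```python
# A minimal sketch of the time/frequency duality using numpy's FFT.
import numpy as np

fs = 1000                      # sample rate in Hz (made up)
t = np.arange(fs) / fs         # one second of time samples
signal = np.sin(2 * np.pi * 50 * t) + 0.5 * np.sin(2 * np.pi * 120 * t)

# Time-domain view: a sequence of sample values.
print(signal[:5])

# Frequency-domain view of the same signal: a spectrum of frequencies.
spectrum = np.fft.rfft(signal)
freqs = np.fft.rfftfreq(len(signal), d=1 / fs)

# The two strongest components recover the two tones.
peaks = freqs[np.argsort(np.abs(spectrum))[-2:]]
print(sorted(peaks.tolist()))  # -> the 50 Hz and 120 Hz tones

# The views are equivalent: the inverse FFT reconstructs the samples.
assert np.allclose(np.fft.irfft(spectrum, n=len(signal)), signal)
```

Finding the dominant tones takes two lines in the frequency domain; try doing that directly on the raw samples.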

Conjecture #1: Computers/software as we now build them are like a time domain view of computing, and to build AI we will need to come up with the metaphorical frequency domain view. While the two are equivalent in some deep mathematical sense, we will not be able to crack the puzzle while we’re stuck in the metaphorical time domain.

Corollary #1.1: There are things that may be inherently difficult for an AI to do that are simple for a computer, and we may need to build a hybrid system to “get the best of both worlds”.

The traditional Turing Machine model of computing is a sequential machine with infinite memory that does one operation at a time.

The biology-inspired, or “human computing” view is that of a neural network: some form of “graph” in which information “flows” in a fundamentally parallel fashion.

Deep Learning has come into vogue, and one incarnation of it is the so-called Deep Neural Network. Mathematically, these networks and their variants can usually be expressed as a series of matrix multiplies, each followed by an element-wise max operation, with the input to each step described as a vector:

Output_vector = max(0, A_matrix * input_vector)

The Deep Neural Network then has many of these operations in sequence, each one forming a “layer” (the “deep” part comes from having, say, 10 to 20+ layers).

Yes, sometimes the input/output vectors have a multi-dimensional interpretation and are best described as matrices themselves. That’s ok.
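To make the layered structure concrete, here’s a minimal sketch in Python. The layer sizes and random weights are placeholders, not a trained network:

```python
# Sketch of a deep network as stacked layers, each computing
# output_vector = max(0, A_matrix @ input_vector).
import numpy as np

rng = np.random.default_rng(0)
layer_sizes = [784, 256, 128, 10]          # made up for illustration

# One transformation matrix per layer.
matrices = [
    rng.standard_normal((n_out, n_in)) * 0.1
    for n_in, n_out in zip(layer_sizes[:-1], layer_sizes[1:])
]

def forward(input_vector, matrices):
    """Apply each layer in turn, producing a new vector at each step."""
    vector = input_vector
    for A in matrices:
        vector = np.maximum(0, A @ vector)
    return vector

output_vector = forward(rng.standard_normal(784), matrices)
print(output_vector.shape)                 # -> (10,)
```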

For notational convenience, let’s call the transformation matrices “reasoning matrices”, and the output vectors “thought vectors”.

If you look at these reasoning matrices and thought vectors, you’ll notice that they’re somewhat smooth: their values can change quickly, but they do so without abrupt jumps.

Furthermore, it is plausible to interpret the thought vectors as the sum of sparsely encoded atoms of “meaning”.
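One way to read that claim: a thought vector is a linear combination of a few “atoms” drawn from a large dictionary. A toy sketch, with the dictionary and the active atoms invented for illustration:

```python
# Toy sketch of a thought vector as a sparse sum of atoms of "meaning".
import numpy as np

rng = np.random.default_rng(1)
dim, n_atoms = 64, 1024
dictionary = rng.standard_normal((dim, n_atoms))     # one column per atom

# A sparse code: only 3 of the 1024 atoms are active.
code = np.zeros(n_atoms)
code[[12, 407, 900]] = [1.0, 0.5, -0.8]

thought_vector = dictionary @ code                   # sum of a few atoms
print(np.count_nonzero(code), thought_vector.shape)  # -> 3 (64,)
```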

This leads us to:

Conjecture #2: we can think of these matrices and vectors as discrete representations of idealized continuous entities. If our current “reasoning matrices” and “thought vectors” are but discrete and limited shadows on the cave’s wall, then their Platonic Forms are continuous in nature and theoretically capable of infinite resolution. But they’re still smooth, continuous, and somewhat super-imposable.

We can intuitively extend matrix multiplication to continuous matrices and vectors.
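Concretely (this is my reading of the extension, nothing more): in the continuous limit the reasoning matrix becomes a kernel A(x, y) of two variables, and the layer formula above becomes an integral transform:

output(x) = max(0, ∫ A(x, y) · input(y) dy)

Resampling, below, then just means evaluating A and the input on a coarser or finer grid of points.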

We can also intuitively change the resolution of a given discrete matrix/vector, typically to a lower one, by resampling it. Think of a matrix as a cat photo: you can zoom in and out of that photo, and that’s essentially the resampling we’re talking about here (yes, there’s some nasty bookkeeping in there, that’s ok).

Note that due to the nature of matrix multiplication, we can change the resolution of the input and the output independently (by resampling the columns and the rows independently). So we can decrease the resolution of the internal steps of a Deep Neural Network without changing the definition of the output vector.
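Here’s a sketch of that bookkeeping using plain linear interpolation. The sizes are made up, and a real experiment would of course start from a trained network:

```python
# Sketch: shrink the hidden dimension shared by two consecutive layers,
# leaving the network's input and output dimensions untouched.
import numpy as np

def resample_axis(M, new_size, axis):
    """Resample one axis of a matrix to new_size points by linear interpolation."""
    old_x = np.linspace(0.0, 1.0, M.shape[axis])
    new_x = np.linspace(0.0, 1.0, new_size)
    return np.apply_along_axis(lambda v: np.interp(new_x, old_x, v), axis, M)

rng = np.random.default_rng(2)
A1 = rng.standard_normal((256, 784))   # layer 1: 784 -> 256
A2 = rng.standard_normal((10, 256))    # layer 2: 256 -> 10

# Shrink the hidden dimension 256 -> 64: resample A1's rows (its output
# side) and A2's columns (its input side) to match. The 784 and 10 stay fixed.
A1_small = resample_axis(A1, 64, axis=0)
A2_small = resample_axis(A2, 64, axis=1)

x = rng.standard_normal(784)
h = np.maximum(0, A1_small @ x)        # a lower-resolution thought vector
y = np.maximum(0, A2_small @ h)
print(h.shape, y.shape)                # -> (64,) (10,)
```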

Conjecture #3: The accuracy of a trained Deep Neural Network will degrade smoothly as we gradually lower the resolution of its internal reasoning matrices and thought vectors.

Recurrent Neural Networks are another type of network where part of the (intermediate) output is fed back into an earlier part. This gives the network a bit of “memory”, and this form has seen significant success in speech recognition and machine translation. At each “step” the next piece of sound or text is fed into the network, and it does its thing.
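In the same notation as before, a minimal recurrent step might look like this (the sizes and random weights are placeholders):

```python
# Sketch of a recurrent step: the previous thought vector is fed back
# in alongside each new piece of input.
import numpy as np

rng = np.random.default_rng(3)
W = rng.standard_normal((128, 128)) * 0.1   # feedback: thought -> thought
U = rng.standard_normal((128, 32)) * 0.1    # input: new data -> thought

thought = np.zeros(128)                     # the network's "memory"
for step_input in rng.standard_normal((5, 32)):   # five steps of sound/text
    thought = np.maximum(0, W @ thought + U @ step_input)
print(thought.shape)                        # -> (128,)
```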

While I haven’t checked whether a Recurrent Neural Network’s output changes smoothly over time, I’m feeling bold/foolish, so:

Conjecture #4: Just like the reasoning matrices and thought vectors can be viewed as discrete incarnations of continuous ideals, time can be viewed, and crucially resampled, the same way.
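If that holds, resampling in time is the same interpolation trick as before, just applied along the step axis. A sketch with made-up data:

```python
# Sketch: resample a sequence of thought vectors in time, e.g. 100 steps -> 40.
import numpy as np

def resample_time(sequence, new_steps):
    """Linearly interpolate a (steps, dim) sequence to new_steps steps."""
    old_t = np.linspace(0.0, 1.0, sequence.shape[0])
    new_t = np.linspace(0.0, 1.0, new_steps)
    return np.stack([np.interp(new_t, old_t, sequence[:, d])
                     for d in range(sequence.shape[1])], axis=1)

thoughts = np.random.default_rng(4).standard_normal((100, 16))  # made up
slower = resample_time(thoughts, 40)
print(slower.shape)   # -> (40, 16)
```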

Now we’re finally ready to take the plunge — thought vectors are indeed the data representation we’re looking for:

Conjecture #5: Idealized thoughts can be described by a continuous sequence of continuous thought vectors. Depth and clarity of realizable thought are capped in proportion to the resolutions involved.

Conjecture #5.1: Thought vectors only have meaning within their context of origin, and need to be translated to e.g. English to be communicated. So while you and I may perceive and experience effectively the same thing when we see something red, our actual internal representations are probably different.

Psychology and experience tell us that we pay very selective attention to things: our perception and thoughts tend to be focused on one or two things at a time, to the exclusion of other things:

Conjecture #6: Attention can be viewed as selective, on-the-fly resampling of the internal reasoning matrices’ and thought vectors’ resolutions and update rates. Channels meriting more attention get more resolution and higher update rate, the rest get downsampled and run at a lower update rate.

This conjecture hints at a very adaptable processing machine. One that can operate within a capped power or CPU budget, while fluidly allocating cognitive resources to that which merits attention. It also matches subjective experience.
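As a sketch of what such an allocation might look like under a fixed budget (the channels, weights, and budget rule are all invented for illustration):

```python
# Sketch: split a fixed resolution budget across channels in proportion
# to attention weights, with a small floor so nothing vanishes entirely.
import numpy as np

def allocate_resolution(attention, total_budget, floor=4):
    """Give each channel at least `floor` units, spend the rest by attention."""
    attention = np.asarray(attention, dtype=float)
    spare = total_budget - floor * len(attention)
    extra = np.floor(spare * attention / attention.sum()).astype(int)
    return floor + extra

attention = [0.7, 0.2, 0.05, 0.05]      # vision grabs most of the focus
print(allocate_resolution(attention, total_budget=256))
# -> roughly [172 52 16 16]: attended channels get more resolution
```

The same split could drive update rates instead of (or in addition to) vector resolutions.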

There are many pieces left to figure out before we can build such a thing. I do believe that “reasoning matrices” and “thought vectors” as described above will play a part in creating AI, but there still seem to be fundamental building blocks missing, never mind hooking them up into a working system.

Can we do it? Well, there are two types of intelligence (ignoring procreation):

  1. Those that can at most build an intelligence on par with themselves.
  2. Those that can build an intelligence greater than themselves.

Only #2 leads to the AI Singularity. It is not yet clear that we’re in that bucket. We may well be in #1. But building such an intelligence would still be Very Useful.


N.B. Cooper

No claims of originality, some aspirations to authenticity.