To learn from something, we first have to understand it. With AI, this isn’t always so easy.
Our brain can learn just about any information, no matter how simple or complex. But what makes data “complex”?
563,490 is larger than 5, but about as simple: an integer on a numeric scale (a scalar, if you will).
How about a graph?
As it grows, a social network becomes arbitrarily complex. Adding more nodes isn’t too convoluted, but each node can have any number of connections (and some connections are weightier than others).
There’s no “hard” limit to how many people you can know — it’s arbitrary, uncertain.
Why does this matter for AI?
To feed data into a machine learning model, we have to convert it into a format the computer can work with — bits.
If you’re predicting temperatures based on daily readings, scalars are easy.
If you want a cat-recognizing convolutional neural network, you can convert each cat picture to a 3D array of scalars: one per pixel for each of its RGB channels.
But graphs? You could say that the dimensionality of the data increases, which is a quick recipe to crash any unassuming scikit model unprepared for such hardships.
Yet ML researchers have been working with graphs as input data for a good while now. One approach starts with the adjacency matrix:
A sparse matrix can be constructed by filling “1” wherever a node connects to another. Note the diagonal symmetry of an undirected graph — compression/simplification can be applied for storage efficiency.
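As a minimal sketch (the edge list here is a made-up toy graph), building that matrix takes only a few lines:

```python
import numpy as np

# Toy undirected social graph: each pair is an edge between node indices.
edges = [(0, 1), (0, 2), (1, 2), (2, 3)]
n = 4  # number of nodes

# Fill "1" wherever two nodes connect; undirected edges produce
# the diagonal symmetry mentioned above (A equals its transpose).
A = np.zeros((n, n), dtype=int)
for i, j in edges:
    A[i, j] = 1
    A[j, i] = 1

print(A)
assert (A == A.T).all()  # symmetric, so storing one triangle suffices
```

Because of that symmetry, only the upper (or lower) triangle actually needs to be stored.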
Plenty of neural nets accept matrices as input data. But if the graph adds a node, the dimensionality increases, and your carefully constructed TensorFlow model will stop and complain about shape mismatches.
There are, of course, other solutions.
I enjoyed Michael Larionov’s detailed explanation of graph-input neural networks — graph convolutional layers, especially, have opened up new doors in modeling complicated graph networks.
I’ve been writing mostly about Hierarchical Temporal Memory networks these days, however, which is what brought me to the beautiful and vexing plane of graphs.
HTM nets are a neural network architecture modeled to more closely represent the arrangement of neurons in the neocortex — specifically, the neocortical columns that seem to do most of our thinking. “Mimic the form, mimic the function”.
Our real neurons are binary — they fire, or they don’t — and interconnected enough to learn patterns with excitation and inhibition.
Instead of long, fully-connected layers like traditional neural networks, the computational machinery of HTM systems is interconnected at all levels.
The Temporal Memory, for example, is a 3D “block” of columns filled with neurons — akin to the nodes in fully connected NNs.
But neurons at the bottom “layer” can have connections to neurons in the top layer of other columns — not just the following layer.
The trick behind it all is the Sparse Distributed Representation, a binary array where ~2% of bits are 1.
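As a quick sketch of the idea (the 2048-bit size and 40 active bits are common HTM configuration values, used here purely for illustration):

```python
import numpy as np

# A toy SDR: a 2048-bit binary array with ~2% of bits set to 1.
size, n_active = 2048, 40
rng = np.random.default_rng(0)

sdr = np.zeros(size, dtype=np.uint8)
sdr[rng.choice(size, n_active, replace=False)] = 1

print(sdr.sum() / size)  # sparsity, roughly 0.02
```

Which bits are active carries the meaning: similar inputs should produce SDRs with overlapping active bits.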
Now back to the main issue: it’s not so tough to turn “42” into an SDR. Matt explains the concept of scalar encoding quite eloquently. Even words can be encoded, if you have a large enough corpus to teach the HTM word co-occurrence.
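The gist of scalar encoding can be sketched in a few lines. This is an illustrative toy, not nupic’s actual encoder, and every parameter below is an assumption:

```python
import numpy as np

def encode_scalar(value, min_val=0, max_val=100, size=100, width=5):
    """Toy scalar encoder: map a value to a contiguous run of `width`
    active bits inside a `size`-bit array. Nearby values share bits,
    so their encodings overlap -- the property HTM relies on."""
    value = max(min_val, min(max_val, value))
    start = int((value - min_val) / (max_val - min_val) * (size - width))
    sdr = np.zeros(size, dtype=np.uint8)
    sdr[start:start + width] = 1
    return sdr

a, b = encode_scalar(42), encode_scalar(43)
print(int((a & b).sum()))  # → 4: neighboring values share 4 of 5 active bits
```

The overlap between 42 and 43 is exactly what lets downstream HTM layers treat them as semantically close.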
But once again: graphs.
A series of scalars isn’t too tough to deal with, so couldn’t we just use the graph’s adjacency matrix?
But what if the graph later adds more nodes? The dimensions increase, and arbitrary complexity is tough to convert to a fixed-size SDR.
I’ve read some great ideas for domain-specific tasks, but a “general” graph encoder would take some intense research and wicked programming skill. In the search for a reliable graph encoding strategy, I decided to approach the question with HTM’s defining logic: question the biological roots of how we came to understand the data structure itself.
A chimp can look at an apple, and know that “this is indeed an apple”. Macaques hear the cry of a leopard and understand “existential threat”. Koko the gorilla learned a very functional degree of sign language.
But how do we understand graphs?
My theory is that we don’t — at least, not all at once.
I asked myself: Where do graphs occur in nature?
Trees, roots, certainly — but perhaps something else more innate, more crucial to being human?
As we evolved from early primates, proto-humans were social creatures. We eventually became complex enough to be called “tribal”.
Our ability to work together is what allowed us to come this far. Socialization built civilization.
So the most fundamental network — the proto-graph — is simply a social network. A family, band, tribe, village, any collective noun for a group of mutually interacting primates.
Think about your entire extended social network — conjure up an idea of all the people you “know” for about 15 seconds. Sort of difficult to picture the whole graph, right?
Now think of one of your friends. You can clearly visualize their face, your memories and feelings associated with them.
Now think of another friend.
There’s a good chance you thought of someone connected to the first person.
You subconsciously “primed” that thought with neuronal excitation and inhibition; the cells in your brain that know “friend_1” have plenty of synapses connected to “friend_2” because you know that both of them are connected in real life.
Your understanding of their relationship is reflected in your brain’s architecture.
By thinking of people you know and thinking of people they know (that you also know), you can traverse your social network graph like a mental pathfinding algorithm.
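That friend-to-friend recall is, loosely, a breadth-first search. A small sketch (names and edges invented for illustration):

```python
from collections import deque

# Toy social graph as an adjacency list.
friends = {
    "you": ["ana", "ben"],
    "ana": ["you", "ben", "cho"],
    "ben": ["you", "ana"],
    "cho": ["ana", "dev"],
    "dev": ["cho"],
}

def path(graph, start, goal):
    """Breadth-first search: hop friend-to-friend until the goal is found,
    like recalling one person, then someone they know, and so on."""
    queue, seen = deque([[start]]), {start}
    while queue:
        route = queue.popleft()
        for nxt in graph[route[-1]]:
            if nxt == goal:
                return route + [nxt]
            if nxt not in seen:
                seen.add(nxt)
                queue.append(route + [nxt])
    return None

print(path(friends, "you", "dev"))  # → ['you', 'ana', 'cho', 'dev']
```

At no point does the search hold the whole graph “in mind” — it only ever looks at one node’s neighbors at a time.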
But what I keep noticing is that it’s terribly hard to ‘understand’ the entire graph all at once.
I reckon that we can’t picture our entire social network at once for the same reason we can’t simultaneously read every word on a page.
The task is too complex, and humans really aren’t great multitaskers. We need to focus on individual pieces to put together any larger puzzle.
You build your social network piece by piece as you live — continuous learning. HTMs also do this, and are similarly able to learn patterns between data, associating sequential inputs with each other like humans can.
Leading back to the search for a general graph → SDR encoder:
If humans don’t “learn” an entire graph all at once, why should an HTM system try to?
In my (quite limited) understanding of neuronal circuitry, graphs appear to be ‘meta-structures’ — interlinked “units” of knowledge, connected concepts or objects that are learned sequentially.
To understand a graph, your brain builds a graph with its own cells & synapses.
Mimic the form, mimic the function.
The Search Continues
I can’t help but wonder — is there a maximum “size” of knowledge, a set limit on how big an idea we can learn at any given moment?
When working through a hard task — learning a complex new concept, which in itself could be a hierarchy of interdependent elements — we learn pieces in rapid succession and do our best to put it together. To assemble a graph of knowledge.
In the case of machine learning, one Temporal Memory system might be overtasked by a complex graph.
The two solutions would then be to feed one model each piece of the graph in order (similar to a graph convolutional layer scanning certain sections at a time), or to assemble several HTMs connected horizontally. The latter method is similar to how many clusters of neurons across different cortical columns are involved in a complex task like object recognition.
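The first option can be sketched as a random walk: turn the graph into a sequence of nodes, which is exactly the shape of input a Temporal Memory expects. The graph below is a toy, and the node encoder is left as a stand-in:

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy adjacency list; a random walk converts the graph into a *sequence*.
graph = {0: [1, 2], 1: [0, 2], 2: [0, 1, 3], 3: [2]}

def random_walk(graph, start, steps):
    """Wander edge by edge, emitting one node at a time."""
    node, walk = start, [start]
    for _ in range(steps):
        node = int(rng.choice(graph[node]))
        walk.append(node)
    return walk

# Each step would then be encoded to an SDR and fed to the HTM in order;
# the encoder itself is whatever node-encoding scheme you choose.
walk = random_walk(graph, 0, 6)
print(walk)  # one piece of the graph at a time, e.g. [0, 2, ...]
```

The model never sees the whole graph at once — it learns the transitions, much like we learn our social networks person by person.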
At any rate, direct conversion of graphs to SDRs is a fantastic idea to chase. Much of the brain’s exact workings remain secret, but there are likely some strange, elegant meta-structural principles at work.