AI is a Revolution in What and How we KNOW

Yves Bergquist
8 min read · Aug 18, 2018


Most of what you read or watch about artificial intelligence, including from some very prominent field experts, is completely wrong. One of the most frustrating downsides of the decline of institutions and the Age of Individual Empowerment that we are going through is the incredibly low signal-to-noise ratio in any kind of expertise. The dominant voice on any topic is often just the loudest, or the most savvily promoted, and not the most substantive. This is especially disappointing for topics as complex and critical as AI, where commentary ranges from silly and utterly misinformed fear-mongering (looking at you, Elon) to a more educated but extremely partial view of what it is and what it means. Many prominent AI researchers like Yann LeCun or Geoff Hinton, for example, are all-in on scaling computational models up to multi-system and multi-model adaptive intelligence, which is only half of the work.

It’s especially unfortunate that computer scientists have earned somewhat of a monopoly on the field, making intelligence all about learning (which it is not) and that no prominent philosopher or mathematician (except my friend Ben Goertzel, who I sincerely believe history will remember as the true intellectual forefather of Artificial General Intelligence) has developed any kind of influential voice in the domain.

To fully grasp what artificial intelligence means, and start thinking about its impact on our society, it’s worth looking back to its philosophical First Principles.

Representations and the Human Brain

At a broad, philosophical level, artificial intelligence starts with the notion that, as powerful as it is, the human mind isn't powerful enough to represent and understand, in its complete mathematical layout, the reality it lives in.

Representation is a key concept here (and "knowledge representation" is a critical, yet misunderstood, area of AI). There's reality (let's call it N, for the statistically minded) and the entirety of data about it. Humans and machines can't measure all the positions of all the atoms in the universe every single millisecond, so they have to summarize that reality. Humans can only look locally (around them) and in a very abstract manner (symbolic representation). As we'll see, representational machine learning does almost the opposite (it builds n from a lot of atomic-level data about N, as opposed to sampling N and creating symbolic representations of it). Even then, the complexity of what constitutes just my office, for example, or the physical state of my body, is very high. So the human mind (and even machines) needs to compress that reality into a simpler model, or represent N as a simpler statistical model that, hopefully, is a high-fidelity representation of N (often it is not). While reality is the function N, a representation is a compressed, sampled version of N called n.

A representation n can have high or low fidelity to N, just like the summary of a book can be a good, high fidelity representation of the book, or a poor representation of the book.
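
To make the N-versus-n idea concrete, here is a minimal sketch in Python. Everything in it is an illustrative assumption (the stand-in "reality" function, the sample size, the choice of a low-order polynomial as the compressed model): we sample a complicated signal we can never observe completely, fit a deliberately simple model to those samples, and then measure how faithful that compressed representation is.

```python
import numpy as np

rng = np.random.default_rng(0)

# "Reality" N: a complicated signal we can only sample, never fully observe.
def N(x):
    return np.sin(3 * x) + 0.5 * np.cos(7 * x) + 0.1 * x**2

x_sample = rng.uniform(-3, 3, size=40)   # a local, partial sample of reality
y_sample = N(x_sample)

# Representation n: a low-order polynomial, i.e. a heavily compressed model.
coeffs = np.polyfit(x_sample, y_sample, deg=3)
n = np.poly1d(coeffs)

# "Fidelity" of n to N, measured on points the model never saw.
x_test = np.linspace(-3, 3, 500)
fidelity_error = np.mean((N(x_test) - n(x_test)) ** 2)
print(f"mean squared error of n vs. N: {fidelity_error:.3f}")
```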

n, in this case, is our human knowledge (basically a bunch of statistical models) about ourselves and our environment. And that knowledge is a very compressed representation of reality, for which we never have a lot of feedback, except when we make decisions from it. Sometimes that knowledge is good (and then the decisions we make based on it generate intended outcomes), and sometimes it's bad (they generate detrimental or unexpected outcomes). "Good n" leads to good decisions, "bad n" leads to poor decisions. This is important, because both at an individual and an organizational level, evolutionary "fit" (and success, performance in a competitive environment, etc.) is essentially a higher ratio of good to bad decisions.

Now, for many reasons linked to the structure of the mammalian brain and to how Western education has been shaped by engineers for centuries, we’ve been building mental representations of reality (building a bunch of n models) that are very mechanistic. Medicine is a great example. Look at how it’s organized by arbitrary functions, like the brain, the gut, hands and feet. Is this a high fidelity representation of the human body? Of course not. The body is a much more complex system, and we’ve been going way too long without representing and analyzing it that way.

Systems of Systems

The human body, society, the Universe itself, is a complex system of systems. Imagine a Russian nested-doll structure of graphs. A graph is a topological structure made of nodes (entities such as people, cars, atoms, molecules) and relationships between them (called edges). A molecule is a graph (atoms and the relationships between them). A cell is a graph of such graphs, where molecules become nodes in a higher-level graph. That graph, in turn, becomes a node in a higher-level, more complex graph that can be an organ or a muscle. That graph of graphs then becomes a node in a higher-level graph that is the human body, which itself is a node in higher-level nested graphs of family, community, society, country, mankind, etc. You get the sense of how complex this is: it's mind-bogglingly complex (remember, all of these entities and the relationships between them are constantly in flux), and computing all of these relationships together to represent and predict the state of even one of these sub-systems is, for now, virtually impossible (that's the whole challenge of AI).

This data structure is called a "hypergraph" (a graph of graphs), and it's perhaps one of the most profound and fundamental concepts in mathematics. Because it is a direct reflection of how the Universe is organized, a hypergraph is by far the most high-fidelity type of knowledge representation, and one of the most promising areas of research in AI (as you can imagine, it's incredibly hard to compute such a complex graph over large datasets).
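
Here is a toy sketch, in Python, of what this nesting can look like as a data structure. The specific entities and relations are invented for illustration; note also that, in the strict mathematical sense, a hyperedge is simply an edge that can connect any number of nodes at once, and the graphs-inside-graphs nesting is layered on top of that.

```python
# A low-level graph: atoms as nodes, bonds as edges.
water = {
    "nodes": {"H1", "H2", "O"},
    "edges": {("H1", "O"), ("H2", "O")},
}

glucose = {
    "nodes": {"C1", "C2", "O1"},            # heavily truncated, for brevity
    "edges": {("C1", "C2"), ("C2", "O1")},
}

# A higher-level graph whose nodes are whole graphs: a "cell" made of
# molecules, with edges describing interactions between them.
cell = {
    "nodes": {"water": water, "glucose": glucose},
    "edges": {("water", "glucose")},
}

# Hyperedges generalize edges: each one can relate any number of nodes.
# Here a single hyperedge ties several cells into one higher-level unit.
tissue = {
    "nodes": {"cell_a", "cell_b", "cell_c", "cell_d"},
    "hyperedges": [
        {"cell_a", "cell_b", "cell_c"},      # one relation involving three cells
        {"cell_c", "cell_d"},
    ],
}

print(len(cell["nodes"]), "molecules in the cell;",
      len(tissue["hyperedges"]), "relations in the tissue")
```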

To create a much higher-fidelity representation of our reality, humans should be thinking in systems (N). But because it's far too complex for us to do so (the amount of computation is completely insane), we're thinking very mechanistically, trying to build a very compressed representation of N as a bunch of simple systems interacting together in a very linear fashion (still a graph, but a very rough and inadequate one), rather than taking a more granular look at how certain clusters of nodes in one system might impact other clusters of nodes in another.

Enter AI

In 1992, an IBM researcher named Gerry Tesauro created a backgammon-playing machine learning application (based on a shallow neural network) called "TD-Gammon", building on his earlier "Neurogammon". The application would play at and even above human level, but in a way that didn't quite make sense to even the best players in the world (see Gerry's paper here). Fast forward to 2016: DeepMind (owned by Google) extended the work of Tesauro, and many others, to create a Go-playing application that performed above the level of even the best human players at the time. Here again, AlphaGo played the game in ways that often didn't resonate with experts (a really good blog post on this is here). More recently, an OpenAI team created a very interesting hierarchical neural net application, called "OpenAI Five", which beat some of the best players of Dota 2 at a very simplified version of the game. Once again, the application played the game in ways that were very unusual to human players.

You get the idea. In each of these examples, AI applications (thanks to clever architectures and gigantic amounts of computation, which many AI purists take issue with) were able to build vastly more complex representations of the system of these games than even the best human minds could. All of these applications performed what's called "representational learning", which is a type of machine learning (of which neural networks are a part) where machines build their own hierarchical representations of the "system", which is made of all the complex patterns in the data they are fed.
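
As a minimal sketch of what that looks like mechanically, here is a tiny linear autoencoder in Python (the data, dimensions, and learning rate are all illustrative assumptions). It is fed raw high-dimensional observations, learns a two-dimensional code for them, and is judged by how well it can reconstruct the original: the learned code is its own compressed representation of the system that generated the data.

```python
import numpy as np

rng = np.random.default_rng(0)

# Raw observations of a "system": 500 samples in 20 dimensions that
# secretly live on a 2-dimensional subspace.
latent = rng.normal(size=(500, 2))
mixing = rng.normal(size=(2, 20))
X = latent @ mixing

# Encoder and decoder weights: 20 dims -> 2-dim code -> 20 dims.
W_enc = rng.normal(scale=0.1, size=(20, 2))
W_dec = rng.normal(scale=0.1, size=(2, 20))

lr = 0.01
for step in range(2000):
    code = X @ W_enc            # the machine's compressed representation (n)
    X_hat = code @ W_dec        # its attempt to reconstruct the raw data (N)
    err = X_hat - X
    # Gradient descent on mean squared reconstruction error.
    grad_dec = code.T @ err / len(X)
    grad_enc = X.T @ (err @ W_dec.T) / len(X)
    W_dec -= lr * grad_dec
    W_enc -= lr * grad_enc

print("reconstruction error:", float(np.mean((X @ W_enc @ W_dec - X) ** 2)))
```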

Representation learning is impressive specifically because it takes the entirety of the data about a system and learns to build compressed representations of it. It builds n from N. This isn't how the human mind works. Our brain has evolved over millions of years to promote efficiency, which means we don't have the ability to be fed all the possible information about a system (except relatively simple ones) to create an optimal n representation of that system.

Instead, we have to strategically sample the data and build the best possible n representation of N, which we iterate probabilistically through trial and error, and pass these optimized models on to future generations (new brains), so that, hopefully, the learning that happens in a machine in a few hours can happen over a century or more. So the human brain is not great at knowing in and of itself over a single lifetime, because it's built to optimize its representations of the system of systems it lives in collectively (transfer learning) and over generations of agents and thousands of years. Sure, it's worked pretty well so far, but it's very slow. And for any of us, it doesn't feel great knowing that we're just a link in a long chain of evolutionary algorithmic optimization. Or, to borrow an engineering term, just one "sprint" in a much larger development project.
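
A back-of-the-envelope sketch of that generational loop, with everything in it (the stand-in environment, the crude linear "representation", the mutation scale) assumed purely for illustration: each generation inherits the best model so far, perturbs it, tests the variants against fresh samples of reality, and hands the winner down.

```python
import numpy as np

rng = np.random.default_rng(1)

def environment(x):
    # The "reality" that agents are trying to model (a stand-in for N).
    return 2.0 * x - 1.0 + 0.3 * np.sin(5 * x)

def fitness(params, x, y):
    # How well a crude linear representation (slope, intercept) fits fresh samples.
    slope, intercept = params
    return -np.mean((slope * x + intercept - y) ** 2)

best = np.array([0.0, 0.0])        # the representation handed down between generations
for generation in range(200):
    x = rng.uniform(-2, 2, size=30)          # each generation samples reality anew
    y = environment(x)
    # Offspring: small mutations of the inherited representation.
    offspring = best + rng.normal(scale=0.1, size=(20, 2))
    scores = [fitness(child, x, y) for child in offspring]
    candidate = offspring[int(np.argmax(scores))]
    if fitness(candidate, x, y) > fitness(best, x, y):
        best = candidate                     # pass the improved model on

print("inherited representation (slope, intercept):", best)
```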

A Revolution in Knowledge

At the most fundamental and philosophical level, AI promises to considerably accelerate this process by having powerful and, hopefully, highly intelligent machines derive the best n from N. Armed with large datasets and such a capacity to create high-fidelity representations of complex systems, humans could become exponentially more knowledgeable about themselves and their environment.

What happens when we're able to think about complex phenomena like cancer, economic development, or human performance with hundreds of thousands of variables, instead of the handful with which human knowledge has been developed so far? We're suddenly able to create much more accurate and high-fidelity representations of these complex systems, and to develop better knowledge of them.

It’s a lot more complicated, of course. This hybrid man-machine superintelligence presupposes the existence and availability of large, properly curated, and accurate datasets (the machine learning community is awakening to how hard that is), organizations and societal normative frameworks that are ready to accept it (a whole new bag of scorpions right here), and a radically new mindset: systems thinking.

Equally important is a giant technical step, on which my team and I focus a lot of our efforts: the ability to semantically and probabilistically represent all of this data across formats and ontologies.

As seen previously, representational learning is very powerful at creating high-fidelity representations of whole complex systems. But that only goes so far, for several reasons: (1) it needs large and well-curated datasets, (2) it needs lots of expert tuning, and (3) while it can represent very complex systems (with a large state space, like Go), it isn't capable of scaling beyond bounded, stable (they don't evolve), and fairly coherent (from a data-diversity standpoint) systems, whereas most of the systems we are trying to better understand (like the human brain, cancer, or human performance) are multi-dimensional, multi-ontological (they involve different types of data), and fast-evolving.

So while neural networks have shown impressive results in representing fairly complex systems (far above human abilities), they have not yet been able to cross into a higher level field of systems representation, which would make them very useful in helping humans understand more complex systems such as disease.

This step is where my team and I, both at ETC and Corto, have focused our efforts. We are working on extending hypergraph-like probabilistic knowledge representation applications to allow for more complex systems representations across large and heterogeneous datasets, in a way that could make them more reliable, more autonomous, and easier to compute.
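
For a flavor of what "hypergraph-like probabilistic knowledge representation" can mean in practice, here is a toy sketch. It is purely illustrative (the entity names, relations, and confidence scores are invented, and this is not the actual representation we use): hyperedges relate any number of entities drawn from heterogeneous sources, each relation carries a probability, and queries filter on that confidence.

```python
from dataclasses import dataclass, field

@dataclass
class HyperEdge:
    relation: str
    entities: frozenset
    probability: float                 # confidence that this relation actually holds

@dataclass
class ProbabilisticHypergraph:
    edges: list = field(default_factory=list)

    def add(self, relation, entities, probability):
        self.edges.append(HyperEdge(relation, frozenset(entities), probability))

    def relations_about(self, entity, min_probability=0.5):
        # Query: every sufficiently confident relation that touches an entity.
        return [e for e in self.edges
                if entity in e.entities and e.probability >= min_probability]

kg = ProbabilisticHypergraph()
kg.add("appears_with", {"character_a", "character_b", "scene_12"}, 0.92)
kg.add("influences", {"marketing_spend", "opening_weekend"}, 0.61)
kg.add("influences", {"weather", "opening_weekend"}, 0.22)

for edge in kg.relations_about("opening_weekend"):
    print(edge.relation, sorted(edge.entities), edge.probability)
```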

After all, and we don't hear this enough, the AI research community has long known how to build superintelligence (probabilistic graph representation + information-theoretic methods + Bayesian/reinforcement learning on complex adaptive systems); it just hasn't been able to do so in a way that can be computed. 100% of the debate on AI today can be understood this way. It's worth remembering that, a few years ago, AI researcher Marcus Hutter wrote a complete and definitive Artificial General Intelligence application made of just 50 lines of LISP code … but which, to be implemented, would need more computational power than exists in the whole Universe.


Yves Bergquist

Co-Founder & CEO of AI Startup Corto. Director of AI & Blockchain @ USC’s Entertainment Technology Center. Member/Researcher, DSL@Columbia University