What separates us from AI, Part 2: Efficient Search is the Real Goal of AI/ML

Dev Nag
3 min read · Sep 3, 2017


In Part 1, we noted in passing that the F1 models are the ‘final output’ of most machine learning endeavors (however you get there). These are the tightly-guarded jewels that we spend days/weeks/months burning out our GPUs to create. F2 and F3 are means to that particular end.

But here’s the thing. We already know how to generate every possible F1 model there is.

All F1 models are computable functions — whether it’s a simple decision tree or a convolutional neural network with gigabytes of parameters. And every computable function is specified by at least one finite program: a finite string over a finite alphabet. Which means we can actually define all of the potential models in a nice, ordered (countably infinite) list.

So the problem is not that we can’t generate the right F1 model. And the problem isn’t that we can’t find it, theoretically — the list can be ordered lexicographically (shortest programs first), so we can just walk down the list and we will reach the right model, eventually.
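To make that concrete, here’s a minimal Python sketch of that enumeration (my own illustration, not from any paper): it yields every finite string over a fixed alphabet in shortlex order, and if you read each string as the source code of a program, it will eventually emit every computable function, and hence every F1 model.

```python
from itertools import count, product

def all_programs(alphabet="01"):
    """Yield every finite string over `alphabet` in shortlex order
    (by length, then lexicographically). Reading each string as a
    program's source code enumerates every computable function,
    and therefore every possible F1 model."""
    for length in count(1):
        for chars in product(alphabet, repeat=length):
            yield "".join(chars)

# The first few "programs": 0, 1, 00, 01, 10, 11, 000, ...
gen = all_programs()
print([next(gen) for _ in range(7)])
```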

The problem is that we can’t find the right model efficiently.

(This assumes, of course, that we have a defined objective function and would efficiently recognize an acceptable solution if it were provided to us — which is not always the case!)
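Concretely, “recognizing an acceptable solution” just means having a verifier that’s cheap to run relative to the search itself. Here’s a hypothetical sketch — `model`, `loss`, and `threshold` are placeholders for whatever your objective actually is, not any particular library’s API:

```python
def is_acceptable(model, dataset, loss, threshold=0.05):
    """A cheap 'solution recognizer': accept a candidate model if
    its mean loss over the dataset clears a fixed threshold. The
    search can be arbitrarily expensive; this check stays cheap."""
    total = sum(loss(model(x), y) for x, y in dataset)
    return total / len(dataset) <= threshold
```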

There have been many attempts at universal approaches to this problem. These are essentially F2 learners that, given an F1 dataset, try to efficiently find the ‘best’ F1 model under whatever criterion you’ve chosen:

  • Levin search / Universal search: Interleaves every possible program on a universal Turing machine, giving shorter programs exponentially more runtime, until a solution is hit. Guaranteed to work…after you effectively try every simpler program first (a toy sketch follows this list). Led to the Optimal Ordered Problem Solver (OOPS) and other variants, like…
  • Gödel machines: A self-rewriting machine that optimizes itself only when it can prove the modification will increase its expected reward (aka, “Self-Help written by Logicians, not Dr. Phil.”).
  • Genetic Programming: Extends genetic algorithms to evolve parse trees that look like LISP. John Koza spent a lot of his life writing a four-book series on this topic, and boy, did I spend a lot of my life trying to apply it.
  • Algorithmic Probability (AIXI-mc, AIXI-tl): All of the Solomonoff-prior-derived ideas fall into this class.
[Image: AIXI, Marcus Hutter’s “1-line solution” to sequential decision making]
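Since Levin search is the conceptual ancestor of the rest, here’s the toy sketch promised above of its core trick: time-budgeted interleaving. This is an illustration under simplifying assumptions, not Levin’s algorithm in full rigor. `programs` is an enumerator like the one sketched earlier, and `run_for(p, n)` is a hypothetical interpreter that runs program `p` for `n` steps, returning its output or `None` if it hasn’t halted:

```python
from itertools import islice

def levin_search(programs, run_for, is_solution):
    """Toy Levin-style search. In phase k, each of the first 2**k
    candidate programs p gets a time budget of 2**k / 2**len(p)
    steps: shorter programs get exponentially more time, nothing
    is starved forever, and total work per phase is ~k * 2**k."""
    phase = 1
    while True:
        for p in islice(programs(), 2 ** phase):
            budget = 2 ** phase // 2 ** len(p)
            if budget < 1:
                continue  # too long to afford any steps this phase
            result = run_for(p, budget)
            if result is not None and is_solution(result):
                return p
        phase += 1
```

The punchline is visible right in the loop: correctness comes from exhaustiveness, and the “optimality” is only up to a multiplicative constant that depends on the length of the solution program — which is exactly why it chokes on realistically-sized problems.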

(What’s also curious is that many of these universal approaches can be linked back to a single cantankerous researcher in Switzerland named Jürgen Schmidhuber. OOPS, Gödel Machines, AIXI/AIXI-mc/AIXI-tl/AIQ, and even many of the core advances in Deep Learning like LSTMs can be traced to him and his students. Which he is not shy about telling you. Hey, don’t hate the player, hate the game.)

The thing is — all of these approaches have been implemented in software, and tried on real data, and none of them have lived up to their theoretical promise. They just don’t work that well on realistically-sized data sets. In particular, they don’t seem to learn as well as human beings on a variety of tasks, even when they have orders of magnitude more data than humans could ever see.

There’s the rub. There’s no fundamental law of math or physics that blocks us from learning a language fluently by processing a few hundred million words — we humans are the existence proof that it’s possible, even commonplace. Yet no software has ever been able to do that.

So what’s the difference? What do humans do that AI/ML doesn’t? Why are we more efficient on certain data sets?

Was this worth reading? If so, please clap below.

Read Part 3 here.
