Deep learning: the truth behind the hype

Deep learning is today’s buzz. It’s a black-magic technology lauded, at the very least, as a cure-all for technical ills and, at the most extreme, as a computational substrate for genuinely intelligent (if not conscious!) machines. Labelling your startup a ‘deep learning company’ is in vogue (never mind the fact that doing so is often as absurd as calling Domino’s Pizza an oven company), and it seems that every other big tech company is building a research group devoted to it.

Is the buzz merited? Will deep learning totally revolutionize the world? Will companies not investing in this technology be left in the dust of those aboard the hype train? Will deep learning automate all of our jobs?

Many would respond to all of these with an enthusiastic “yes!”, but I want to address the finer points of the matter, and instead deliver a much-needed ‘sort of…’ With this new clarity we can paint a picture of the road forward for machine intelligence.

Deep learning in two paragraphs:

Deep learning is the technique of using artificial neural networks that are many layers “deep” to perform computational tasks. Artificial neural networks are assemblies of simple computational elements connected in a way that loosely resembles human neural tissue. If these networks are carefully tuned, they can perform complex computational tasks. One particularly successful breed of artificial neural network, the convolutional neural network, attempts to mirror the structure of the mammalian visual cortex, a part of the brain responsible for object recognition. As one might expect, convolutional networks excel at tasks similar to object recognition.

For example, a simple convolutional neural network I built was capable of determining a person’s emotion from a picture of their facial expression with an accuracy within 1% of human performance. When it does make errors, they are similar to the ones humans make. Truly an amazing system!
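To make the idea of “layers” concrete, here is a minimal sketch of a forward pass through a small stack of fully connected layers. It is not the emotion recognition network described above, and it is not convolutional; the function names, the `tanh` nonlinearity, and the hand-picked weights are all assumptions chosen purely for illustration.

```python
import math

def dense_layer(inputs, weights, biases):
    """One layer: each unit takes a weighted sum of its inputs, then applies tanh."""
    return [
        math.tanh(sum(w * x for w, x in zip(unit_weights, inputs)) + b)
        for unit_weights, b in zip(weights, biases)
    ]

def forward(inputs, layers):
    """Feed the inputs through each layer in turn; more layers means a 'deeper' network."""
    activations = inputs
    for weights, biases in layers:
        activations = dense_layer(activations, weights, biases)
    return activations

# Hypothetical, hand-picked weights: 2 inputs -> 3 hidden units -> 1 output unit.
LAYERS = [
    ([[0.5, -0.2], [0.1, 0.8], [-0.7, 0.3]], [0.0, 0.1, -0.1]),
    ([[1.0, -1.0, 0.5]], [0.2]),
]

output = forward([1.0, 2.0], LAYERS)
print(output)
```

“Carefully tuning” a network means choosing the numbers in `LAYERS` automatically from data rather than by hand.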

But, there’s a catch.

Deep learning has major limitations.

Deep learning is exceptionally data hungry. To automate a task with a neural network, one must gather a massive dataset pairing the information the task takes as input with the output the task should produce. The network is trained on this data, attempting to learn how to morph an input into a proper output. This works very well, provided one can assume that any future input to the network will be only a small jump away from the training data it has already explored. Under that condition, it’s reasonable to expect the network to make a good decision.
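The training loop itself can be sketched in a few lines. The example below is a deliberately tiny stand-in, assuming a model with a single weight and bias rather than a deep network, fitted by plain gradient descent on mean squared error; the function name, learning rate, and toy dataset are illustrative assumptions. It only shows the mechanics of learning from input-output pairs.

```python
def train_linear(pairs, learning_rate=0.05, epochs=2000):
    """Fit y ~ weight * x + bias by gradient descent on mean squared error."""
    weight, bias = 0.0, 0.0
    n = len(pairs)
    for _ in range(epochs):
        # Gradients of the mean squared error with respect to weight and bias.
        grad_w = sum(2 * (weight * x + bias - y) * x for x, y in pairs) / n
        grad_b = sum(2 * (weight * x + bias - y) for x, y in pairs) / n
        weight -= learning_rate * grad_w
        bias -= learning_rate * grad_b
    return weight, bias

# Toy dataset (an assumption for illustration): inputs paired with desired outputs.
TRAINING_PAIRS = [(x, 2.0 * x + 1.0) for x in [0.0, 1.0, 2.0, 3.0, 4.0]]

weight, bias = train_linear(TRAINING_PAIRS)
# Query an input that lies inside the range covered by the training data.
prediction = weight * 2.5 + bias
print(weight, bias, prediction)
```

A real network has millions of such parameters instead of two, which is a large part of why it needs so many more examples.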

For my emotion recognition system, the training dataset consisted of approximately 100,000 images of human faces, each one labelled with one of seven possible emotions. Emotion recognition is a relatively simple task with a very constrained system of outputs, and it still required nearly 100,000 examples to learn properly. Can you imagine if you were trying to build a conversational AI system, and you had to amass a dataset spanning all possible thoughts the system could have, labelled with all possible sentences the system should produce? It’s hardly a tractable problem.

For each problem one would like to automate with deep learning, a large training set must be built. Most of the thousands of mundane tasks that people loathe and waste their time on day after day are relatively specific. Specific tasks require large amounts of task-specific data to automate. Most specific, mundane tasks in a person’s day-to-day life aren’t repeated nearly enough to build a dataset big enough for deep learning to be effective. Data hunger prevents deep learning technologies from solving problems that are repetitive but specific.

Deep learning systems can’t reveal how they make decisions. The decision-making processes within most neural networks are not interpretable. The neural network I developed for emotion recognition cannot tell you what makes a happy face different from an angry face, nor can it tell you why it decided a face was surprised and not angry. For simple tasks like emotion recognition this isn’t much of a problem, but it would be horrible if your deep-learning-powered AI doctor could not explain to you why it thought you had cancer. There’s work underway to fix this shortcoming, but little progress has been made. For now, deep learning cannot be trusted with high-risk decisions.

Where do we go from here?

Deep learning has issues, and cannot alone bring about the AI-fueled wonderland that many (myself included) dream of. What can be done now? What should be the focus of machine intelligence research?

Two houses, both alike in dignity…

The field of AI research is divided into two broad camps: the Symbolists and the Connectionists. They’ve remained divided due to staunchly drawn lines in the academic sand.

The Connectionists begot deep learning. They believe that the path to true computational intelligence lies in building computational systems inspired by the neural circuitry of our brain.

The Symbolists were the prime movers of the field of AI. The paradigm of symbolic AI is to construct massive banks of knowledge and rules about how the world works, often structured as ontologies. Given rules and knowledge, it’s easy to build a system capable of reasoning over them. Symbolic AI systems can make decisions and answer questions deductively and inductively. Furthermore, the decisions and actions a symbolic AI system makes are interpretable! When a symbolic AI system spouts out an answer, one can simply follow the system’s chain of reasoning to figure out why it came to its conclusions. Exemplary work in symbolic AI includes the Cyc project, and the Genesis project.
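As a rough illustration of why such systems are interpretable, here is a minimal sketch of forward-chaining deduction that remembers which premises produced each conclusion. It is not how Cyc or Genesis work internally; the facts, the rules, and the function names are hypothetical and hand-written for this example.

```python
def forward_chain(facts, rules):
    """Apply rules until no new facts appear, recording why each fact was derived."""
    known = set(facts)
    why = {fact: "given" for fact in facts}
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            if conclusion not in known and all(p in known for p in premises):
                known.add(conclusion)
                why[conclusion] = premises
                changed = True
    return known, why

def explain(fact, why, depth=0):
    """Return the chain of reasoning behind a fact as indented lines."""
    reason = why[fact]
    indent = "  " * depth
    if reason == "given":
        return [f"{indent}{fact} (given)"]
    lines = [f"{indent}{fact} because:"]
    for premise in reason:
        lines.extend(explain(premise, why, depth + 1))
    return lines

# Hypothetical hand-coded knowledge base.
FACTS = ["Dumbo is an elephant"]
RULES = [
    (("Dumbo is an elephant",), "Dumbo is a mammal"),
    (("Dumbo is a mammal",), "Dumbo is warm-blooded"),
]

known, why = forward_chain(FACTS, RULES)
print("\n".join(explain("Dumbo is warm-blooded", why)))
```

Every conclusion carries its own justification, which is exactly what a trained neural network lacks.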

Viewed through the lens of discovering the computational mechanism behind intelligence, the divide between symbolic AI and connectionism is easy to understand — the two camps are totally different ideologically. But when developing AI to a practical end, perhaps there’s an opportunity for romance between symbolic and connectionist AI.

Symbolic AI is interpretable, and only needs a set of rules and knowledge to make decisions, seemingly solving two major problems of AI based on deep learning. So why is deep learning all the rage, whereas only academics know of symbolic AI?

Symbolic AI systems are a disaster at learning. The rules and knowledge in a symbolic AI system have to come from somewhere. Oftentimes, the common-sense rules and knowledge in a symbolic AI system are hand-coded. ‘Situation specific’ knowledge (say, the knowledge extracted from a story an AI is trying to understand) is extracted using another set of predefined rules. As one might imagine, it’s intractably hard to hard-code a rule for everything. It’s even harder to define rules that tell a system how to learn new rules and knowledge. Imagine trying to write a set of logical rules that told an AI how to turn the language of a story it was reading into an ontology of all the knowledge and relationships in it: it’s dizzyingly difficult, and perhaps intractable.

Here’s one potential marriage of the two fields we could explore: it seems deep learning excels at recognizing patterns, and symbolic AI is great at making decisions once it has rules and knowledge. What if we used deep learning to build systems that recognize rules and knowledge in data such as raw text, and then, given those rules, used symbolic AI to do the reasoning? You’d only need to train a system to ‘read’ once, and it would thereafter be able to learn from all sorts of information at an incredibly rapid rate. What’s more, it would be capable of making decisions backed up by chains of reasoning and by the specific sources of its knowledge.

Another thought experiment: by leveraging a concert of symbolic AI and deep learning, you could create systems that build themselves out of tiny neural networks, each specialized for a very specific but generalizable subtask. This doesn’t seem to be too far off from the way humans think. If someone hands me a picture and commands me to ‘count all the pink elephants’, I’ll first look for elephants in the picture, identify the pink ones, and count those up.

The traditional deep learning approach to solving the problem of counting pink elephants would be to train a neural network on a dataset of hundreds of thousands of pictures of multiple elephants, painstakingly hand-labelled with the number of pink elephants in them. Needless to say, this might be a very hard dataset to come by, especially because pink elephants don’t actually exist. Instead, we could use symbolic methods to parse the sentence ‘count all the pink elephants in this picture’ into a few simple subtasks, and then assemble the necessary networks for those tasks — similar to how humans would think about this.

For example, we could break the request into the subtasks ‘select elephants’, ‘of the selected objects, only select pink objects’, and ‘count all selected objects’. For each subtask, you could train a neural network using a readily available dataset demonstrating the concept of the task: what the shape of an elephant looks like, what a pink thing is, and what counting is. With a large enough grab-bag, you could recycle neural modules and build highly complex networks to solve specific tasks without amassing any new dataset whatsoever. This significantly reduces the data hungriness problems of current pure deep learning systems. What’s more, the well defined architecture of this system would give us insight into how it carries out everything that it’s tasked with.
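The decomposition above can be sketched as a small pipeline of reusable modules. Everything here is a simplifying assumption: in a real system each module would be a trained neural network operating on pixels, whereas these modules are plain functions over a scene that has already been labelled, and `parse_request` is a hypothetical stand-in for a symbolic parser that only understands requests of the form ‘count all the <color> <kind>s’.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class DetectedObject:
    kind: str
    color: str

# Stand-ins for small, separately trained neural modules.
def select_kind(objects, kind):
    return [obj for obj in objects if obj.kind == kind]

def select_color(objects, color):
    return [obj for obj in objects if obj.color == color]

def count(objects):
    return len(objects)

MODULES = {"select_kind": select_kind, "select_color": select_color, "count": count}

def parse_request(request):
    """Hypothetical parser: turns 'count all the pink elephants' into a module plan."""
    words = request.lower().split()
    color, kind = words[-2], words[-1].rstrip("s")
    return [("select_kind", kind), ("select_color", color), ("count", None)]

def run_plan(plan, scene):
    """Chain the modules named in the plan, feeding each result into the next."""
    result = scene
    for name, arg in plan:
        module = MODULES[name]
        result = module(result) if arg is None else module(result, arg)
    return result

SCENE = [
    DetectedObject("elephant", "pink"),
    DetectedObject("elephant", "grey"),
    DetectedObject("flamingo", "pink"),
    DetectedObject("elephant", "pink"),
]

plan = parse_request("count all the pink elephants")
answer = run_plan(plan, SCENE)
print(plan, answer)
```

Because the plan is an explicit list of named steps, it doubles as an explanation of how the answer was produced, and the same modules could be reassembled for a different request without new training data.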

While the example of pink elephants is a little outrageous, it highlights something amazing. Modular networks would be able to reason about objects and scenarios they’ve never seen before, like pink elephants. It’s not hard to think of how you could use similar methods to count and describe tumors in x-ray images, or identify pedestrians in the field of view of a self-driving car, or read over all the newest medical literature to find novel treatments for lung cancer.

Certain research groups are already hybridizing deep learning with symbolic approaches to AI, often with great success.

A modular network system similar to the one described has already been implemented by Jacob Andreas et al. at Berkeley. Their system achieved state-of-the-art results in question answering, with a high degree of interpretability.

Pedro Domingos’ group has made some significant strides in machine reading, often by leveraging a concert of deep and symbolic approaches to intelligence.

Joshua Tenenbaum et al.’s system for visual continuation learning has shown unprecedented ability to learn grounded symbolic concepts of objects it “sees”. This represents a major stride forward in improving the interpretability of deep models.

It seems to me that neither deep learning nor symbolic AI will bring about a world where humans can focus on human things rather than mundane, repetitive tasks. Hybridizing connectionism and symbolic approaches to AI presents a wealth of opportunities to move forward in pursuit of that goal.