The Black Magic and Alchemy of Deep Learning


The practice of Deep Learning is vastly outpacing theory. This is despite the incredible number of Deep Learning papers that are published every day on arXiv. To develop good theoretical results, researchers have to settle for simplified models that are tractable with our current investigative tools. More advanced models that use the latest state-of-the-art techniques are at a level of complexity that is beyond what our current mathematical toolbox can explain.

The practice of Deep Learning, despite all the heavy math that is employed, is therefore more like alchemy than chemistry. In other words, we don’t build solutions on a solid foundation that lets us predict how effective the results will be.

Certainly, there are many rules of thumb (or Design Patterns) that we’ve learned through experience. This investigative intuition is learned by practitioners over time and you can find bits and pieces of ‘black magic’ that people have used to get better performance.

The downside of this magic is that many claimed state-of-the-art research results are questionable. I don’t have the numbers, but I estimate that a majority of the papers published on arXiv that claim “state-of-the-art” results are difficult to replicate due to (1) the lack of specifics on what magic (i.e., hyper-parameters and the like) was used and (2) the lack of a released implementation that others can verify.
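To make the first point concrete, here is a minimal sketch of recording the "magic" alongside a result so that others can rerun it. Everything in it is an assumption made for illustration: the hyper-parameter names and values are hypothetical, and run_experiment is a stand-in for a real training run, not anyone's actual method.

```python
import hashlib
import json
import random


def run_experiment(config):
    """Stand-in for a training run (hypothetical): the 'score' depends only on the seed."""
    rng = random.Random(config["seed"])  # a private generator, so global state is untouched
    return round(rng.uniform(0.0, 1.0), 6)


def make_manifest(config, score):
    """Bundle the exact settings with the result they produced."""
    # Sorted keys give a canonical string, so equal configs get equal fingerprints.
    canonical = json.dumps(config, sort_keys=True)
    fingerprint = hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]
    return {"config": config, "config_fingerprint": fingerprint, "score": score}


# Hypothetical hyper-parameters, chosen only for illustration.
config = {"learning_rate": 0.001, "batch_size": 64, "dropout": 0.5, "seed": 1234}

manifest = make_manifest(config, run_experiment(config))
manifest_json = json.dumps(manifest, sort_keys=True, indent=2)
print(manifest_json)
```

Publishing a file like this next to the reported numbers addresses only the missing specifics. A released implementation is still needed for others to verify the result.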

Deep Learning is at best an experimental science. Do not let all the mathematics fool you into believing that the theorists have a handle on what is going on. The truth of the matter is that we are continually caught by surprise by what Deep Learning is capable of doing. Furthermore, in almost all cases, theorists barely have an explanation for what is going on. This is the big unknown, and the experimentalists are leading us into that frontier without a roadmap!

This is very different from our understanding of computer circuitry. Despite the complexity of software and hardware of these systems, we have a very precise understanding of how they work. Furthermore, we don’t expect software developers to understand the quantum mechanics of semiconductor transistors to be able to build stuff.

In stark contrast, even understanding how linear algebra, activation functions and back-propagation work does not give us enough of an understanding of how emergent behavior arises. The complexity scientists likely have better models. That’s not to say that Deep Learning researchers don’t know anything. There are certainly a lot of good approximate theories out there that we employ to reason about what we are building. That experimental intuition is what is driving the outstanding research we are seeing today. I honestly think, though, that Deep Learning practitioners have a better understanding of how the brain works than the neuroscientists do (despite not working with brains and using only biologically cartoonish models).

Reputable science magazines have recently published articles that express this sentiment of how little we know about Deep Learning. MIT Technology Review published “The Dark Secret at the Heart of AI”, with this conclusion:

“Even if somebody can give you a reasonable-sounding explanation [for his or her actions], it probably is incomplete, and the same could very well be true for AI,” says Clune, of the University of Wyoming. “It might just be part of the nature of intelligence that only part of it is exposed to rational explanation. Some of it is just instinctual, or subconscious, or inscrutable.”

The article unfortunately conflates many ideas about the inscrutability of Deep Learning networks. Two things should be made clear to the reader: (1) we don’t know how Deep Learning works, and (2) when it makes a prediction, we don’t have an explanation for why it arrived at that prediction. That is just scratching the surface of how little we understand! To make it worse, with the deluge of new experimental results from research, even as we gain some understanding, we are discovering more mechanisms that we don’t understand. In short, the acceleration of our understanding of Deep Learning is being surpassed by the accelerated discovery of new capabilities!

Fortunately, a few magicians have been brave enough to break the magicians’ code and reveal some of the magic and secret potions that are involved. Here we will describe a few valuable ‘tricks of the trade’.

Note to reader: I will clean this up when I have more time.

A lot of the documented magic is confined to the image processing space. However, trust me when I say that the magic used in the NLP space is even crazier!

For more on this, read “The Deep Learning Playbook