Deep Learning for the Non-Technical
The excitement behind Deep Learning is undeniable. It’s popular, it’s powerful, and now it’s even poking into popular culture. CEOs are asking data scientists what it means and how their company can benefit from it. Articles claim we may (or may not) be doomed by it (spoiler: we’re not). And, supposedly, soon enough we will all be able to sit back and drink a coffee while our cars drive themselves.
But there are also a lot of misconceptions about what Deep Learning is, how it works, and the assumptions it makes. So here is a brief overview of Deep Learning, aimed at a non-technical audience. Hopefully, by the end of this short read you will understand the basics and be able to read all of the headlines with a more informed eye. Since it seems to be more common in industry, this description focuses on “supervised” Deep Learning, in which you explicitly provide examples of what you want the model to do.
What is Deep Learning?
Deep Learning is a technique for teaching computers how to do a task. When it is “supervised,” that means you provide examples of what you want the algorithm to do. For instance, if you want it to classify images as dogs versus not-dogs, you would provide a large set of images, and for each image you would explicitly tell the algorithm whether or not it is a dog. The machine learning happens when the computer learns from your examples, all by itself, how to tell images of dogs from images of not-dogs.
Given that setup, one way to think about Deep Learning is to imagine a truck driving across the country with lots of stuff in the back. It starts in one city, and its goal is to reach another city all the way across the country. The truck gets from start to finish by choosing which cities to drive through along the way. For instance, as shown below, it could go from City 1 to City B and then to the goal. Or it could pick City 3, then City C, and then the goal. All of these are valid paths from the start to the finish.
However, some routes (or “paths”) will be better than others. Let’s assume all paths take the same time, but some are really bumpy and some are smooth. Maybe the path from City 3 to City C to the goal is bumpy and we lose a lot of stuff on the way! But going from City 1 to City B is perfectly smooth, so we don’t lose anything. The figure below shows the boxes we lost depending on which cities we chose.
In this case, we obviously prefer to go through City 1 and then City B because we don’t want to lose our stuff! The goal of Deep Learning, then, is to assign weights to each little path from city to city so that the truck will find the paths where it loses the least stuff.
How does it work?
As we said, the goal is to assign weights to each path from city to city, keeping in mind that we prefer (emphasize) paths where we lose the least stuff. To do this, each time a truck drives from start to finish, we compute how much stuff it lost along the way. This is called the “loss function.” As we saw above, the path from City 3 to City C lost a lot of stuff, so that path had a large loss. But City 1 to City B didn’t lose anything, so its loss was small (zero, in fact).
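To make the loss function concrete, here is a minimal Python sketch that simply adds up the boxes lost on each hop of a route. The city names follow the example, but the per-hop box counts are hypothetical numbers chosen so that the bumpy route loses 3 boxes and the smooth route loses none. A real network’s loss function instead compares its predictions against the labeled examples (for instance, how often it called a non-dog a dog).

```python
# Hypothetical boxes lost on each hop (from one city to the next).
BOXES_LOST = {
    ("Start", "City 1"): 0,
    ("City 1", "City B"): 0,
    ("City B", "Goal"): 0,
    ("Start", "City 3"): 2,
    ("City 3", "City C"): 1,
    ("City C", "Goal"): 0,
}


def loss(path):
    """Toy loss function: total boxes lost along a route."""
    # zip(path, path[1:]) pairs each city with the next one, giving the hops.
    return sum(BOXES_LOST[hop] for hop in zip(path, path[1:]))


print(loss(["Start", "City 1", "City B", "Goal"]))  # 0
print(loss(["Start", "City 3", "City C", "Goal"]))  # 3
```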
Our Deep Learning algorithm learns by figuring out which paths “minimize the loss” (i.e., which paths our truck can take without losing too much stuff), so that those will eventually become the preferred routes.
The algorithm starts by choosing paths through the cities at random. Each time it drives a path, it uses our loss function (the number of boxes lost) to compute how much of a mistake that path was. If taking that path was a costly mistake, the algorithm will want to lower the weights along it so that other paths are preferred next time. This step, driving from start to finish and measuring the mistake with our loss function, is called “forward propagation.”
Given that we have a way to say how costly each mistake is, we can then learn from our mistakes! Each little piece of the path (each hop from city to city) contributed to the overall mistake. Maybe choosing City 3 at the beginning caused us to lose 2 boxes, and then choosing City C after that caused us to lose another (so we lost 3 boxes total, as shown above). Therefore, we can assign “blame” to each city choice and update our weights (how much we prefer the route through each city) according to how much we blamed each city. This is called “backward propagation.”
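Here is a minimal Python sketch of assigning blame along one route, using the 2-box and 1-box losses from the paragraph above. The `LEARNING_RATE` value and the simple rule of subtracting blame from a starting weight of 1.0 are hypothetical choices made for illustration; real networks compute each weight’s blame with calculus rather than by counting boxes.

```python
# Boxes lost on each hop of the bumpy route: choosing City 3 cost 2 boxes,
# then choosing City C cost 1 more (hypothetical per-hop split).
BOXES_LOST = {("Start", "City 3"): 2, ("City 3", "City C"): 1, ("City C", "Goal"): 0}

LEARNING_RATE = 0.1  # hypothetical: how much one unit of blame lowers a weight


def assign_blame_and_update(path, weights, lr=LEARNING_RATE):
    """Blame each hop by the boxes it lost, then lower its weight accordingly."""
    for hop in zip(path, path[1:]):
        blame = BOXES_LOST[hop]
        weights[hop] -= lr * blame
    return weights


weights = {hop: 1.0 for hop in BOXES_LOST}  # every hop starts equally preferred
assign_blame_and_update(["Start", "City 3", "City C", "Goal"], weights)
print(weights)
```

After one trip, the hop from Start to City 3 ends up with the lowest weight (about 0.8) because it was blamed for the most boxes, City 3 to City C drops to about 0.9, and City C to the goal stays at 1.0.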
Therefore, the Deep Learning “training” process has our truck drive a path (forward propagation), assign blame (backward propagation), and then update the weights (so it prefers a different path next time). After the truck has driven path after path after path, and has assigned enough blame to each city, certain paths through the cities will end up preferred over others, and those preferred paths will be the ones that lead to the fewest mistakes.
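The whole training loop can be sketched in a few dozen lines of Python. This is a toy version of the truck analogy, not how real Deep Learning libraries work: the road map, the box counts, `LEARNING_RATE`, and the rule of shrinking a weight by `exp(-LEARNING_RATE * blame)` are all hypothetical choices made for illustration. Each trip picks the next city at random, favoring higher-weight hops (forward propagation), blames each hop by the boxes it lost (backward propagation), and shrinks the blamed weights (the update).

```python
import math
import random

# Hypothetical road map: which cities can be reached from each city.
NEXT_CITIES = {
    "Start": ["City 1", "City 3"],
    "City 1": ["City B", "City C"],
    "City 3": ["City B", "City C"],
    "City B": ["Goal"],
    "City C": ["Goal"],
}

# Hypothetical boxes lost on each hop (Start -> City 3 -> City C -> Goal loses 3).
BOXES_LOST = {
    ("Start", "City 1"): 0, ("Start", "City 3"): 2,
    ("City 1", "City B"): 0, ("City 1", "City C"): 1,
    ("City 3", "City B"): 1, ("City 3", "City C"): 1,
    ("City B", "Goal"): 0, ("City C", "Goal"): 0,
}

LEARNING_RATE = 0.5  # hypothetical: how strongly blame shrinks a weight


def drive(weights, rng):
    """Forward pass: pick each next city at random, favoring higher-weight hops."""
    path = ["Start"]
    while path[-1] != "Goal":
        here = path[-1]
        options = NEXT_CITIES[here]
        hop_weights = [weights[(here, city)] for city in options]
        path.append(rng.choices(options, weights=hop_weights)[0])
    return path


def train(trips=200, seed=0):
    """Drive a route, assign blame, update the weights; repeat."""
    rng = random.Random(seed)
    weights = {hop: 1.0 for hop in BOXES_LOST}  # every hop starts equally preferred
    for _ in range(trips):
        path = drive(weights, rng)                  # forward: drive one route
        for hop in zip(path, path[1:]):             # backward: blame each hop
            blame = BOXES_LOST[hop]
            # Update: more blame shrinks the weight more; it never reaches below zero.
            weights[hop] *= math.exp(-LEARNING_RATE * blame)
    return weights


def preferred_path(weights):
    """Follow the highest-weight hop out of each city (ties go to the first option)."""
    path = ["Start"]
    while path[-1] != "Goal":
        here = path[-1]
        path.append(max(NEXT_CITIES[here], key=lambda city: weights[(here, city)]))
    return path


weights = train()
print(preferred_path(weights))  # ['Start', 'City 1', 'City B', 'Goal']
```

After 200 trips, following the highest-weight hop out of each city gives the smooth route through City 1 and City B. Shrinking weights by multiplication, rather than subtraction, keeps them positive so they can always be used as selection odds in `random.choices`.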
This is how the Deep Learner learns! It computes its mistakes using forward propagation and then assigns blame for them using backward propagation. Adjusting the weights this way means that, over time, it makes fewer and fewer mistakes. In our dog example, the model will eventually learn which pieces of an image correspond to “dog-like” parts, since the pieces that are not “dog-like” will get blamed for mistakes and be de-emphasized in the next decision, while those that are “dog-like” will be preferred.
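In an actual neural network, “blame” is computed with calculus: backward propagation works out how the mistake would change if each weight were nudged, and each weight is then moved a small step in the direction that shrinks the mistake (this is called gradient descent). Here is a minimal Python sketch with a model that has just one weight, `w`, learning to predict `y` from `x`; the data, learning rate, and number of steps are hypothetical.

```python
def train_one_weight(xs, ys, lr=0.05, steps=200):
    """Fit prediction = w * x by gradient descent on the mean squared error."""
    w = 0.0  # arbitrary starting guess
    for _ in range(steps):
        grad = 0.0
        for x, y in zip(xs, ys):
            prediction = w * x             # forward propagation
            error = prediction - y         # size (and direction) of the mistake
            grad += 2 * error * x          # backward: derivative of error**2 with respect to w
        w -= lr * grad / len(xs)           # update: move w against its average blame
    return w


xs = [1.0, 2.0, 3.0]
ys = [2.0, 4.0, 6.0]  # hypothetical data that follows y = 2x
w = train_one_weight(xs, ys)
print(round(w, 4))  # 2.0
```

Starting from `w = 0.0`, each step moves `w` closer to 2, the value that fits this made-up data exactly. A deep network does the same thing for millions of weights at once.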
What’s so deep about that?
Some of you might say, “Well, that’s just a neural network,” and you would be right! The “Deep” part of Deep Learning is that instead of your network looking like this, with only a few layers of cities between the start and the goal:
It looks more like this, with many layers of many cities each (and even this is small):
So you have lots of paths and cities you need to analyze and update, which means you need lots and lots of data to train on.
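A rough way to see why a deep network needs so much more data is to count its weights. If we treat each city as a unit and each road between two neighboring layers of cities as one weight, and assume every city connects to every city in the next layer, the count grows quickly. The layer sizes below are hypothetical, and the extra per-unit “bias” weights that real networks also have are ignored.

```python
def count_weights(layer_sizes):
    """Connections between consecutive fully connected layers (biases ignored)."""
    return sum(a * b for a, b in zip(layer_sizes, layer_sizes[1:]))


shallow = [3, 4, 2]             # hypothetical: 3 inputs, one hidden layer of 4, 2 outputs
deep = [3, 100, 100, 100, 2]    # hypothetical: three hidden layers of 100

print(count_weights(shallow))   # 20
print(count_weights(deep))      # 20500
```

Going from one small hidden layer to three layers of 100 multiplies the number of weights to learn by about a thousand, and each of those weights needs examples to be blamed against.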
Why is it so popular?
The main reason Deep Learning is so popular is that it is so powerful. Deep Learning algorithms seem to do better and better the more training data you give them. This is in contrast to many “traditional” machine learning techniques, which seem to provide only marginal improvements as they are given more and more data (after an initial amount where performance improves dramatically, of course). And since the volume of data in the world keeps growing, it has also become easier to find enough training data to take advantage of this. Finally, these techniques are computationally demanding, but as computing costs have gone down, they have become accessible to more people.
Deep Learning is great. It’s fun to use, and it’s an almost magical experience to push a huge data set into a model and watch it learn to make decisions just like you would (that picture is a dog!). But at the end of the day, keep in mind that it’s just updating the weights of a complicated function. So, now that you’ve peeked inside the box, you know what’s really in there … no genie; as always, just math…