Demystifying Deep Learning

Self-driving cars, voice-activated home automation, text translation, automated style transfer: applications for deep learning seem to be everywhere nowadays. Deep learning has proven to be a powerful, effective tool for solving very difficult problems. And, in this humble author’s opinion, we are just starting to scratch the surface of what we can do with the technology.

But wait a minute, what is deep learning? If you’re new to programming, or even if you’re an experienced programmer who is new to machine learning, you may be a little lost when it comes to deep learning. In fact, until recently, I felt a little lost myself.

I studied artificial intelligence and machine learning in graduate school (I had a great professor there, Charles Isbell). I also dabbled in some research and side projects, but my industry experience didn’t include a lot of applied machine learning. As of today, my grad school years were …gasp… almost 10 years ago, so I decided to level up and study the latest technologies in deep learning.

As part of my continued education, I attended my first deep learning hackathon. It was put on by Deepgram, a company that specializes in deep learning. It was a lot of fun! My team actually won for best speech or text project. Here we are below:

Our winning project was a speech-to-math application. The goal of the project was to help students with disabilities by allowing them to dictate math expressions. You can check out our presentations in our team’s GitHub repo. If you’re curious to learn more, feel free to send me an email.

So, what have I been doing to stay current? And let’s not forget our original question: what is deep learning, anyway? Keep reading! I hope to give you the background and information you need to get started.

Machine Learning: A Foundation

Deep learning is a subfield of machine learning. So before we discuss deep learning, let’s talk about machine learning. Tom Mitchell, in his book Machine Learning, gives the following definition: “A computer program is said to learn from experience E with respect to some class of tasks T and performance measure P if its performance at tasks in T, as measured by P, improves with experience E.” Although this definition probably sounds oh so sweet to mathematicians (cough, Matt Lane, cough), it definitely doesn’t help when you’re first learning. So I’ll offer a more approachable definition. To summarize Tom Mitchell, machine learning is the process of using a set of data to build a model, then using that model to gain insight about your data or to predict some outcome.

With a working definition of machine learning, let’s get more specific. There are two main types of machine learning that we’ll discuss: supervised learning and unsupervised learning.

Supervised learning is the process of taking a set of data with labels, then building a model that predicts the labels for new data. Let’s take spam emails as an example. If you were trying to predict whether an email belongs in the spam folder, you would need a large dataset of emails that have been labeled as “spam” or “not spam”. You would then look at the features of those emails. Features may include keywords in the subject, keywords in the text, the email sender, the server that the email was sent from, etc. The data set that consists of the features of our data plus the labels (in this case, “spam” or “not spam”) is known as the training set. Once we have our training set, we can use a variety of machine learning algorithms to build a prediction model from it. The model can then make predictions on new data that we do not have labels for. One machine learning algorithm that is fairly simple to visualize is the decision tree.
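To make the spam example a little more concrete, here is a minimal sketch in plain Python. It builds the simplest possible decision tree, one with a single yes/no question (sometimes called a decision stump). The feature names (has_free, known_sender, has_link) and the six labeled emails are made up for illustration, and a real decision tree algorithm would ask several questions in sequence and use a better splitting measure than a raw error count.

```python
# Toy training set: each email is a dict of yes/no features plus a label.
# The feature names and the data are invented for this example.
TRAINING_SET = [
    ({"has_free": True, "known_sender": False, "has_link": True}, "spam"),
    ({"has_free": True, "known_sender": False, "has_link": False}, "spam"),
    ({"has_free": False, "known_sender": True, "has_link": True}, "not spam"),
    ({"has_free": False, "known_sender": True, "has_link": False}, "not spam"),
    ({"has_free": True, "known_sender": True, "has_link": False}, "not spam"),
    ({"has_free": False, "known_sender": False, "has_link": True}, "spam"),
]


def majority_label(labels, default):
    """Return the most common label, or default when the list is empty."""
    if not labels:
        return default
    # sorted() makes tie-breaking deterministic.
    return max(sorted(set(labels)), key=labels.count)


def train_stump(training_set):
    """Pick the one feature whose yes/no split misclassifies the fewest emails."""
    all_labels = [label for _, label in training_set]
    default = majority_label(all_labels, "not spam")
    best = None
    for feature in sorted(training_set[0][0]):
        yes = [label for feats, label in training_set if feats[feature]]
        no = [label for feats, label in training_set if not feats[feature]]
        rule = {
            True: majority_label(yes, default),
            False: majority_label(no, default),
        }
        errors = sum(rule[feats[feature]] != label for feats, label in training_set)
        if best is None or errors < best[0]:
            best = (errors, feature, rule)
    _, feature, rule = best
    return feature, rule


def predict(stump, features):
    """Label a new, unlabeled email using the learned question."""
    feature, rule = stump
    return rule[features[feature]]


stump = train_stump(TRAINING_SET)
new_email = {"has_free": True, "known_sender": False, "has_link": True}
print("Question asked:", stump[0])
print("Prediction:", predict(stump, new_email))
```

The training set is the only place labels appear. Once the stump is trained, predict works on emails that have features but no label, which is the whole point of supervised learning.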

Unsupervised learning in some ways is the opposite of supervised learning. You are given a large data set, but you do not have labels for it. Ultimately, what you are doing in unsupervised learning is trying to find interesting patterns in the data automatically. As an example of unsupervised learning, let’s say you have thousands of out-of-copyright ebooks, but sadly, no metadata about the books. You simply have the book text. Let’s say you want to find some interesting groups of books (maybe by style or genre). You could use a clustering algorithm that will group books with similar features. One such algorithm is K-means clustering.
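Here is a rough sketch of K-means in plain Python, assuming each book has already been boiled down to two numbers. Those two features (average sentence length and the fraction of the text that is dialogue) and the six data points are hypothetical. To keep the example repeatable, this version starts from the first k points as cluster centers; real implementations usually choose starting centers randomly and scale the features first.

```python
import math


def kmeans(points, k, iterations=10):
    """Group points into k clusters by repeatedly assigning and re-centering."""
    # Assumption: use the first k points as the starting cluster centers.
    centroids = [list(p) for p in points[:k]]
    assignments = []
    for _ in range(iterations):
        # Assignment step: each point joins the cluster with the nearest center.
        assignments = [
            min(range(k), key=lambda c: math.dist(p, centroids[c]))
            for p in points
        ]
        # Update step: each center moves to the average of its members.
        for c in range(k):
            members = [p for p, a in zip(points, assignments) if a == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return assignments, centroids


# Hypothetical features per book: (average sentence length, fraction of dialogue).
BOOKS = [
    (12.0, 0.10),
    (13.0, 0.15),
    (11.5, 0.12),
    (25.0, 0.60),
    (27.0, 0.55),
    (26.0, 0.65),
]

assignments, centroids = kmeans(BOOKS, k=2)
for book, cluster in zip(BOOKS, assignments):
    print(book, "-> cluster", cluster)
```

Notice that no labels are involved anywhere. The algorithm only sees the feature values, and it is up to you to look at the resulting groups and decide whether they mean something, such as a genre.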

Now that we have an intuition for supervised learning versus unsupervised learning, let’s talk about neural networks. Neural networks are an important concept to understand for deep learning, so hang on, we’re almost there. A neural network is another type of machine learning algorithm that consists of nodes called neurons. A neuron takes, as input, one or more values, each paired with a weight, w, and then computes an output. Below is an example of a simple neural network:

In the diagram above, each circle is a neuron and each arrow going to a neuron is a value, or signal. The output from one neuron is often the input for many other neurons in the next layer of the network. Not shown in the diagram is a value w which is a distinct weight for each input to the neuron. The values of each weight in the network are what our machine learning algorithm is learning.
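A single neuron is small enough to write out by hand. The sketch below makes a few assumptions that the description above leaves open: the neuron adds up its weighted inputs, adds a bias term, and passes the result through a sigmoid function. It then learns its weights from a toy data set (the logical OR of two inputs) by nudging each weight a little after every example. This is only one simple way to learn weights, chosen because it fits in a few lines.

```python
import math


def sigmoid(x):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def neuron_output(inputs, weights, bias):
    """Multiply each input by its weight, add them up, then apply the sigmoid."""
    total = sum(x * w for x, w in zip(inputs, weights)) + bias
    return sigmoid(total)


def train_neuron(examples, epochs=2000, learning_rate=0.5):
    """Learn the weights for one neuron by adjusting them after every example."""
    weights = [0.0] * len(examples[0][0])
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in examples:
            error = target - neuron_output(inputs, weights, bias)
            # Move each weight in the direction that shrinks the error.
            weights = [w + learning_rate * error * x for w, x in zip(weights, inputs)]
            bias += learning_rate * error
    return weights, bias


# Toy data: the logical OR of two inputs.
OR_EXAMPLES = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 1)]

weights, bias = train_neuron(OR_EXAMPLES)
for inputs, target in OR_EXAMPLES:
    print(inputs, "target:", target, "output:", round(neuron_output(inputs, weights, bias), 3))
```

Before training, every weight is zero and the neuron outputs 0.5 for everything. After training, the weights have changed so that the output lands close to each target. Those learned weights are the model.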

Also notice that the neural network has layers. The first layer is our input layer, so the signals at that layer are the features of our data. The next layer is the hidden layer. Each neuron in the hidden layer takes the outputs of the input layer as its input, with each value multiplied by its own weight. There can be many hidden layers in our network. Finally, there is an output layer. The output layer should produce the signal that we are trying to learn. For example, if we go back to our supervised learning example, the output layer could predict whether something is spam or not spam.
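The flow of signals from layer to layer can be sketched in a few lines of Python. The network below has three input features, one hidden layer with two neurons, and one output neuron. All of the weight and bias values are made-up placeholders rather than learned values, and the sigmoid activation is an assumption, so the number it prints is meaningless as a spam prediction. The point is only to show how each layer feeds the next.

```python
import math


def sigmoid(x):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer_output(inputs, weights, biases):
    """Compute one layer. weights[j] holds one weight per input for neuron j."""
    return [
        sigmoid(sum(x * w for x, w in zip(inputs, neuron_weights)) + b)
        for neuron_weights, b in zip(weights, biases)
    ]


# Placeholder values: 3 input features -> 2 hidden neurons -> 1 output neuron.
HIDDEN_WEIGHTS = [[0.5, -0.2, 0.1], [-0.3, 0.8, 0.4]]
HIDDEN_BIASES = [0.0, 0.1]
OUTPUT_WEIGHTS = [[1.2, -0.7]]
OUTPUT_BIASES = [-0.1]


def predict_spam_score(features):
    """Pass the features through the hidden layer, then the output layer."""
    hidden = layer_output(features, HIDDEN_WEIGHTS, HIDDEN_BIASES)
    output = layer_output(hidden, OUTPUT_WEIGHTS, OUTPUT_BIASES)
    return output[0]


# Hypothetical features for one email, already converted to numbers.
email_features = [1.0, 0.0, 1.0]
print("Spam score:", predict_spam_score(email_features))
```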

What is Deep Learning?

Now back to our original question. What is deep learning? A simple answer to your question is that deep learning is the process of building a neural network that has many hidden layers. If you recall our neural network image from above, the hidden layer is the group of neurons in the middle of the network. So a deep neural network is a neural network with many hidden layers between the input and the output layers.
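In code, the difference between a shallow network and a deep one is mostly the length of a list. The sketch below builds a network from a list of layer sizes and runs a signal through it. The weights are random placeholders (seeded so the run is repeatable), the sigmoid activation is an assumption, and the layer sizes are arbitrary choices for illustration.

```python
import math
import random


def sigmoid(x):
    """Squash any number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def make_network(layer_sizes, seed=0):
    """Create random placeholder weights for every pair of neighboring layers."""
    rng = random.Random(seed)
    network = []
    for n_in, n_out in zip(layer_sizes, layer_sizes[1:]):
        weights = [[rng.uniform(-1, 1) for _ in range(n_in)] for _ in range(n_out)]
        biases = [0.0] * n_out
        network.append((weights, biases))
    return network


def forward(network, inputs):
    """Feed the signal through every layer in order."""
    signal = inputs
    for weights, biases in network:
        signal = [
            sigmoid(sum(x * w for x, w in zip(signal, neuron_weights)) + b)
            for neuron_weights, b in zip(weights, biases)
        ]
    return signal


# 4 inputs, then hidden layers, then 1 output.
shallow = make_network([4, 3, 1])           # one hidden layer
deep = make_network([4, 8, 8, 8, 8, 1])     # four hidden layers

features = [0.2, 0.7, 0.1, 0.9]
print("Hidden layers (shallow):", len(shallow) - 1)
print("Hidden layers (deep):", len(deep) - 1)
print("Deep network output:", forward(deep, features))
```

The forward function does not care how many layers there are. Going deeper just means more entries in the list, and more weights to learn.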

Are neural networks a brand-new, cutting-edge topic? Not exactly. The concept of deep neural networks has existed since the 1980s. So why has deep learning become so popular lately if we have known about this stuff for so long? The answer is computing power. Deep neural networks were simply not practical in the 1980s with the computing resources available. One of the major breakthroughs was computer scientists taking advantage of your computer’s GPU (Graphics Processing Unit) to help compute the weights for each signal in a deep neural network.

Now, what is a GPU? If you are a gamer, you probably already know the answer to that one. It is a piece of hardware in your computer that is specially made for doing computations that help your computer render graphics. A good GPU will often make your computer games run more smoothly, but it turns out a GPU is also very useful for deep learning. To train a deep neural network, your computer must perform many mathematical computations. A GPU’s main job is to do many mathematical computations at once. Therefore, the major breakthrough that has made deep learning much more possible today is taking advantage of the computing in GPUs to help solve for the weights in a deep neural network.
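To get a feel for how many computations we are talking about, here is a back-of-the-envelope sketch. It assumes a fully connected network, where every neuron in one layer connects to every neuron in the next, and it counts only the input-times-weight multiplications for a single pass of one example through the network. The layer sizes are arbitrary examples.

```python
def count_multiplications(layer_sizes):
    """Count input-times-weight multiplications for one pass through the network.

    Assumes fully connected layers: each pair of neighboring layers with
    n_in and n_out neurons needs n_in * n_out multiplications.
    """
    return sum(n_in * n_out for n_in, n_out in zip(layer_sizes, layer_sizes[1:]))


# A tiny network: 10 inputs, 5 hidden neurons, 1 output.
small = [10, 5, 1]

# A deeper network: 1000 inputs, ten hidden layers of 1000 neurons, 1 output.
deep = [1000] + [1000] * 10 + [1]

print("Small network:", count_multiplications(small))   # 55
print("Deep network:", count_multiplications(deep))     # 10001000
```

Training repeats this kind of work for every example in the training set, many times over. Within a layer, each neuron’s weighted sum does not depend on the other neurons in that layer, so they can all be computed at the same time. That is exactly the kind of workload a GPU is built for.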

Hopefully the concepts in deep learning are a little clearer now, but what if you want to do some of this stuff yourself? What should your next step be? Good question. Let’s talk about some resources.

Deep Learning Resources

First off, the language of choice for many deep learning frameworks is Python, so if you’re serious about the field, I’d recommend learning Python first. Not to worry though, Rithm School has you covered. We have free Python Fundamentals Part I and Python Fundamentals Part II online courses.

When you’re feeling comfortable in Python, the next step is to start diving into machine learning fundamentals. Andrew Ng, a machine learning professor at Stanford and a founder of Coursera, has a free Coursera machine learning course as well as a deep learning tutorial. If you’re new to machine learning, definitely start with the Coursera course before you check out the deep learning tutorial. There is also a very popular book simply called Deep Learning that you can tackle after completing the Andrew Ng resources. If you’re still looking for more, Scott Stephenson, the CEO of Deepgram, wrote a great blog post entitled How To Get A Job In Deep Learning offering some great advice, plus many more resources to check out.

During your deep learning journey, you may hear a lot about frameworks like PyTorch, Theano, or TensorFlow. All of these tools can help you write machine learning algorithms that take advantage of the GPU, but if you’re just getting started, I’d advise you to learn these frameworks later (after you feel comfortable with the main concepts and algorithms in deep learning).

You should absolutely be working on practical applications of these algorithms though. Instead of something like TensorFlow, try Kur. Kur is a declarative deep learning framework that was made and open sourced by Deepgram. It is easy to get started with Kur; however, Kur is still powerful enough to build any machine learning algorithm you have in mind. Think of Kur as a higher level descriptive tool that uses PyTorch, Theano, or TensorFlow under the hood.

Conclusion

I hope the concepts behind deep learning are clearer now than they were before. If you find yourself excited about working on machine learning problems, you’re in luck. The demand for machine learning engineers is projected to grow rapidly in the coming years. As researchers and companies continue to push the boundaries of what is possible in the field, the opportunities for engineers with these skills will be plentiful.