My Journey To Deep Learning Research

In this post, I would like to describe my journey into Deep Learning Research and the sources I used along the way. It may be helpful for those who are just starting out or wondering where and how to get into Deep Learning. I want to note that this journey took me almost two years, since I wanted to understand deeply how and why it works for particular problems.


Academic and Professional Background

I did my bachelor's degree in Applied Mathematics and Informatics, and I developed web applications from 2010 to 2017.

In 2017 I was looking for something new (after JavaScript fatigue) and came across Andrew Ng's Machine Learning course on Coursera, which was one of the best courses of its time.

Mathematical Background

Even though I had a mathematical background from university, I decided to refresh my math knowledge. Even if you don't have a mathematical background, following these steps will make you more comfortable understanding Deep Learning methods. It is a pretty hard and long learning curve, but it pays off later. After all, in my opinion, Deep Learning is an efficient use of linear algebra to optimize an objective function, backed by calculus and probability theory (the short sketch after the list below tries to make this concrete).

So these are essential for understanding Deep Learning in depth:

  1. Linear Algebra
  2. Calculus
  3. Probability Theory
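
To make the claim above concrete, here is a minimal NumPy sketch (my own illustration, not taken from any of these resources). The matrix products are the linear algebra, the gradient is the calculus, and minimizing the mean squared error is the probability theory, since it is the negative log-likelihood under Gaussian noise:

```python
# Fitting a linear model by gradient descent: a minimal sketch of how
# linear algebra, calculus, and probability meet in one objective.
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 3))                # 100 samples, 3 features
true_w = np.array([2.0, -1.0, 0.5])
y = X @ true_w + 0.1 * rng.normal(size=100)  # targets with Gaussian noise

w = np.zeros(3)
lr = 0.1
for _ in range(500):
    residual = X @ w - y                 # linear algebra: matrix-vector product
    grad = 2 * X.T @ residual / len(y)   # calculus: gradient of the MSE objective
    w -= lr * grad                       # optimization: one gradient descent step

print(w)  # should land close to true_w
```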

For the above-mentioned topics, I used MIT OpenCourseWare. The best thing about the MIT OCW Scholar courses is that they are very well prepared, with video lectures, lecture notes, assignments, and exams. You should definitely try to do all the assignments and exercises; the exams served as a check of my understanding of the concepts. I had always dreamed of studying at MIT, so these courses at least gave me the feeling of being taught by its professors.

1. Linear Algebra (~1.5 months)

For this, I highly recommend the Linear Algebra MIT OCW Scholar course by Professor Gilbert Strang. At the beginning of the video lectures, it was very hard to follow because the content seemed very dry: I thought I would never use the things Professor Strang explained, and I could not find any connection to Deep Learning. But from the middle of the lectures on, I realized that those dry-seeming lectures were essential groundwork for the more complex ideas in Linear Algebra. Even though MIT provides lecture notes, I took notes on my own. That was one of the best things I did, because now whenever I need to recap something I do not understand, I know exactly where to find the answer.

Professor Gilbert Strang’s Linear Algebra course gives you more than just theory; it drills linear algebraic concepts into your head.

2. Calculus (~3 months)

You need calculus if you want to understand optimization in Deep Learning; moreover, it will give you comfort when reading Deep Learning papers. With calculus you can also understand, for example, why early stopping is approximately equivalent to L2 regularization.
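
Here is a sketch of that equivalence, following the quadratic-approximation argument (the same one given in Section 7.8 of the Deep Learning book discussed below). It assumes the cost is roughly quadratic around its minimum w*, with Hessian H:

```latex
% Gradient descent from w_0 = 0 with learning rate \epsilon, after \tau steps,
% compared with the L2-regularized (weight decay \alpha) minimizer;
% H = Q \Lambda Q^\top is the eigendecomposition of the Hessian.
\begin{align*}
  Q^\top w^{(\tau)} &= \bigl(I - (I - \epsilon\Lambda)^{\tau}\bigr)\, Q^\top w^{*} \\
  Q^\top \tilde{w}  &= (\Lambda + \alpha I)^{-1} \Lambda \, Q^\top w^{*}
\end{align*}
% The two coincide when (I - \epsilon\Lambda)^{\tau} = \alpha(\Lambda + \alpha I)^{-1},
% which for small \epsilon\lambda_i gives \tau \approx 1/(\epsilon\alpha).
```

In other words, stopping after τ steps behaves like weight decay of strength α ≈ 1/(τε): training longer means regularizing less.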

There are two Calculus courses on MIT OCW: Single Variable Calculus by Prof. David Jerison and Multivariable Calculus by Prof. Denis Auroux. Both are very good at explaining complex concepts. Following these two courses was just fun for me; I really enjoyed the lectures while grasping the important concepts from the beginning. For Deep Learning you will probably not need Parts 3 and 4 of Multivariable Calculus, but if time allows, they are worth checking out too.

3. Probability Theory (~3 months)

For probability theory there are other great resources besides MIT OCW, but in the spirit of MIT OCW I followed Probabilistic Systems Analysis and Applied Probability by Prof. John Tsitsiklis. This course gave me a solid background in Probability Theory, but for Deep Learning you might need a little more because of the high-dimensional nature of data in Machine Learning.

Deep Learning

At this point, I had already spent ~7.5 months without writing a single line of code. You might say it is too hard to stay on course, and yes it is, but Twitter comes to the rescue. Follow the great minds behind Deep Learning; it gives you a fresh bit of motivation every day. Just seeing researchers training GANs is enough to make your doubts disappear.

Armed with a mathematical background, it is now time to figure out where to get started with Deep Learning. There are plenty of courses online, free and paid. My recommendation is this:

Choose Deep Learning courses from academia, because they keep standards high compared to online courses whose content is organized to reach the largest possible audience.

From MIT to Stanford

There are two amazing courses from Stanford:

  1. CS231n: Convolutional Neural Networks for Visual Recognition
  2. CS224n: Natural Language Processing with Deep Learning

CS231n

At first, I thought I would need to know Computer Vision to understand this course, but the first 10 lectures are all you need for Deep Learning. The Jupyter Notebook assignments are the gems of this course. It also provides additional readings on derivatives and backpropagation, which I found especially useful for vectorized implementations of neural network models (see the sketch below). One strategy for following this course is to watch 3–4 videos, then take a week to work only on the assignments.
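
To give a flavor of those readings, here is a minimal sketch of the vectorized forward and backward pass of a single affine (fully connected) layer. This is my own illustration in the spirit of the assignments, not the official starter code:

```python
import numpy as np

def affine_forward(x, w, b):
    """x: (N, D) batch, w: (D, M) weights, b: (M,) bias -> out: (N, M)."""
    out = x @ w + b
    cache = (x, w)
    return out, cache

def affine_backward(dout, cache):
    """dout: (N, M) upstream gradient -> gradients w.r.t. x, w, and b."""
    x, w = cache
    dx = dout @ w.T        # (N, D): push the gradient back through the weights
    dw = x.T @ dout        # (D, M): sums per-example outer products over the batch
    db = dout.sum(axis=0)  # (M,):   the bias sees every example in the batch
    return dx, dw, db

# Quick shape check with random data
x, w, b = np.random.randn(4, 5), np.random.randn(5, 3), np.random.randn(3)
out, cache = affine_forward(x, w, b)
dx, dw, db = affine_backward(np.ones_like(out), cache)
assert dx.shape == x.shape and dw.shape == w.shape and db.shape == b.shape
```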

CS224n

Unlike CS231n, this course starts directly with an application of Deep Learning to Natural Language Processing. What I liked most were the theoretical assignments and the midterms, which were not available for CS231n.

Deep Learning book by Ian Goodfellow

I wish I had started with Ian Goodfellow's Deep Learning book as early as possible, because it is a great resource. It covers the mathematical background needed for understanding, gives an introduction to machine learning concepts, and more. This book dispels the fog in your head about Deep Learning. In my opinion, one should pick it up after taking either CS231n or CS224n, because then it becomes clear why some methods work and some do not, why we need dropout, and how dropout is approximately equivalent to efficient ensembling. In one word, it is AMAZING!
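
To illustrate that last point, here is a minimal sketch of inverted dropout (my own illustration of the idea, not code from the book). Randomly zeroing units during training amounts to training an implicit ensemble of subnetworks that share weights, which is the sense in which dropout is efficient ensembling:

```python
import numpy as np

def dropout(x, p=0.5, train=True):
    """Inverted dropout: drop each unit with probability p at training time."""
    if not train:
        return x  # test time: the full network approximates the ensemble average
    mask = (np.random.rand(*x.shape) >= p) / (1.0 - p)  # rescale the survivors
    return x * mask  # each mask picks out one subnetwork of the implicit ensemble

h = np.random.randn(2, 4)
print(dropout(h, p=0.5, train=True))  # roughly half the activations zeroed
print(dropout(h, train=False))        # unchanged at test time
```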

What’s next?

I am at this step right now :) I think there are two ways to go: either apply the acquired knowledge in industry, or delve deeper into Deep Learning by applying for a PhD. One thing is certain, though: we should keep reading publications in the field to stay in shape.

Where am I now?

Currently, I am doing my master's degree in Computer Science, specializing in applying Deep Learning methods to the analysis of medical images. Besides that, I am doing research at Fraunhofer IAIS, where we apply Deep Learning to Natural Language Processing. So for me, this is just the beginning of the actual journey…