10 important papers to get started with machine learning

New deep learning papers are published every week, and most of them build on earlier work, as is normal in science. It therefore helps to have a good overview of some entry points into the different subtopics of deep learning. I will present ten papers that were especially important over the last several years and that will give you a good foundation for understanding more advanced papers. I presume that you are already familiar with the basics of multi-layer perceptrons, backpropagation and CNNs.

This article is divided into five topics: Training/Optimization, CNN/Object Detection, Generative Adversarial Networks, Natural Language Processing and Deep Reinforcement Learning.

I think all of the papers provide good entry points for more advanced papers in the respective subtopic.


Training/Optimization:

1. Dropout: A Simple Way to Prevent Neural Networks from Overfitting

Dropout is a method for improving the generalization of a deep learning model. In every training step it randomly drops a subset of neurons together with their incoming and outgoing connections. This can be seen as training an ensemble of thinned networks with shared weights and combining them at test time. The results in the paper show that the test error can be reduced considerably by using dropout.
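
To make the idea concrete, here is a minimal sketch of dropout on a plain list of activations. It uses the "inverted" variant that is common in today's frameworks (the surviving activations are rescaled during training), which is an assumption on my side: the paper itself rescales the weights at test time instead. The function name dropout and its parameters are made up for this illustration.

```python
import random

def dropout(activations, p_drop=0.5, training=True, rng=random):
    """Inverted dropout on a list of activations (illustrative sketch)."""
    if not 0.0 <= p_drop < 1.0:
        raise ValueError("p_drop must be in [0, 1)")
    if not training or p_drop == 0.0:
        # At test time the full network is used unchanged.
        return list(activations)
    keep = 1.0 - p_drop
    # Each unit survives with probability `keep`; survivors are scaled by
    # 1/keep so the expected activation is the same as without dropout.
    return [a / keep if rng.random() < keep else 0.0 for a in activations]
```

Every call during training samples a different mask, so each step effectively trains a different thinned network that shares its weights with all the others.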

2. Batch Normalization: Accelerating Deep Network Training by Reducing Internal Covariate Shift

In a batch normalization step, every activation of a layer is normalized with the mean and standard deviation of that activation over the current mini-batch. This keeps the activations centered around zero, which in turn leads to larger gradients during training and thereby mitigates the vanishing-gradient problem.
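
As a rough sketch, the training-time normalization of one activation over a mini-batch looks like this. gamma and beta stand for the learned scale and shift that the paper applies after the normalization; the function name and the default value of eps are my own choices for this example.

```python
import math

def batch_norm(batch, gamma=1.0, beta=0.0, eps=1e-5):
    """Normalize one activation across a mini-batch (training-time statistics)."""
    n = len(batch)
    mean = sum(batch) / n
    # Biased (population) variance over the mini-batch.
    var = sum((x - mean) ** 2 for x in batch) / n
    # Normalize to zero mean and unit variance, then scale and shift.
    return [gamma * (x - mean) / math.sqrt(var + eps) + beta for x in batch]
```

At inference time, frameworks usually replace the batch statistics with running averages collected during training; that part is left out here.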

3. Adam: A Method for Stochastic Optimization

Adam is one of the most commonly used optimizers in deep learning, so it is useful to look under the hood. It combines a momentum term (a running average of the gradient) with a per-parameter step size derived from a running average of the squared gradient, which in practice often improves optimization compared to standard stochastic gradient descent.
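
Looking under the hood, a single Adam update for one scalar parameter can be sketched as follows. The default hyperparameters are the ones suggested in the paper; the function name adam_step and the scalar (instead of vector) setting are simplifications for this example.

```python
import math

def adam_step(theta, grad, m, v, t, lr=0.001, beta1=0.9, beta2=0.999, eps=1e-8):
    """One Adam update for a scalar parameter; t is the step count, starting at 1."""
    # Running average of the gradient (the momentum term).
    m = beta1 * m + (1 - beta1) * grad
    # Running average of the squared gradient (per-parameter scaling).
    v = beta2 * v + (1 - beta2) * grad * grad
    # Bias correction, because m and v are initialized at zero.
    m_hat = m / (1 - beta1 ** t)
    v_hat = v / (1 - beta2 ** t)
    theta = theta - lr * m_hat / (math.sqrt(v_hat) + eps)
    return theta, m, v
```

To minimize f(x) = x^2, for example, you would start with m = v = 0 and repeatedly call adam_step(x, 2 * x, m, v, t) for t = 1, 2, 3 and so on.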


CNN/Object Detection:

4. Visualizing and Understanding Convolutional Networks

CNNs, and neural networks in general, are (or were) more or less black boxes, in the sense that their parameters are not really interpretable. It is therefore necessary to better understand how they arrive at their decisions. This paper shows a way to visualize the features learned by the filters in different layers, and thus how a CNN arrives at its decision for a specific class.

5. You Only Look Once: Unified, Real-Time Object Detection

When it comes to predicting bounding boxes, the YOLO model was a breakthrough because, as the name suggests, it detects several objects in a picture and assigns bounding boxes to them in a single forward pass. This makes object detection very fast and even usable in real-time video applications.

6. Fully Convolutional Networks for Semantic Segmentation

Besides bounding-box prediction, semantic segmentation is the second major way of localizing objects in an image. Here FCNs showed some pretty good results. An FCN consists only of convolution and deconvolution (upsampling) layers and dispenses with fully connected layers. This allows the FCN to map directly from a local part of a picture to the same local part of the semantic segmentation map.

Generative Adversarial Networks:

7. Generative Adversarial Nets

In contrast to the other major tasks of neural networks and CNNs, Generative Adversarial Nets do not try to classify inputs from a high-dimensional space but try to sample from one. Two networks play a two-player game in which the generator tries to generate samples from the distribution of some data and the discriminator tries to find out whether a sample came from the generator or from the real data. This pushes the generator towards producing samples that follow the data distribution.
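
The two-player game can be written down as two losses. The sketch below assumes that the discriminator outputs D(x), the probability that a sample is real, and it takes those outputs as plain lists of numbers strictly between 0 and 1; the function names are made up for this example. The generator loss shown is the original minimax form log(1 - D(G(z))); the paper notes that maximizing log D(G(z)) instead gives stronger gradients early in training.

```python
import math

def discriminator_loss(d_real, d_fake):
    """Mean of -[log D(x) + log(1 - D(G(z)))], which the discriminator minimizes."""
    total = sum(math.log(r) + math.log(1.0 - f) for r, f in zip(d_real, d_fake))
    return -total / len(d_real)

def generator_loss(d_fake):
    """Minimax generator loss: mean of log(1 - D(G(z))), which the generator minimizes."""
    return sum(math.log(1.0 - f) for f in d_fake) / len(d_fake)
```

If the discriminator cannot tell real from generated samples and outputs 0.5 everywhere, discriminator_loss equals log 4, which matches the value of the game at its optimum as derived in the paper.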

8. Improved Techniques for Training GANs

GANs suffer from various problems during training. The first is mode collapse, which means that the generator produces only one sample (or a few different samples) that look very much like real samples, while the generator does not converge to the real distribution of the data. This paper by OpenAI proposes several techniques, such as minibatch discrimination, that mainly address this problem.

Natural Language Processing:

9. Efficient Estimation of Word Representations in Vector Space

When it comes to NLP with deep learning, a main subject is how to represent words in a vector space. In the so-called word2vec setting presented in this paper, a neural net tries to predict a word from its surrounding words (CBOW) or the surrounding words from a word (skip-gram), and the weights of the first layer are then used as the vector representation of the word. As a result, words that appear in similar contexts lie closer together in the associated vector space.
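
To see what the network is actually trained on, here is a small sketch that builds the training pairs for the skip-gram variant described in the paper, where each word is used to predict the words around it. The function name skipgram_pairs and the fixed window size are assumptions for this example; the paper samples the window size randomly up to a maximum.

```python
def skipgram_pairs(tokens, window=2):
    """Return (center, context) training pairs for the skip-gram objective."""
    pairs = []
    for i, center in enumerate(tokens):
        # Context = up to `window` words on each side of the center word.
        lo = max(0, i - window)
        hi = min(len(tokens), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, tokens[j]))
    return pairs
```

Words that occur in similar contexts end up with many similar pairs, which is why their learned vectors move close to each other.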

Deep Reinforcement Learning:

10. Playing Atari with Deep Reinforcement Learning

This work was a major breakthrough in reinforcement learning, in which the Q-function was represented as a deep neural net. With this architecture, DeepMind achieved high performance on several different Atari games, which shows a great ability to generalize. The paper also explains the general concept of Q-learning and deep Q-learning.
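
The core of Q-learning is the target that the Q-function is regressed towards. A minimal sketch, assuming the Q-values of the next state are already available as a list (in the paper they come from the network itself); the function names are made up for this example.

```python
def td_target(reward, next_q_values, gamma=0.99, done=False):
    """Q-learning target: r + gamma * max_a' Q(s', a'), or just r at episode end."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)

def squared_td_error(q_value, target):
    """Squared difference between the target and the current Q(s, a)."""
    return (target - q_value) ** 2
```

In deep Q-learning, the network parameters are updated by gradient descent on this squared error, computed over mini-batches of transitions sampled from a replay memory.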

I hope this overview gave you some useful insights and helps you get started with more advanced papers in machine learning.

For more blog posts about machine learning and AI, please visit neuralbreeze.com