Training an Architectural Classifier — III
Deep Neural Networks
This is part 3 of a five-article series:
- Training an Architectural Classifier: Motivations
- Training an Architectural Classifier: Softmax Regression
- Training an Architectural Classifier: Deep Neural Networks
- Training an Architectural Classifier: Convolutional Networks
- Training an Architectural Classifier: Transfer Learning
In this article, I’ll build a deep neural network in TensorFlow and try to beat the accuracy of the softmax classifier we built in the last article.
To recap the last post: we trained the softmax model for 5000 epochs, reaching a maximum training accuracy of 75%, which motivated the search for a more complex model. Validation accuracy reached about 60%, and test accuracy only 55%, indicating that overfitting will be a problem as well.
Intuition of Deep Neural Networks
The diagram at the top of the page illustrates the concept of a deep neural network. Each node of the input layer represents a pixel in an image to classify, and, as in logistic regression, each of those pixels is connected to a neuron in the next layer that is itself a logistic classifier. The difference now is that we have many neurons in that next layer, every pixel is connected to every neuron, and we duplicate this for multiple layers.
While the logistic model allowed pixel values to give evidence of class membership, this densely connected model allows neurons to find relationships between pixels that give evidence of membership. As more layers are stacked up, even deeper relationships with predictive power can be discovered. This process, known as ‘feature engineering’, is often done manually in other machine learning methods, but deep neural networks can do it on their own.
Mathematically, there isn’t anything terribly new going on in a deep network versus our last model, just more of it. We still multiply our inputs by their respective weights, but now that’s done for every neuron in the layer.
Similarly, the output of that multiplication is passed through a non-linear function for each neuron, this time a ‘Rectified Linear Unit’ (ReLU) instead of the softmax nonlinearity.
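Putting those two steps together, each layer computes

$$\mathbf{h} = f(W\mathbf{x} + \mathbf{b})$$

where $\mathbf{x}$ is the vector of input pixels (or the previous layer’s outputs), each row of $W$ holds the weights of one neuron, $\mathbf{b}$ is a per-neuron bias, and $f$ is the nonlinearity, here $\mathrm{ReLU}(z) = \max(0, z)$, applied element-wise.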
You can read more about ReLUs here, but it suffices to say that the ReLU is simply another non-linearity that has proven useful in deep neural networks. It can be freely interchanged with other functions like tanh or sigmoid.
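As a rough sketch of what one of these hidden layers looks like in TensorFlow (this isn’t the exact code from the experiment; the input and layer sizes are made up for illustration):

```python
import tensorflow as tf

# Assumed sizes, for illustration only: a flattened input image
# and 256 neurons in the hidden layer.
n_inputs = 64 * 64 * 3
n_hidden = 256

x = tf.placeholder(tf.float32, [None, n_inputs])

# One weight per (pixel, neuron) pair, plus one bias per neuron.
W1 = tf.Variable(tf.truncated_normal([n_inputs, n_hidden], stddev=0.1))
b1 = tf.Variable(tf.zeros([n_hidden]))

# The same multiply-and-add as the softmax model, done for every neuron
# at once, then passed through the ReLU nonlinearity.
h1 = tf.nn.relu(tf.matmul(x, W1) + b1)

# tf.nn.tanh or tf.sigmoid could be dropped in here instead of tf.nn.relu.
```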
The experiment notebook
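The notebook itself isn’t reproduced here, but a minimal sketch of the kind of network it builds might look like the following. The layer sizes, learning rate, and number of classes below are assumptions for illustration, not the values from the actual experiment:

```python
import tensorflow as tf

# Assumed sizes, for illustration only.
n_inputs, n_hidden1, n_hidden2, n_classes = 64 * 64 * 3, 256, 128, 2

x = tf.placeholder(tf.float32, [None, n_inputs])
y = tf.placeholder(tf.float32, [None, n_classes])

def dense(inputs, n_out, activation=None):
    """A fully connected layer: weights, biases, and an optional nonlinearity."""
    n_in = int(inputs.get_shape()[1])
    W = tf.Variable(tf.truncated_normal([n_in, n_out], stddev=0.1))
    b = tf.Variable(tf.zeros([n_out]))
    z = tf.matmul(inputs, W) + b
    return activation(z) if activation is not None else z

# Two ReLU hidden layers feeding a linear output layer.
h1 = dense(x, n_hidden1, tf.nn.relu)
h2 = dense(h1, n_hidden2, tf.nn.relu)
logits = dense(h2, n_classes)

# Softmax cross-entropy loss, trained with plain gradient descent.
loss = tf.reduce_mean(
    tf.nn.softmax_cross_entropy_with_logits(labels=y, logits=logits))
train_step = tf.train.GradientDescentOptimizer(0.01).minimize(loss)

# Fraction of examples where the predicted class matches the label.
correct = tf.equal(tf.argmax(logits, 1), tf.argmax(y, 1))
accuracy = tf.reduce_mean(tf.cast(correct, tf.float32))
```

Only the hidden layers are new relative to the softmax model; the loss, optimizer, and accuracy calculation are essentially unchanged.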
This gave us a better test accuracy of 65%! Here’s the TensorBoard summary after 5000 epochs:
Training accuracy is much better, and validation is a little better, but we now have a much larger overfitting problem. This makes sense: we’ve built a model that is much better at extracting information from the images it sees, but the images contain a huge amount of extraneous information, and we aren’t showing the model enough examples for it to distinguish what generalizes from what doesn’t.
What we’ve run into is known as the ‘curse of dimensionality’: as the number of input dimensions grows, you need exponentially more examples to prevent overfitting, and even a modest 128×128 RGB image has 128 × 128 × 3 = 49,152 input dimensions. An alternative to gathering more data is to reduce the dimensionality of the input. Simply shrinking the images is less than ideal, since it may destroy valuable information, so we’ll turn instead to a technique called convolution, which will become a component of our network architecture itself.
Read more in my next article: “Training an Architectural Classifier: Convolutional Networks”