Maximizing Convolutional Neural Network Accuracy Summary
Last Updated on February 1, 2020
Reference:
- Practical Deep Learning for Cloud, Mobile, and Edge: Real-World AI & Computer-Vision Projects Using Python, Keras & TensorFlow
- https://www.oreilly.com/online-learning/
We explore several questions that tend to come up during model training:
- Does reducing input image size have a significant effect on prediction results?
- What is the appropriate “learning rate” to supply during model training?
How Hyperparameters Affect Accuracy
We modify various parameters of a deep learning pipeline one at a time and observe the effect of each, primarily on validation accuracy. Where relevant, we also note the effect on training speed and on the time taken to reach the best accuracy (i.e., convergence).
Our experimentation setup is as follows:
- To reduce experimentation time, we have used a faster architecture — MobileNet.
- We reduced the input image resolution to 128 x 128 pixels to further speed up training. In general, we would recommend using a higher resolution (at least 224 x 224) for production systems.
- The learning rate is set to 0.001 with the Adam optimizer.
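The setup above can be sketched in Keras roughly as follows. This is a minimal sketch, not the book's exact code: the `build_model` helper is hypothetical, `weights=None` keeps the sketch offline (in practice you would use `weights="imagenet"`), and 102 classes assumes the Oxford Flowers 102 dataset used later in the chapter.

```python
# Sketch of the experimental setup: MobileNet backbone, 128 x 128 input,
# Adam optimizer with learning rate 0.001. `build_model` is a hypothetical helper.
import tensorflow as tf

NUM_CLASSES = 102  # Oxford Flowers 102

def build_model(input_size=128, learning_rate=0.001):
    """Build a MobileNet-based classifier at the given input resolution."""
    base = tf.keras.applications.MobileNet(
        input_shape=(input_size, input_size, 3),
        include_top=False,
        weights=None,  # use "imagenet" in practice; None keeps this sketch offline
        pooling="avg",
    )
    outputs = tf.keras.layers.Dense(NUM_CLASSES, activation="softmax")(base.output)
    model = tf.keras.Model(base.input, outputs)
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=learning_rate),
        loss="categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model

model = build_model()
```

Keeping the architecture, resolution, and optimizer fixed here is what lets the experiments below vary one parameter at a time.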
Effect of Learning Rate
Experimental setup: Vary the learning rate among 0.1, 0.01, 0.001, and 0.0001
Here are the key takeaways:
- Too high a learning rate, and the model might never converge.
- Too low a learning rate, and convergence takes a very long time.
- Striking the right balance is crucial for training quickly.
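The takeaways above can be demonstrated without a neural network at all. The toy function below (my own illustration, not from the book) runs plain gradient descent on f(x) = x², whose gradient is 2x: a too-high rate makes the iterate oscillate and diverge, a well-chosen rate converges quickly, and a too-low rate barely moves in the same number of steps.

```python
def gradient_descent_steps(lr, steps=50, x0=1.0):
    """Minimize f(x) = x^2 (gradient 2x) and return the final distance from the optimum."""
    x = x0
    for _ in range(steps):
        x -= lr * 2 * x  # standard gradient descent update
    return abs(x)

print(gradient_descent_steps(1.5))    # too high: diverges (|x| doubles every step)
print(gradient_descent_steps(0.5))    # well chosen: converges
print(gradient_descent_steps(0.001))  # too low: still far from 0 after 50 steps
```

The same qualitative behavior shows up in the full training runs, just with noisier curves.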
Effect of Optimizers
Experimental setup: Experiment with available optimizers including AdaDelta, AdaGrad, Adam, Gradient Descent, Momentum, and RMSProp
Here are the key takeaways:
- Adam is a great choice for faster convergence to high accuracy.
- RMSProp is usually better for RNN tasks.
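In Keras, swapping the optimizer is a one-line change at compile time. The sketch below (the `compile_with` helper and the tiny stand-in model are my own, hypothetical additions) maps the optimizers named above to their `tf.keras` equivalents; note that "Gradient Descent" and "Momentum" are both `SGD`, with and without a momentum term.

```python
import tensorflow as tf

def compile_with(optimizer_name):
    """Compile a tiny stand-in model with the named optimizer (hypothetical helper)."""
    optimizers = {
        "adadelta": tf.keras.optimizers.Adadelta(),
        "adagrad": tf.keras.optimizers.Adagrad(),
        "adam": tf.keras.optimizers.Adam(),
        "sgd": tf.keras.optimizers.SGD(),                   # plain gradient descent
        "momentum": tf.keras.optimizers.SGD(momentum=0.9),  # gradient descent + momentum
        "rmsprop": tf.keras.optimizers.RMSprop(),
    }
    model = tf.keras.Sequential([
        tf.keras.Input(shape=(1024,)),
        tf.keras.layers.Dense(102, activation="softmax"),
    ])
    model.compile(optimizer=optimizers[optimizer_name],
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```

Because everything else is held fixed, any difference in the accuracy curves across runs can be attributed to the optimizer.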
Effect of Batch Size
Experimental setup: Vary batch sizes in powers of two
Here are the key takeaways:
- The higher the batch size, the more instability in results from epoch to epoch, with bigger rises and drops. But a higher batch size also leads to more efficient GPU utilization, and hence a faster speed per epoch.
- Too low a batch size slows the rise in accuracy.
- 16, 32, and 64 are good batch sizes to start with.
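One reason larger batches speed up each epoch is simply that fewer gradient updates are needed per pass over the data. The helper below (my own illustration; the 1,000-image dataset size is hypothetical) computes updates per epoch for batch sizes in powers of two, the grid used in this experiment.

```python
def steps_per_epoch(num_samples, batch_size):
    """Number of gradient updates in one pass over the data (ceiling division)."""
    return -(-num_samples // batch_size)

# Batch sizes in powers of two, as in the experimental setup above.
batch_sizes = [2 ** k for k in range(4, 9)]  # [16, 32, 64, 128, 256]
for bs in batch_sizes:
    print(bs, steps_per_epoch(1000, bs))  # hypothetical 1,000-image dataset
```

Fewer, larger steps per epoch are faster on a GPU but give the model fewer opportunities to adjust per pass, which connects the two takeaways above.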
Effect of Resizing
Experimental setup: Change the image size between 128 × 128 and 224 × 224 pixels
Here are the key takeaways:
- Even with roughly a third of the pixels, there wasn’t a significant difference in validation accuracy. On the one hand, this shows the robustness of CNNs. On the other, it might partly be because the Oxford Flowers 102 dataset consists of close-ups of flowers. For datasets in which the object of interest occupies a much smaller portion of the image, the results might be lower.
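The "roughly a third of the pixels" figure follows directly from the two resolutions. The small helper below (my own illustration) makes the arithmetic explicit for square images.

```python
def pixel_fraction(small, large):
    """Fraction of pixels retained when downscaling a square image from `large` to `small`."""
    return (small * small) / (large * large)

# 128 x 128 keeps about 33% of the pixels of a 224 x 224 image.
print(round(pixel_fraction(128, 224), 2))
```

So the resolution experiment trades away about two thirds of the input pixels, which is what makes the near-identical validation accuracy notable.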