4 — Machine Learning
Preliminary ML approach:
Our machine learning approach is to train neural networks to classify painting images by style and by artist. To accomplish this we are using Keras, a high-level neural network API that runs on a TensorFlow backend. To get a better feel for how best to classify images, we built on a Keras binary classification example for distinguishing cats from dogs. Our network uses three convolution layers with ReLU activation, followed by two fully connected layers ending in a softmax activation. The softmax ensures that the class (style) probability predictions are normalized and sum to 1.
Preliminary CNN model:
During each training iteration we use prediction accuracy as the training metric, categorical cross-entropy to compute loss, and stochastic gradient descent for weight optimization.
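The network and training configuration described above can be sketched in Keras as follows. This is a minimal illustration, not our exact script; the filter counts, dense-layer size, input shape, and learning rate are assumptions.

```python
from tensorflow.keras import layers, models, optimizers

def build_preliminary_cnn(num_classes=3, input_shape=(150, 150, 3)):
    """Sketch of the preliminary CNN: 3 conv layers + 2 dense layers."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        # Three convolution layers with ReLU activation, each followed by pooling.
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(32, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        layers.Conv2D(64, (3, 3), activation="relu"),
        layers.MaxPooling2D((2, 2)),
        # Two fully connected layers, ending in softmax so the style
        # probabilities are normalized and sum to 1.
        layers.Flatten(),
        layers.Dense(64, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    # Accuracy as the metric, categorical cross-entropy loss, SGD optimization.
    model.compile(optimizer=optimizers.SGD(learning_rate=0.01),
                  loss="categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```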
Eventually we will train our neural networks on GPUs, but for now we are using CPUs, which limits how much data we can process in a timely manner for network training and validation. Because of this restriction we are keeping our training sets minimal. For training we have used a total of 1,200 images, with 400 images for each of the following styles: Baroque, Cubism, and Romanticism. For the validation phase of training we use an additional 160 images from each of these styles. After 50 training epochs we reach a final validation-set accuracy of ~69%.
Below are two plots generated with the TensorFlow package, showing training and validation accuracy (y axis: accuracy %; x axis: epoch).
After successfully training our model we went back and tested it on a few individual images for a concrete example of how the network performs. We gave the network four paintings from the three classes (styles) it was built for, as well as one painting from each of two additional styles (Realism and Fauvism) the network has never seen, to see how it would categorize them.
Below are individual style classification probabilities for the 6 previously unseen images. The first element in each array represents Baroque, the second Cubism, and the third Romanticism. The order of the images submitted here is (1) Baroque, (2) Baroque, (3) Cubism, (4) Romanticism, (5) Realism, and (6) Fauvism.
Updated ML approach:
We have now turned our attention to training a neural network to classify twelve painting styles. We will also separately train the network to classify twenty-eight different artists. For our first couple of runs we used a stack of three convolution layers with ReLU activation, followed by two fully connected layers ending in a softmax activation function for predicting class probabilities. A graphical representation of the network can be seen below:
To train the style network we are currently using 400 training images and 160 validation images per style. For the artist network we are using 100 training images and 20 validation images per artist. These totals will be increased once we have settled on which network parameters work best. Additionally, we augment our training images using random transformations.
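The random-transformation augmentation can be sketched with Keras's ImageDataGenerator. The specific transformation ranges below are illustrative assumptions, not our exact settings.

```python
from tensorflow.keras.preprocessing.image import ImageDataGenerator

# Training images get random transformations each epoch, which effectively
# enlarges the small training set.
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,       # scale pixel values to [0, 1]
    rotation_range=20,       # random rotations up to 20 degrees
    width_shift_range=0.1,   # random horizontal shifts
    height_shift_range=0.1,  # random vertical shifts
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True,
)

# Validation images are only rescaled, never augmented.
val_datagen = ImageDataGenerator(rescale=1.0 / 255)
```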
With this architecture, our test runs obtained a validation accuracy of ~23% for style classification and ~19% for artist classification.
These accuracies beat random chance (1/12 for style, 1/28 for artist), but we had difficulty pushing them higher with our current structure. On the recommendation of the Serre lab, we decided to use one of Keras's prebuilt architectures, VGG16, a 16-layer model designed for large-scale image recognition (K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," arXiv:1409.1556).
We stripped away the top layer of this model and added two fully connected layers: one with ReLU activation followed by dropout, and one with a softmax activation function for predicting class probabilities. A graphical representation of the model can be seen below:
Using this architecture we were able to obtain a style classification validation accuracy of ~40% and an artist classification validation accuracy of ~52%.
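The VGG16-based architecture described above can be sketched as follows: the convolutional base is kept, the original top classifier is dropped (include_top=False), and our own ReLU + dropout + softmax head is attached. The dense-layer size, dropout rate, and input shape are illustrative assumptions.

```python
from tensorflow.keras import layers, models
from tensorflow.keras.applications import VGG16

def build_vgg16_classifier(num_classes, input_shape=(224, 224, 3),
                           weights="imagenet"):
    """Sketch of the VGG16-based classifier with a custom top."""
    # Pretrained convolutional base with the original top layer stripped away.
    base = VGG16(weights=weights, include_top=False, input_shape=input_shape)
    model = models.Sequential([
        base,
        layers.Flatten(),
        layers.Dense(256, activation="relu"),  # fully connected, ReLU
        layers.Dropout(0.5),                   # dropout to fight overfitting
        layers.Dense(num_classes, activation="softmax"),
    ])
    return model
```

Passing weights=None builds the same architecture with random initialization, which is convenient for quick structural checks without downloading the pretrained weights.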
The training accuracies for each classification task were much higher than the validation accuracies (>80% in both cases), so there is definitely some overfitting occurring. To counter this we decided to lower the learning rate whenever the validation loss plateaued for 5 rounds of training (epochs), and to try L1 and L2 regularization.
L1 regularization actually made our model perform much worse, with the accuracy jumping around and reaching only ~19% validation accuracy for style and ~25% for artist, but we obtained much more satisfactory results with L2 regularization. We are still fine-tuning the network parameters.
Validation accuracy and validation loss using L1 Regularization:
Validation accuracy and validation loss using L2 Regularization:
For each CNN model we settled on the following parameters for generating the final models:
training epochs = 75
batch size = 32 for style, 10 for artist
learning rate reduced by a factor of 0.2 (with a floor of 0.001) if validation loss does not decrease for five consecutive epochs
L2 regularization weight penalty of 0.01
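These settings translate into Keras roughly as follows. The dense-layer size is an illustrative assumption; the learning-rate schedule and L2 penalty match the values listed above.

```python
from tensorflow.keras import layers
from tensorflow.keras.callbacks import ReduceLROnPlateau
from tensorflow.keras.regularizers import l2

# Scale the learning rate by 0.2 when validation loss has not improved for
# five consecutive epochs, never dropping below 0.001.
reduce_lr = ReduceLROnPlateau(
    monitor="val_loss",
    factor=0.2,
    patience=5,
    min_lr=0.001,
)

# L2 weight penalty of 0.01 applied to a dense layer
# (the 256-unit size is an illustrative assumption).
dense = layers.Dense(256, activation="relu", kernel_regularizer=l2(0.01))

# Usage sketch (style model shown; the artist model uses batch_size=10):
# model.fit(train_data, epochs=75, batch_size=32,
#           validation_data=val_data, callbacks=[reduce_lr])
```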
We obtained a validation accuracy of 40% for the final style classification CNN and 53% for the final artist classification CNN. If running time were not an issue, we could likely improve substantially on these accuracies.
5 — Reporting / Visualization
A final reporting/visualization (Post 5) that lays out our approach:
For the top 12 styles, we plotted the number of paintings in each category.
In D3, we made a pie chart representing the data:
The application we developed for this project is relatively simple: a shell program that lets the user input a filename, runs the final model on that image, and outputs the relative probability of the image belonging to each class the model covers.
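A minimal sketch of that shell program is below. The model path, class names, and 224x224 input size are hypothetical placeholders, not our actual artifacts.

```python
import numpy as np
from tensorflow.keras.models import load_model
from tensorflow.keras.preprocessing import image

# Hypothetical label order; must match the order used during training.
CLASS_NAMES = ["baroque", "cubism", "romanticism"]

def preprocess(img_array):
    """Scale pixel values to [0, 1] and add a batch dimension."""
    return img_array[np.newaxis] / 255.0

def classify(filename, model_path="final_model.h5"):
    """Run the saved model on one image and print per-class probabilities."""
    model = load_model(model_path)
    img = image.load_img(filename, target_size=(224, 224))
    probs = model.predict(preprocess(image.img_to_array(img)))[0]
    for name, p in zip(CLASS_NAMES, probs):
        print(f"{name}: {p:.3f}")

# Interactive use:
# classify(input("Image filename: "))
```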
Over the next few days, we will try to implement this using our collected data.