Final Machine Learning and Reporting / Visualization

4 — Machine Learning

Preliminary ML approach:

Our machine learning approach is to train neural networks to classify painting images based on style and artist. To accomplish this we are utilizing Keras, a high-level neural network API that uses TensorFlow as its backend. To get a better feel for how best to classify images, we built on a Keras binary classification example for distinguishing cat images from dog images. We use three convolution layers with ReLU activation, followed by two fully connected layers ending in a softmax activation. Softmax ensures that the class (style) probability predictions are normalized and sum to 1.
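A minimal sketch of that architecture in Keras is below; the filter counts, input size, and dense-layer width shown are illustrative assumptions rather than our exact settings.

```python
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

# Illustrative 3-convolution-layer classifier; filter counts, input size,
# and dense width are assumptions, not our exact settings.
model = Sequential()
model.add(Conv2D(32, (3, 3), activation='relu', input_shape=(150, 150, 3)))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(32, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Conv2D(64, (3, 3), activation='relu'))
model.add(MaxPooling2D((2, 2)))
model.add(Flatten())
model.add(Dense(64, activation='relu'))      # first fully connected layer
model.add(Dense(3, activation='softmax'))    # one output per style, normalized to sum to 1
```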

Preliminary CNN model:

Preliminary summary and visualization

During each training iteration we use prediction accuracy as the training metric, categorical cross-entropy to compute loss, and stochastic gradient descent for weight optimization.
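In Keras terms, that combination of metric, loss, and optimizer corresponds to a compile call along these lines (continuing the sketch above; the learning rate shown is an assumption):

```python
from keras.optimizers import SGD

# `model` is the Sequential CNN sketched above; the learning rate is illustrative.
model.compile(optimizer=SGD(lr=0.01),
              loss='categorical_crossentropy',
              metrics=['accuracy'])
```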

Eventually we will train our neural networks on GPUs, but for now we are using CPUs, which limits how much data we can process in a timely manner for network training/validation. Because of this restriction we are keeping our training sets minimal. For training we have used a total of 1200 images, with 400 images corresponding to each of the following styles: Baroque, Cubism, and Romanticism. For the validation phase of training we are using an additional 160 images from each of the aforementioned styles. After 50 training iterations we reach a final validation set accuracy of ~69%.

Below are two images generated using the TensorFlow package, displaying training and validation accuracy.

Training accuracy (left) and validation accuracy (right); y-axis: accuracy (%), x-axis: epoch

After successfully training our model we went back and tested it with a few individual images to get a concrete example of how the network performs. We gave the network four paintings corresponding to the three classes (styles) it was trained on, as well as two paintings from additional styles (realism and fauvism) the network has never seen, to see how it would categorize them.

Below are individual style classification probabilities for 6 previously unseen images. The first element in each array represents Baroque, the second Cubism, and the third Romanticism. The order of the images submitted here is (1) Baroque, (2) Baroque, (3) Cubism, (4) Romanticism, (5) Realism, and (6) Fauvism.
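The per-image probability arrays come from running single images through the trained model, roughly as follows (the file path and image size here are placeholders):

```python
import numpy as np
from keras.preprocessing import image

# Load one painting, scale its pixels, and ask the trained model (from the
# sketch above) for its style probabilities; the path and size are placeholders.
img = image.load_img('paintings/test/example.jpg', target_size=(150, 150))
x = np.expand_dims(image.img_to_array(img) / 255.0, axis=0)   # batch of one image

probs = model.predict(x)[0]   # [P(Baroque), P(Cubism), P(Romanticism)]
print(probs)
```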

From left to right: (1) Baroque, (2) Baroque, (3) Cubism
From left to right: (4) Romanticism, (5) Realism, and (6) Fauvism
Classification predictions based upon image input

Updated ML approach:

We have now turned our attention to training a neural network to classify twelve painting styles. We will also separately train the network to classify twenty-eight different artists. For our first couple of runs we used a stack of 3 convolution layers with ReLU activation, followed by two fully connected layers ending with a softmax activation function for predicting class probabilities. A graphical representation of the network can be seen below:

CNN (3 convolution layers)

For training the style network we are currently using 400 training images and 160 validation images per style. For artist classification we are currently using 100 training images and 20 validation images. These totals will be increased once we have settled on which network parameters work best. Additionally, we are augmenting our training images using random transformations.
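The random transformations are applied on the fly with Keras' ImageDataGenerator; a sketch of this kind of setup is below, with illustrative transformation ranges and a placeholder directory name:

```python
from keras.preprocessing.image import ImageDataGenerator

# Augment training images on the fly with random transformations;
# the ranges and directory path are illustrative values.
train_datagen = ImageDataGenerator(
    rescale=1.0 / 255,
    rotation_range=20,
    width_shift_range=0.1,
    height_shift_range=0.1,
    shear_range=0.2,
    zoom_range=0.2,
    horizontal_flip=True)

train_generator = train_datagen.flow_from_directory(
    'data/train_by_style',        # one subdirectory per style
    target_size=(150, 150),
    batch_size=32,
    class_mode='categorical')
```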

With this architecture, in our test runs we were able to obtain a validation accuracy of ~23% for style classification and ~19% for artist classification.

By artist (left) and by style (right)

These accuracies were well above random chance, but we had difficulty pushing them higher with our current structure. On the recommendation of the Serre lab we decided to use one of Keras' prebuilt model architectures, VGG16, a 16-layer model designed for large-scale image recognition (K. Simonyan and A. Zisserman, "Very Deep Convolutional Networks for Large-Scale Image Recognition," arXiv:1409.1556).

We stripped away the top layer of this model and added two fully connected layers: one with ReLU activation followed by dropout, and one with a softmax activation function for predicting class probabilities. A graphical representation of the model can be seen below:
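A sketch of this VGG16-based model in Keras; the input size, dense-layer width, and dropout rate are assumptions, and `num_classes` would be 12 for style or 28 for artist:

```python
from keras.applications import VGG16
from keras.models import Model
from keras.layers import Flatten, Dense, Dropout

num_classes = 12   # 12 styles (28 for the artist network)

# VGG16 convolutional base without its original top classifier.
base = VGG16(weights='imagenet', include_top=False, input_shape=(224, 224, 3))

x = Flatten()(base.output)
x = Dense(256, activation='relu')(x)   # new fully connected layer
x = Dropout(0.5)(x)                    # dropout to limit overfitting
predictions = Dense(num_classes, activation='softmax')(x)

model = Model(inputs=base.input, outputs=predictions)
```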

Updated CNN

Using this architecture we were able to obtain a style classification validation accuracy of ~40% and an artist classification validation accuracy of ~52%.

Updated CNN (above model): by artist (left) and by style (right)

The training accuracy for each classification task was much higher than the validation accuracy (> 80% in both cases), so there is definitely some overfitting occurring. To counter this we decided to lower the learning rate whenever the validation loss stabilized for 5 consecutive rounds of training (epochs), and to try L1 and L2 regularization.
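The regularization enters as a weight penalty on the new fully connected layers; a minimal sketch, assuming the same VGG16 base as in the previous code block (the penalty values here are illustrative):

```python
from keras import regularizers
from keras.layers import Flatten, Dense, Dropout

# `base` and `num_classes` are the VGG16 base and class count from the sketch
# above. Swap in regularizers.l1(...) for the L1 runs; values are illustrative.
x = Flatten()(base.output)
x = Dense(256, activation='relu',
          kernel_regularizer=regularizers.l2(0.01))(x)
x = Dropout(0.5)(x)
predictions = Dense(num_classes, activation='softmax',
                    kernel_regularizer=regularizers.l2(0.01))(x)
```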

L1 regularization actually made our model perform much worse, with the accuracy jumping around and reaching only ~19% validation accuracy for style and ~25% for artist, but we obtained much more satisfactory results using L2 regularization. We are still in the process of fine-tuning the network parameters.

Validation accuracy and validation loss using L1 Regularization:

CNN with L1 regularization: by artist (left) and by style (right)

Validation accuracy and validation loss using L2 Regularization:

CNN with L2 regularization: by artist (left) and by style (right)

Final Version

We settled on the following parameters for generating the final version of each CNN model (a code sketch of these settings follows the list):

Training rounds (epochs) = 75

Batch size = 32 for style, 10 for artist

Learning rate reduced by a factor of 0.2 (down to a minimum learning rate of 0.001) if validation loss does not decrease after five consecutive rounds of training

L2 regularization weight penalty of 0.01
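A sketch of how these settings map onto Keras, with hypothetical names (`train_generator`, `val_generator`) standing in for our actual data pipeline:

```python
from keras.callbacks import ReduceLROnPlateau

# Drop the learning rate by a factor of 0.2 (but never below 0.001) when the
# validation loss has not improved for five consecutive epochs.
reduce_lr = ReduceLROnPlateau(monitor='val_loss', factor=0.2,
                              patience=5, min_lr=0.001)

# `model`, `train_generator`, and `val_generator` are hypothetical names for the
# compiled network and data generators; batch size (32 for style, 10 for artist)
# is set when the generators are created.
model.fit_generator(train_generator,
                    epochs=75,
                    validation_data=val_generator,
                    callbacks=[reduce_lr])
```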

We obtained a validation accuracy of 40% for the final style classification CNN and 53% for the artist classification CNN. If running time were not an issue we would likely be able to improve upon these accuracies considerably.

5 — Reporting / Visualization

A final reporting/visualization (Post 5) that lays out our approach:

Database visualization:

For the top 12 styles, we plotted the number of paintings in each category.

Visualization of style vs. number of paintings (Bar plot)
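A minimal sketch of this kind of bar plot, assuming matplotlib; the style names and counts below are placeholders, not our actual figures:

```python
import matplotlib.pyplot as plt

# Placeholder counts; the real numbers come from our painting database.
styles = ['Impressionism', 'Realism', 'Romanticism', 'Baroque', 'Cubism']
counts = [1200, 950, 800, 600, 450]

plt.figure(figsize=(10, 5))
plt.bar(styles, counts)
plt.xlabel('Style')
plt.ylabel('Number of paintings')
plt.title('Paintings per style')
plt.xticks(rotation=45, ha='right')
plt.tight_layout()
plt.show()
```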

In D3, we made a pie chart representing the data:

Visualizing our painting collection by style (Pie plot)

Interactive Application

The application we developed for this project is relatively simple. It is a shell program that lets the user enter an image filename, runs the final model on that image, and outputs the relative probability of the image belonging to each class the model recognizes.

Code for interactive application
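A sketch of what the shell program looks like; the model filename, class labels, and image size are placeholders:

```python
import numpy as np
from keras.models import load_model
from keras.preprocessing import image

# Placeholder model path and class labels.
model = load_model('final_style_model.h5')
classes = ['Baroque', 'Cubism', 'Romanticism']   # one label per output unit

while True:
    filename = input('Enter an image filename (or "quit"): ').strip()
    if filename.lower() == 'quit':
        break
    img = image.load_img(filename, target_size=(224, 224))
    x = np.expand_dims(image.img_to_array(img) / 255.0, axis=0)
    probs = model.predict(x)[0]
    for name, p in zip(classes, probs):
        print('{}: {:.1%}'.format(name, p))
```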

Over the next few days, we will try to implement this using our collected data.
