Could you guess the artist behind a painting you’ve never seen before? Neural networks can.
Using Fastai and advanced computer vision techniques (Part 2)
Also available in Spanish.
After explaining step by step how to create a painting style classifier, here comes the story of building a painting artist recognizer, using Deep Learning and Fastai on top of PyTorch.
I recommend reading my previous article (if you haven’t already) to learn the motivation behind this small project and the path that brought us here. There you’ll also find code and explanations covering the whole procedure:
Who painted this piece of art?
We have already seen how complicated it is to classify a painting into a single artistic style… What if we proceed with our initial intention of recognizing the artist behind a painting? Will we get better results than in the previous case? The good news is that our label here (i.e. the artist) won’t be a subjective thing :)
The data
For this new task we’ll start from the same dataset, but keep only artists with more than 300 paintings, so we have enough samples to train our network (the number is not chosen at random; it will allow us to compare our results with another project).
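As an illustration, the filtering step looks roughly like this. It’s only a sketch: I’m assuming a metadata CSV with one row per painting and an artist column; the file and column names are hypothetical.

import pandas as pd

# Hypothetical metadata file: one row per painting, with an 'artist' column
df = pd.read_csv('train_info.csv')

# Keep only the artists with more than 300 paintings
counts = df['artist'].value_counts()
selected_artists = counts[counts > 300].index
df_reduced = df[df['artist'].isin(selected_artists)]

print(f'{len(selected_artists)} artists, {len(df_reduced)} paintings kept')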
These are the 57 selected artists:
['Rembrandt', 'Ivan Aivazovsky', 'Ilya Repin', 'John Singer Sargent', 'Gustave Dore', 'Pablo Picasso', 'Marc Chagall', 'Ivan Shishkin', 'Albrecht Durer', 'Giovanni Battista Piranesi', 'Zdislav Beksinski', 'Pierre-Auguste Renoir', 'Paul Cezanne', 'Martiros Saryan', 'Camille Pissarro', 'Pyotr Konchalovsky', 'Boris Kustodiev', 'Paul Gauguin', 'Claude Monet', 'Raphael Kirchner', 'Eugene Boudin', 'Nicholas Roerich', 'Henri Matisse', 'Edgar Degas', 'Vincent van Gogh', 'Childe Hassam', 'Camille Corot', 'Theophile Steinlen', 'Salvador Dali', 'Alfred Sisley', 'Odilon Redon', 'Isaac Levitan', 'Fernand Leger', 'James Tissot', 'Eyvind Earle', 'Zinaida Serebriakova', 'Henri Martin', 'Ernst Ludwig Kirchner', 'Francisco Goya', 'Maurice Prendergast', 'Peter Paul Rubens', 'Henri de Toulouse-Lautrec', 'Konstantin Makovsky', 'Erte', 'William Merritt Chase', 'David Burliuk', 'Rene Magritte', 'Theodor Severin Kittelsen', 'Utagawa Kuniyoshi', 'Max Ernst', 'Amedeo Modigliani', 'M.C. Escher', 'Albert Bierstadt', 'Sam Francis', 'Konstantin Korovin', 'Felix Vallotton ', 'Mary Cassatt']
This time we choose to do nothing about the unbalanced number of samples per category, so we’ll have between 300 and 500 works from each artist.
We visualize a small batch of the paintings included in our reduced dataset:
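A minimal data-loading sketch, assuming fastai v1 and one sub-folder per artist under a hypothetical data/artists directory (299x299 crops and the standard augmentations, as in the previous post):

from fastai.vision import *

path = Path('data/artists')   # hypothetical location of the reduced dataset
data = ImageDataBunch.from_folder(path, train='.', valid_pct=0.2,
                                  ds_tfms=get_transforms(), size=299, bs=32
                                  ).normalize(imagenet_stats)

data.show_batch(rows=3, figsize=(9, 9))   # show a small batch of paintings
print(data.c, 'classes')                  # should report 57 artists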
Network based on ResNet-34
As in the case of the style classifier, we’ll use a convolutional neural network (CNN) based on a model pre-trained on ImageNet, after which we’ll add new layers. The difference here is that we’ll try ResNet-34 first, because it’s a more lightweight architecture. Then we’ll also try ResNet-50, so we can compare results and execution times.
Repeating the same steps we took with the previous classifier, we run the first stage, which is summarized below. Four epochs are enough for this phase:
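In code, stage one looks roughly like this (fastai v1, with data being the ImageDataBunch sketched above): a pre-trained ResNet-34 body with a new head, where only the new layers are trained while the ResNet weights stay frozen.

# Stage one: only the freshly added head is trained
learn = cnn_learner(data, models.resnet34, metrics=accuracy)
learn.fit_one_cycle(4)   # four epochs are enough for this phase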
After choosing the maximum learning rate for the different layers (see the previous article), we run the second stage starting from this model, not without first unfreezing the weights of the ResNet layers. This is the training summary for the full network:
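A sketch of this second stage; the learning-rate range and the number of epochs below are illustrative, not the exact values of the run:

learn.lr_find()
learn.recorder.plot()    # inspect the loss vs. learning-rate curve

learn.unfreeze()         # make the ResNet-34 weights trainable too
# Discriminative learning rates: smaller for the early layers, larger for the head
learn.fit_one_cycle(6, max_lr=slice(1e-5, 1e-3))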
We get 75.93% accuracy: a pretty good value, far superior to the style classifier’s result.
Network based on ResNet-50
We repeat the process using the 50-layer version of ResNet, which a priori should give us better results by the mere fact of being larger, although it will negatively affect training time.
We execute the first stage with the initial layers’ weights frozen, obtaining this:
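The recipe is identical; only the architecture argument changes (the learner name below is just for the sketch):

learn50 = cnn_learner(data, models.resnet50, metrics=accuracy)
learn50.fit_one_cycle(4)   # stage one, ResNet-50 body still frozen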
We almost reach the final accuracy achieved by ResNet-34.
Now we proceed with the second stage. Plotting the loss function against the learning rate gives us a very broad valid range, so we test different parameter values. Below is the summary of our best run. We stop when accuracy stops improving and the validation loss is noticeably higher than the training loss.
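One way to automate that stopping criterion, not used in the original runs but available in fastai v1, is to combine SaveModelCallback and EarlyStoppingCallback (the values below are illustrative):

from fastai.callbacks import SaveModelCallback, EarlyStoppingCallback

learn50.unfreeze()
learn50.fit_one_cycle(
    10, max_lr=slice(1e-5, 1e-3),   # illustrative range taken from lr_find
    callbacks=[
        SaveModelCallback(learn50, monitor='accuracy', name='best_resnet50'),
        EarlyStoppingCallback(learn50, monitor='valid_loss', patience=2),
    ])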
The achieved accuracy on the validation set is 79.55%!
Our final model achieved 3.6 points higher accuracy than the one built on ResNet-34, but we also notice that the training time doubled.
Results interpretation
Let’s go with more numbers! First we visualize the confusion matrix to get a global view of model errors:
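With fastai v1 this takes a couple of lines (assuming learn50 is the trained ResNet-50 learner):

interp = ClassificationInterpretation.from_learner(learn50)
interp.plot_confusion_matrix(figsize=(14, 14), dpi=60)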
There seem to be no confusions as recurrent as in the style classifier case. In general, for the vast majority of artists, the mistakes are more scattered, although it’s also true that we have more categories here (57 vs 25).
Now let’s take a look at the top most frequent confusions:
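These pairs come straight from the interpretation object; min_val filters out pairs confused fewer than 5 times, and each entry reads (actual, predicted, count):

interp.most_confused(min_val=5)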
[('Rene Magritte', 'Salvador Dali', 10),
('William Merritt Chase', 'John Singer Sargent', 10),
('Boris Kustodiev', 'Ilya Repin', 8),
('Konstantin Makovsky', 'Ilya Repin', 8),
('Pablo Picasso', 'Henri Matisse', 8),
('Camille Pissarro', 'Alfred Sisley', 7),
('Albrecht Durer', 'Peter Paul Rubens', 6),
('Claude Monet', 'Alfred Sisley', 6),
('Paul Gauguin', 'Camille Pissarro', 6),
('Theodor Severin Kittelsen', 'Theophile Steinlen', 6),
('William Merritt Chase', 'Childe Hassam', 6),
('Zinaida Serebriakova', 'Konstantin Korovin', 6),
('Albert Bierstadt', 'Ivan Shishkin', 5),
('Camille Pissarro', 'Eugene Boudin', 5),
('Camille Pissarro', 'Paul Gauguin', 5),
('Ilya Repin', 'Konstantin Makovsky', 5),
('Theophile Steinlen', 'Henri de Toulouse-Lautrec', 5)]
We can see that the two biggest confusions of the network occur between:
- Magritte and Dali: both representatives of Surrealism, a rather peculiar style.
- Chase and Sargent: both belonging to Impressionism, with very similar themes.
After these, there is a handful of artists that our model confuses with each other… Pissarro, Sisley, Monet, Gauguin and Boudin. Impressionism again! Or “Post-Impressionism” maybe 😅
And what about Monet and Boudin? Our classifier registered only 3 confusions between them!
What if we create a specific classifier to differentiate only between these 2 painters? Will these numbers improve? Let’s see…
Monet or Boudin?
We’ll repeat the same process as before to build our network starting from ResNet-50, with only one difference: we reduce the number of categories (from 57 to 2).
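A sketch of the two-class setup, assuming a hypothetical folder containing only the Monet and Boudin sub-folders (fastai v1 again; epochs and learning rates are illustrative):

path2 = Path('data/monet_vs_boudin')   # hypothetical two-artist subset
data2 = ImageDataBunch.from_folder(path2, train='.', valid_pct=0.2,
                                   ds_tfms=get_transforms(), size=299, bs=32
                                   ).normalize(imagenet_stats)

learn2 = cnn_learner(data2, models.resnet50, metrics=accuracy)
learn2.fit_one_cycle(4)                              # stage one, frozen body
learn2.unfreeze()
learn2.fit_one_cycle(4, max_lr=slice(1e-5, 1e-3))    # stage two, full network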
A good visualization of the problem we are facing. Easy?
This is the training summary for our network in its second stage:
We get 98.48% accuracy! It’s a great result, isn’t it?
Let’s see where our model failed exactly…
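The validation images with the highest loss can be inspected with fastai’s interpretation tools (a sketch, using the learn2 learner from above); each one is shown with its predicted class, actual class, loss and predicted probability:

interp2 = ClassificationInterpretation.from_learner(learn2)
interp2.plot_top_losses(4, figsize=(10, 10))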
Our classifier was mistaken on 3 paintings, although it was very confident of its decision on the first 2 (we could investigate them all to understand the reasons; it could be related to something like image quality).
We notice that our model makes its decisions with fairly high probabilities, which reflects its high degree of confidence.
Anyway! I’d better not tell you about all the mistakes I made trying to guess the artist behind each painting in that exhibition I visited… And I saw the paintings in all their splendor, not as sad 299x299 px miniatures :)
Conclusions
As we might have suspected, we found that it’s not easy to identify the artist behind a painting, since we have to distinguish artists from the same period, with the same style, and even with similar themes or scenes. Fortunately, the identity of the author is more objective than the style.
The approach used to create our model works really well and returns great results (79.55% accuracy), improving on those obtained by the most recent projects I found (2016, 2017). And that was leaving almost everything in the hands of the fastai library, without using an expensive machine (a free instance on Google Colab), and requiring little running time.
Using fastai offers us many advantages when carrying out a Deep Learning project: it reduces development to very few lines (what would it cost us to do the same things in plain PyTorch?), and it gives us both the most widely used architectures and the most advanced techniques.
The field of computer vision evolves very fast!
Future lines of work
There are several possibilities to expand the scope of this mini-project, including:
- Create an API where you can send a painting to guess its artist, or to classify its style (a minimal serving sketch is shown after this list).
- Compare with other architectures, or with larger versions of ResNet.
- Test other image transformations for training, taking into account our particular case.
- Use detailed image crops so the neural network can see things like brushstrokes or the materials used. These are actually 2 essential cues for the human eye, and they could also be interesting for neural networks.
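For the first idea, a minimal serving sketch could look like this. Flask is an arbitrary choice and not part of this project; the endpoint and file names are hypothetical, and the trained learner would first be exported with learn.export('artist_classifier.pkl').

from io import BytesIO
from flask import Flask, request, jsonify
from fastai.vision import *   # load_learner, open_image

app = Flask(__name__)
learner = load_learner('.', 'artist_classifier.pkl')   # exported fastai v1 model

@app.route('/predict', methods=['POST'])
def predict():
    # Expect the painting as a file upload named 'image'
    img = open_image(BytesIO(request.files['image'].read()))
    pred_class, pred_idx, probs = learner.predict(img)
    return jsonify({'artist': str(pred_class),
                    'probability': float(probs[pred_idx])})

if __name__ == '__main__':
    app.run()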
And that’s all! I hope you found it interesting :)
Once again, I refer you to my first article for a more technical explanation. To read or get all the code used and the intermediate tests… what better than the original notebooks used here! 👼