Painting style classification with 7 lines of code

Using fastai and advanced techniques in the field of computer vision (Part 1)

Jaime Durán
yottabytes
10 min read · Jun 3, 2019


Claude Monet — “Impression, Sunrise”

This article is also available in Spanish.

Motivation

A few months ago I was lucky enough to visit a joint exhibition on Claude Monet and Eugène Boudin. Monet is the greatest exponent of Impressionism and one of my favorite painters (I even copied one of his paintings as a kid). Surprisingly, I knew nothing about the latter, but it turns out that Monet was his disciple (he began painting because of Boudin), and the two shared a long relationship of friendship, disagreements and, above all, mutual admiration.

The exhibition showed the evolution of both painters from the same point of departure along different paths that crossed again and again. The result: a visit of almost an hour in which, standing in front of each new painting, the first thing you did was try to guess its author. And it wasn’t easy!

That exhibition came to mind while I was reviewing the application of convolutional neural networks to image recognition. How hard would it be for these networks to recognize a painting’s author? Is that more complicated than distinguishing animals in photographs? What better way to find out than to try?

And what if, while we’re at it, we also build a style classifier to categorize the paintings available in the dataset? I could then show you, step by step, how to achieve that with only 7 lines of code, using fastai. Let’s go!

Introduction

The data

In order to train our neural network we’ll need pictures of many paintings. Luckily there is a great dataset compiled from WikiArt for a Kaggle competition. It has more than 100,000 paintings by 2,300 artists, covering many styles and eras. All images are labeled in a separate CSV file containing information like this:

The tools

For this small project we’ll use the fastai library, which works on top of PyTorch. Fastai simplifies the process of training neural networks, making it easy to apply leading-edge techniques efficiently and to obtain state-of-the-art results in computer vision, natural language processing, tabular data and collaborative filtering.

Creating our style classifier

As mentioned, the first thing we’ll do with the data, as a first contact and out of curiosity, is create a model to classify the paintings by style. This information is available in the style column. We’ll print out all the categories to see what we’re dealing with:
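Assuming the metadata has been loaded into a pandas DataFrame with a style column (the column name is an assumption; check the actual CSV), the listing might look like this sketch, shown with a toy frame so it’s self-contained:

```python
import pandas as pd

# Toy stand-in for the WikiArt metadata CSV; the real frame would come
# from pd.read_csv() on the dataset's label file.
df = pd.DataFrame({
    "filename": ["1.jpg", "2.jpg", "3.jpg", "4.jpg"],
    "style": ["Impressionism", "Realism", "Impressionism", "Cubism"],
})

# List the distinct style labels present in the data
styles = df["style"].dropna().unique()
print(len(styles), "styles:", sorted(styles))
```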

Nothing more and nothing less than 136 different styles

Here we face the first problem: in my opinion the categories are too finely divided. I think we should use only the most frequent styles in the dataset (I tried 16 and 25, getting similar results).

If we use 25 categories we’ll have a minimum of 750 images available for each one (as there are categories with many more images, we can choose to balance the dataset using the same number for all).
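The filtering-and-balancing step might be sketched like this, again with a toy frame (the real dataset uses the top 25 styles with at least 750 images each; the column names and constants here are assumptions):

```python
import pandas as pd

# Toy metadata frame standing in for the WikiArt CSV
df = pd.DataFrame({
    "filename": [f"{i}.jpg" for i in range(10)],
    "style": ["Impressionism"] * 5 + ["Realism"] * 3 + ["Cubism"] * 2,
})

TOP_N = 2       # 25 in the article
PER_STYLE = 3   # 750 in the article

# Keep only the most frequent styles...
top_styles = df["style"].value_counts().nlargest(TOP_N).index
subset = df[df["style"].isin(top_styles)]

# ...and balance them by sampling the same number of images per style
balanced = pd.concat([
    g.sample(min(len(g), PER_STYLE), random_state=0)
    for _, g in subset.groupby("style")
])
print(balanced["style"].value_counts())
```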

And now the fastai library comes into play. With a single (long) line of code we’ll manage to:

  • create the training and validation subsets from a dataframe, following an appropriate structure.
  • assign a series of random transformations for the images (ds_tfms), which work quite well in many scenarios (see next paragraph).
  • configure the images’ input size for our network (size).
  • assign the batch size (bs) for training.
  • apply normalization; something desirable whenever we are using a neural network (normalize()).
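The single fastai line might look like the sketch below, wrapped in a function since it assumes the dataset sits on disk (the folder name and column layout are assumptions; fastai v1 API):

```python
def build_data(path, df, bs=64, size=224):
    """Sketch of the one-liner: train/valid split from a dataframe
    (valid_pct defaults to 0.2), default random transforms, input
    size, batch size and ImageNet normalization (fastai v1)."""
    from fastai.vision import ImageDataBunch, get_transforms, imagenet_stats
    return (ImageDataBunch
            .from_df(path, df, folder="train",  # image folder: assumption
                     ds_tfms=get_transforms(), size=size, bs=bs)
            .normalize(imagenet_stats))
```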

Data augmentation is possibly the most important regularization technique when training a model for image recognition. Instead of feeding our network with the same images over and over again, we’ll include small random transformations (a bit of rotation, translation, zoom, etc.) that won’t change the image content at first glance, but will modify the values of its pixels. Models trained with this technique will generalize better. In our case, it may have less sense than in other scenarios, since the paintings don’t move, and the photographs are centered and with no space around them. But if we think for example about the possibility of releasing our model as part of a service, where anyone could send an image made with the mobile, for sure that we’d want to apply this technique.

We print a small batch of tagged images, to visualize the problem we are facing:
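In fastai v1 that preview is a single call, sketched here as a function since it needs a DataBunch with images on disk:

```python
def preview_batch(data):
    # Show a 3x3 grid of training images with their style labels
    # (fastai v1 API)
    data.show_batch(rows=3, figsize=(8, 8))
```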

Creation and training of the model

To carry out our task, we first need to choose the type of neural network we are going to use, and the underlying architecture.

We will base our classifier on a convolutional neural network (CNN), given its proven performance on image classification problems. For the architecture we won’t start from scratch; we’ll use a model pre-trained on ImageNet, a dataset with more than 1 million images, so it already knows how to recognize many things. More specifically, we’ll test ResNet-34 and ResNet-50 (the number refers to the depth in layers). Our task is thus to apply what ResNet learned from ImageNet to our problem (transfer learning), adding only the extra layers needed for the new images.

We’ll create and train the model in 2 stages. In the first one, fastai lets us do it with just 2 lines:

  • First, we’ll instantiate a CNN-specialized Learner, specifying the base architecture (ResNet-50) and the metric or metrics to be obtained.
  • Next, we’ll use the fit_one_cycle() function, to train the model using a very efficient algorithm for complex architectures (more on 1cycle). The main parameter is the number of epochs to be executed.
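Those two lines might look like this (fastai v1; the epoch count here is an assumption):

```python
def train_stage_one(data, epochs=5):
    # Instantiate a CNN Learner on a pre-trained ResNet-50 body,
    # then train the new head with the 1cycle policy.
    from fastai.vision import cnn_learner, models
    from fastai.metrics import accuracy
    learn = cnn_learner(data, models.resnet50, metrics=accuracy)
    learn.fit_one_cycle(epochs)
    return learn
```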

After this, we get the first results:

The idea behind the second stage is to unfreeze the weights of the layers belonging to the initial architecture (those from ResNet-50) and re-train the complete model. This should improve our classifier a bit more.

The key at this point is properly choosing the maximum learning rate for the different layers of the model. We’ll plot a graph for that with only 2 lines of code in fastai:
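The two lines in question are presumably fastai’s LR range test and its plot (fastai v1):

```python
def find_lr(learn):
    # Increase the learning rate exponentially over a short mock
    # training run while recording the loss...
    learn.lr_find()
    # ...then plot loss vs. learning rate to pick the maximum
    learn.recorder.plot()
```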

The result:

For the first layers we pick a small maximum learning rate, since they need little adjustment. For the last layers we usually choose a max learning rate 10 times lower than the one used in the first stage (3e-3 by default), as long as the loss in the graph doesn’t worsen at that point.

We unfreeze the weights and re-train the model with only 2 lines:
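A sketch of the second stage (fastai v1; the slice bounds and epoch count are assumptions, to be read off the learning-rate plot):

```python
def train_stage_two(learn, epochs=3):
    # Unfreeze the ResNet-50 body and re-train the whole model with
    # discriminative learning rates: small for the early layers,
    # ~10x below the first stage's rate for the last ones.
    learn.unfreeze()
    learn.fit_one_cycle(epochs, max_lr=slice(1e-6, 3e-4))
```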

And that’s all; with only 7 lines of code we get our trained model and its evaluation:

The accuracy of our painting style classifier reaches 56.99%. Not bad if we take into account the difficulty of deciding which of the 25 styles is the right one. Of course, its accuracy surpasses mine!

Next we’ll analyze the errors, to put the achieved accuracy into perspective.

Results

The fastai library provides us with several tools to interpret the results of our model.

For example we can plot the classifier’s worst errors (we could even debug them using a heat mask telling us in which part of each image the network based its decision):

We can also plot the typical confusion matrix, allowing us to observe a summary of where our classifier failed:
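Both diagnostics come from fastai’s interpretation tools; a sketch (fastai v1; the figure sizes and thresholds are assumptions):

```python
def interpret(learn):
    """Inspect the trained classifier's errors (fastai v1)."""
    from fastai.vision import ClassificationInterpretation
    interp = ClassificationInterpretation.from_learner(learn)
    # Worst errors, with heatmaps showing where the network looked
    interp.plot_top_losses(9, heatmap=True)
    # Confusion matrix over the 25 styles
    interp.plot_confusion_matrix(figsize=(12, 12), dpi=60)
    # Most frequently confused style pairs
    print(interp.most_confused(min_val=10))
```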

There are several styles where it fails very often. But let’s first look at the styles where it works well. It’s funny, but I think I agree with the neural network on the easiest one to tell apart: Ukiyo-e!

Let’s now visualize the most frequently confused pairs:

At first glance I’m not at all surprised by the confusion between:

  • Abstract Expressionism and Art Informel (from what I read, they are parallel movements that arose in the 1940s and ’50s in the United States and Europe, respectively).
  • Neoclassicism and Academicism (according to the Spanish Wikipedia they are the same thing; according to the English one, the latter is a synthesis of the former plus Romanticism).
  • Impressionism, Post-Impressionism and Expressionism (what is strange here is that the classifier didn’t get confused even more).
  • Romanticism and Realism (the second succeeded the first, differing mainly in subject matter).
  • Baroque and Rococo (the second is an extension of the first).
  • The different styles within the Renaissance.

If you are interested in the subject, I invite you to look a bit for the difference between the styles inside each group ;)

Using our classifier with new images

Once we have our model trained, predicting the style of a new image is pretty simple. We’ll achieve that with 2 lines of code:
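Those two lines might look like this (fastai v1):

```python
def predict_style(learn, img_path):
    # Load the image and run it through the trained model; predict()
    # returns the category, its index and the per-class probabilities.
    from fastai.vision import open_image
    img = open_image(img_path)
    pred_class, pred_idx, probs = learn.predict(img)
    return pred_class, probs
```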

Let’s try it!

Munch + Rick and Morty

The prediction is quite clear. But we know that Munch’s original painting belongs to Expressionism. What if we crop the image to leave out the ugly bug in the background?

Munch + Morty

The crop is classified as Expressionism, with more confidence than the previous decision. It actually looks very much like the original, so this result isn’t surprising.

And why not try one of my own paintings? Well, I only have a picture of one of them (**spoiler**: it’s a copy):

In my humble opinion this painting doesn’t fit into Expressionism, and the classifier hesitated between 2 styles, with little confidence in its decision.

Maybe I’m an eclectic artist … xD

Conclusions

It’s not easy to categorize certain paintings by style, and I’m sure the people who labeled these paintings had (and would still have) their doubts doing so. Just as you cannot file all of a painter’s work under a single category, the boundaries between styles aren’t as clear as in other areas: some paintings could fit perfectly under more than one label, and some categories are distinguished not by the style itself but by formalism.

Even so, the result achieved (56.99% accuracy) is very good compared with what was considered state of the art until very recently (2015, 2017), and considering that we did little to improve the input data (and nothing to the network itself, though that is the merit of the fastai library and of research in the field).

Surely the results could be improved with more images, crops of them, more generic categories, transformations other than the default ones, etc. But that wasn’t the purpose of this project.

EDIT: In a training run after this article was published I got 57.91% accuracy, about one point better :)

And after seeing how complicated it is to assign a painting to a specific style, how about continuing with our initial intention of recognizing the author of a painting? Will we get better results? The good news: in that case, the label (the dependent variable) won’t be subjective at all :)

But I think that’s enough for one article; all of that is explained in the next one! Available now; click below 😃


Yet another data scientist with a blog. In fact I write two (one in Spanish)