Painting Classification Model

Mimia · Published in unpack · 6 min read · Aug 4, 2021

The objective of the Painting Classification Model is to recognize the concept of a painting based on the specific characteristics that make the image lean toward a certain category. The categories covered in this project are: animal, cityscape, flower, landscape, portrait, and religious.

This project's goal centers on a simple evaluation of which architecture gives better performance and accuracy. The base architecture used here is ResNet, at depths 34, 50, and 152, built with the fastai layered API and coded and trained in Google Colaboratory.
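As a rough sketch of that setup (not the exact notebook), the fastai layered API pipeline might look like the following; the `paintings` path and one-folder-per-class layout are assumptions:

```python
from fastai.vision.all import *

# Assumed layout: one folder per class (animal, cityscape, flower, ...)
path = Path("paintings")

dls = ImageDataLoaders.from_folder(
    path,
    valid_pct=0.25,          # 25% of the images held out for validation
    item_tfms=Resize(128),   # every image resized to 128x128
    batch_tfms=aug_transforms(),
)

# Swap resnet34 for resnet50 or resnet152 to compare depths.
learn = cnn_learner(dls, resnet34, metrics=accuracy)
learn.fine_tune(20)
```

Comparing depths is then just a matter of re-running the last two lines with a different architecture.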

The dataset was obtained from Kaggle, specifically from the AI software engineer M. Innat (https://www.kaggle.com/ipythonx/wikiart-gangogh-creating-art-gan). Only the categories mentioned above were used.

Art is a complex concept and profession in which perception plays a huge role in the eye of the beholder, which makes it hard to classify art into specific categories; in this project, we tested how close we could get. Notably, some categories might work better merged than separated, because the model tends to get confused when more than one category is visible in a painting. Cityscape and landscape, for example, often share a high percentage of each other's content while still being distinct. A human can usually tell them apart, but encoding that distinction into a model and getting a reliable response is the challenge.

That said, this dataset contains complex paintings that verge on the abstract, even though abstract is not one of our categories. To simplify the task, images whose concept was not clear enough to belong to any of the chosen categories were deleted: at most around 10 images per class, out of 500 previously selected. Each category therefore keeps approximately 450 to 490 images, of which 25% is used for validation and the rest for training.
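As a quick sanity check on those numbers, the 25% split works out roughly as follows (a small illustrative helper; the exact per-class counts in the project vary):

```python
def split_counts(n_images, valid_pct=0.25):
    """Return (train, valid) counts for a class of n_images."""
    n_valid = int(n_images * valid_pct)
    return n_images - n_valid, n_valid

# With 450-490 images per class, validation gets roughly 112-122 images.
for n in (450, 470, 490):
    print(n, split_counts(n))
```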

All images were resized to 128x128 pixels, after first being shrunk to half their original size for easier manipulation. This downsampling may have affected training due to the loss of fine detail.
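A minimal sketch of that two-step downsizing with Pillow (the stand-in image and helper name are illustrative, not the project's actual code):

```python
from PIL import Image

def prepare(img, final_size=128):
    """Shrink to half the original size, then resize to final_size x final_size."""
    half = img.resize((img.width // 2, img.height // 2))
    return half.resize((final_size, final_size))

painting = Image.new("RGB", (1024, 768))  # stand-in for a real painting
print(prepare(painting).size)  # (128, 128)
```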

As the examples below show, some similar images were excluded from this project due to their level of abstraction.

Painting of a Cityscape.
Painting of animals.
Painting of a flower.
Painting of a landscape.

Comparison

The table below shows the results of each architecture evaluated, from which a suitable one was selected for this kind of dataset. In this case, ResNet34 proved suitable and performed well enough with 20 epochs of training. It is possible to run more epochs, but the outcome does not exceed expectations.

Comparative table for the evaluated architectures at different depths.

The table shows that augmenting the data with all the transformation methods available in the fastai API works better than applying partial transformations (flipping, zooming, warping, and lighting) or none. In this case, it gives 85% accuracy while keeping the training loss and validation loss within a small range of oscillation.
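The three augmentation settings compared above can be expressed through fastai's `aug_transforms`; a sketch, where the way of trimming the defaults down to the "partial" set is an assumption:

```python
from fastai.vision.all import *

# Full default augmentation (flip, rotate, zoom, warp, lighting, ...):
full_tfms = aug_transforms()

# Partial set: keep flipping, zooming, warping, and lighting; drop rotation:
partial_tfms = aug_transforms(max_rotate=0.0)

# No augmentation at all:
no_tfms = []
```

Each list would be passed as `batch_tfms` when building the DataLoaders.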

The following training run shows the first model tried, used to find the differences and push toward a better result.

ResNet34, 20 epochs.

To review the results, we plot a confusion matrix, which shows how the model classified each image and how many false positives occur for each pair of categories.

Confusion matrix. ResNet34, 20 epochs.

From this matrix, we can infer that "landscape" and "cityscape" get mixed up the most. The animal paintings also contain elements related to the other categories, which leads to mixed predictions. "Religious" and "portrait" are likewise confused, due to the shared characteristics of those two classes.
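To make that reading concrete, here is how overall accuracy falls out of a confusion matrix, with the landscape/cityscape confusion visible in the off-diagonal cells (the numbers below are toy values, not the project's real matrix):

```python
# Rows = true class, columns = predicted class.
labels = ["cityscape", "landscape"]
cm = [
    [40, 10],   # 10 cityscapes predicted as landscape (false positives for landscape)
    [12, 38],   # 12 landscapes predicted as cityscape (false positives for cityscape)
]

def accuracy(cm):
    """Fraction of diagonal (correct) predictions over all predictions."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total

print(accuracy(cm))  # 0.78
```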

Top prediction losses.

Even so, the predictions are not far from what they should be. This scenario is mostly caused by the data, which is not as specific or clean as it should be for the model to predict accurately.

In this case, there is no need to go deeper into the architecture if the task can be accomplished with a simpler, faster one and better accuracy.

The ResNet34 architecture was chosen as the training model to test and accomplish the Painting Classification Model.

Training with ResNet34

We started with 20 epochs in the evaluation phase, and the model achieved a good accuracy value even with this kind of mixed-concept data. In this step, the number of epochs is increased to study the model's behavior, looking for better performance without over-fitting.

Starting with 40 epochs on the ResNet50 architecture, the accuracy tends to fluctuate (within a close range), but the training loss keeps falling while the validation loss rises. In this case, it is better to run more epochs in order to confirm whether the model is over-fitting.
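One simple way to read that "training loss falling while validation loss rising" signature off the recorded per-epoch losses (an illustrative heuristic with made-up loss values, not fastai's API):

```python
def looks_overfit(train_losses, valid_losses, window=5):
    """True if training loss fell over the last `window` epochs while
    validation loss rose: the classic over-fitting signature."""
    t_delta = train_losses[-1] - train_losses[-window]
    v_delta = valid_losses[-1] - valid_losses[-window]
    return t_delta < 0 and v_delta > 0

train = [0.90, 0.70, 0.55, 0.45, 0.38, 0.33]
valid = [0.85, 0.70, 0.66, 0.65, 0.68, 0.74]
print(looks_overfit(train, valid))  # True
```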

CNN learner. All default augmentation transforms. ResNet50.

The previous image shows that the accuracy keeps going up and down, which is normal but requires deeper investigation to confirm whether this architecture fits this dataset's needs.

Confusion matrix. All default augmentation transforms. ResNet50.

The confusion matrix again shows that landscape and cityscape, whose paintings include content from both categories, are the pair with the highest false positives; next comes animal, which contains varied details and gets matched with every other class.

For this model, even with a higher number of epochs, the accuracy oscillates within a certain range; it may improve, but it is not clear from which point. Compared with the previous confusion matrix, the predictions are still influenced by the labeling of the data.

Loss curves. All default augmentation transforms. ResNet50.

Deployment of the model app

The app is available at https://paintingmodel.herokuapp.com.

Below are a few tests of the deployed model.

Landscape test.
Animal test.
Animal test.

These images may contain more than one category, but the model works as long as it recognizes the class(es) the image should have.

Future development

This model was trained without a GPU, so each epoch took almost 50 minutes. The model could be improved either by using another architecture or by merging classes based on their similarities.

In this test, even though the training accuracy did not improve, the training and validation losses are stabilizing; with further training, the model might reach new results with better performance.

Batch size 16, image size 224 pixels, ResNet152 architecture.
Learner loss plot. Batch size 16, image size 224 pixels, ResNet152 architecture.

It is worth tuning this model with other configuration settings and using the experience as a base for the creation of future models.
