Animal Sketches

Can I classify sketches of bears, cats, & dogs?

Andrew Bergman
The Startup
5 min read · Nov 5, 2019


Google has a game called Quick, Draw! in which the user has to sketch six different things with a time limit of 20 seconds per drawing. It’s not just a fun game: Google has made the data openly available on GitHub for use in machine learning projects and research.

Why Sketches?

Character recognition is a subset of image classification, a field with applications ranging from reading X-rays to helping autonomous cars drive safely. Character recognition itself has plenty of day-to-day uses: the Post Office uses it to read addresses on mail, and various programs use it to convert handwritten text to type.

The Data

Google offers the Quick, Draw! data in a variety of formats, but I chose two to work with: raw .ndjson and .npy bitmap files.

I used the two sets of files for different purposes: I modeled with the .npy files and performed EDA with the raw .ndjson files because they contain the metadata I was interested in.

There was not much pre-processing work to do because Google had already resized and centered the .npy bitmaps. All I really had to do was scale each pixel and make sure the data was passed into the model as a 28x28 image.
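For readers who want to reproduce this step, here is a minimal sketch of how the .npy bitmaps can be loaded and prepared. The file names, the 80/20 split, and the use of train_test_split are my own assumptions, not details taken from the project itself.

    import numpy as np
    from sklearn.model_selection import train_test_split

    # Hypothetical file names: Quick, Draw! ships one numpy-bitmap file per class,
    # each an array of shape (n_drawings, 784) with uint8 pixel values.
    files = {"bear": "bear.npy", "cat": "cat.npy", "dog": "dog.npy"}

    images, labels = [], []
    for label, path in enumerate(files.values()):
        data = np.load(path)
        images.append(data)
        labels.append(np.full(len(data), label))

    X = np.concatenate(images).astype("float32") / 255.0  # scale each pixel to [0, 1]
    X = X.reshape(-1, 28, 28, 1)                           # pass the data in as 28x28 images
    y = np.concatenate(labels)

    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.2, stratify=y, random_state=42)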

The Model

I used a Keras convolutional neural network for classification, which I ran in an AWS SageMaker instance because I needed a GPU for the modeling.

The topology of the network is relatively simple and has two parts:

  • Convolution: two convolutional layers + one max-pooling layer

Convolution is a complicated process, but in brief: the convolutional layers scan each image and learn which features of that image matter most. The max-pooling layer that follows then shrinks the convolved image by breaking it into a grid of small regions and keeping only the highest value in each.

[Image: Illustration of how convolution works]
  • Dense: two dense layers + two dropout layers + two batch-normalization layers
[Image: Diagram of a neural network]

The dense layers are where the actual decision-making takes place: data is passed through each node, which applies a learned weight and bias to calculate its output (a sketch of the full layer stack is below).
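As a rough illustration, the topology described above can be written in Keras along these lines. The filter counts, kernel sizes, and layer widths here are placeholders of my own choosing, not the exact values used in the project.

    from tensorflow.keras import Sequential
    from tensorflow.keras.layers import (Conv2D, MaxPooling2D, Flatten,
                                         Dense, Dropout, BatchNormalization)

    model = Sequential([
        # Convolution block: two convolutional layers + one max-pooling layer
        Conv2D(32, (3, 3), activation="relu", input_shape=(28, 28, 1)),
        Conv2D(64, (3, 3), activation="relu"),
        MaxPooling2D(pool_size=(2, 2)),

        # Dense block: two dense layers, each followed by batch normalization and dropout
        Flatten(),
        Dense(128, activation="relu"),
        BatchNormalization(),
        Dropout(0.5),
        Dense(64, activation="relu"),
        BatchNormalization(),
        Dropout(0.5),

        # Output layer: one probability per class (bear, cat, dog)
        Dense(3, activation="softmax"),
    ])

    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])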

Because of the amount of information being processed, I was worried about severe overfitting, something neural networks are prone to. To combat the overfitting, I used a few regularization techniques:

  • L2 regularization to shrink the weights assigned to unimportant pixels
  • Batch normalization to “reset” the outputs of each dense layer by reducing the shift in their distributions, which increases the overall stability of the model
  • Dropout to zero out the outputs of randomly selected nodes
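The post does not show the exact configuration, but in Keras all three techniques are attached at the layer level rather than per node; here is a small sketch with placeholder values:

    from tensorflow.keras import regularizers
    from tensorflow.keras.layers import Dense, BatchNormalization, Dropout

    dense_block = [
        # The L2 penalty shrinks large weights so unimportant pixels contribute less
        Dense(128, activation="relu",
              kernel_regularizer=regularizers.l2(0.001)),
        BatchNormalization(),  # re-centers and re-scales the layer's outputs
        Dropout(0.5),          # zeroes out half of the outputs during training
    ]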

Finally, when I fit the model to my training data, I included early stopping, a technique that halts training if the accuracy stops improving.
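With Keras this looks roughly like the following, using the model and data splits sketched earlier; the monitored metric, patience, epoch count, and batch size are my assumptions:

    from tensorflow.keras.callbacks import EarlyStopping

    # Older Keras versions name this metric "val_acc" instead of "val_accuracy".
    early_stop = EarlyStopping(monitor="val_accuracy", patience=2,
                               restore_best_weights=True)

    history = model.fit(X_train, y_train,
                        validation_data=(X_test, y_test),
                        epochs=20, batch_size=128,
                        callbacks=[early_stop])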

Evaluation

I used two techniques to judge model performance: metric scores and model loss/accuracy.

Metrics

I used five metrics when evaluating the model:

  • Accuracy: the proportion of predictions that are correct
  • Balanced Accuracy: the average recall across the three classes
  • Specificity: the proportion of negative cases that are correctly predicted as negative
  • Matthews Correlation Coefficient: how well the predicted labels agree with the true labels
  • Jaccard (Similarity) Score: how much overlap there is between the true and predicted labels
[Image: Graph showing the scores for five different metrics]
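The post does not include the scoring code, but all five metrics can be computed with scikit-learn along these lines; specificity has no built-in multi-class function, so it is derived from the confusion matrix here:

    import numpy as np
    from sklearn.metrics import (accuracy_score, balanced_accuracy_score,
                                 confusion_matrix, jaccard_score,
                                 matthews_corrcoef)

    y_pred = model.predict(X_test).argmax(axis=1)  # class with the highest probability

    print("Accuracy:         ", accuracy_score(y_test, y_pred))
    print("Balanced accuracy:", balanced_accuracy_score(y_test, y_pred))
    print("Matthews corrcoef:", matthews_corrcoef(y_test, y_pred))
    print("Jaccard score:    ", jaccard_score(y_test, y_pred, average="macro"))

    # Specificity per class is TN / (TN + FP); average it over the three classes.
    cm = confusion_matrix(y_test, y_pred)
    fp = cm.sum(axis=0) - np.diag(cm)
    tn = cm.sum() - cm.sum(axis=1) - fp
    print("Specificity:      ", np.mean(tn / (tn + fp)))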

The scores are not bad, but they are not great either. The scores based on predictions (accuracy, balanced accuracy, and specificity) are fairly consistent with one another, as are the scores based on label agreement (Matthews correlation coefficient and Jaccard score). This suggests the model is doing okay across the three classes and is only minimally overfit.

Model Loss

The model loss measures the error in the model’s predictions, so it should be as low as possible.

Both the training and test loss dropped dramatically from the first to the third epoch, but the drop flattened out toward the eighth epoch. The test loss actually increased slightly at the end, which is what early stopping caught.

Model Accuracy

Accuracy is simply a measure of how many sketches were classified correctly.

There was a sharp increase in accuracy for both the training and test data from the first to the second epoch, but that increase leveled off quickly. The test accuracy showed more variation before finally decreasing into the eighth epoch, which triggered early stopping.
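The loss and accuracy curves discussed above come from the training history returned by model.fit; here is a sketch of how they can be plotted (the history key names vary slightly between Keras versions):

    import matplotlib.pyplot as plt

    fig, (ax_loss, ax_acc) = plt.subplots(1, 2, figsize=(10, 4))

    ax_loss.plot(history.history["loss"], label="train")
    ax_loss.plot(history.history["val_loss"], label="test")
    ax_loss.set(title="Model loss", xlabel="epoch", ylabel="loss")
    ax_loss.legend()

    ax_acc.plot(history.history["accuracy"], label="train")
    ax_acc.plot(history.history["val_accuracy"], label="test")
    ax_acc.set(title="Model accuracy", xlabel="epoch", ylabel="accuracy")
    ax_acc.legend()

    plt.tight_layout()
    plt.show()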

Conclusions

It is definitely possible to classify images with a convolutional neural network, but it was challenging with the data I chose. The data was not consistent: there was a lot of variation in how each type of animal was drawn, which made it difficult for the network to define each class. Additionally, the images were very small: a 28x28 image does not contain much information.

I am confident that if the data were better, my model would generate better predictions.

This project was imagined as a way for me to get my feet wet in image processing and to try multi-class classification. I’m planning to work on another, more complicated image-processing project some time in the future.

In the meantime, I’m open to any and all questions or comments.

The repository can be found here.

I can be contacted on LinkedIn here.
