Image classification, CIFAR10 dataset and initial ideas of solution

Marcin Grzywaczewski
Planet Arkency
Dec 20, 2015

There are problems in computer science that are hard the moment you try to think of a direct solution, and one of them is the so-called image classification problem. Given a piece of binary data, which is essentially an image, you need to determine its contents: whether it shows a bird, a truck and so on. The image classes are given upfront, so your solution doesn't need to create them during the computation.

With a problem like this it is hard to apply the classical approach of computer science, that is, creating a sequence of steps which produces a deterministic output. This leads us to different approaches.

On this blog I'd like to show you how to solve the image classification problem using machine learning and data mining techniques. Both approaches have foundations in probability theory and statistics, and both use data classified upfront to create a model that is able to classify unknown data samples.

Quick math basics

Let's start with a solid problem description. Having such a foundation will help me keep further descriptions concise, since I'll have a vocabulary to build upon. I'm going to use mathematical notation, but don't worry: the intuition behind it will be presented as well.

So, to classify an image, we first need an image. The natural representation of an image is a matrix (or two-dimensional array) of pixels. Each pixel has a color, represented by three numbers: how red, how green and how blue it is. This is the so-called RGB model of a pixel; each value ranges from 0 (no color) to 255 (the most intense color).
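For illustration, a pixel in the RGB model can be written as a numpy vector of three channel intensities (a sketch; the variable names are mine):

```python
import numpy as np

# A single pixel in the RGB model: three intensities in the 0..255 range.
red_pixel = np.array([255, 0, 0], dtype=np.uint8)        # fully red
white_pixel = np.array([255, 255, 255], dtype=np.uint8)  # all channels maxed

# Each channel stays within the allowed range.
assert red_pixel.min() >= 0 and red_pixel.max() <= 255
```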

Here’s a quick overview of what you’ll need to know about notation:

EDIT: This is rather a developer's view of vectors, so the definitions are informal.

  • Row vector: v = [v₁ v₂ v₃]
  • Vector transposition: vᵀ (a column vector, i.e. a 3 × 1 matrix)
  • Dimension of a row vector: dim(v) = 1 × 3
  • Dimension of a transposed vector (matrix): dim(vᵀ) = 3 × 1

A matrix is quite similar to a vector, but it has two dimensions. You may think of a matrix as a vector of vectors. Similarly, you can think of vectors as matrices having one row or one column. Those views are interchangeable.

  • Example of a 2 × 2 pixels image: a matrix M with dim(M) = 2 × 2. The first 2 is the height (number of rows), the second 2 is the width (number of columns).
  • Indexing the matrix: Mᵢⱼ. Notice the row index comes first, the column index second.
  • The dimension function works for matrices too.

Such ordered three-value entities are called vectors in math, and a vector will be the representation of a pixel. So a 32 × 32 pixels image is a matrix of pixels with shape 32 × 32 × 3.

Shape is a dimension, but applied to the elements too.
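In numpy terms, such an image is a three-dimensional array whose shape matches the description above (a sketch; the variable names are mine):

```python
import numpy as np

# A 32 x 32 RGB image: 32 rows, 32 columns, 3 color channels per pixel.
image = np.zeros((32, 32, 3), dtype=np.uint8)
print(image.shape)  # (32, 32, 3)

# Set the top-left pixel to pure green.
image[0, 0] = [0, 255, 0]
```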

Each attribute has a domain it comes from. This domain is a set, just like the sets you know from your math lessons: an unordered collection of things. In the case of pixels, all attributes share the same domain, the set of real numbers.

These are all equivalent ways of specifying the domain: for a pixel p, each component of p belongs to ℝ, so p ∈ ℝ³.

You can further slim down the domain to the set {0, 1, …, 255}, but it is unnecessary. You'll still be right, but there is little value in being so precise.

Image classification problem

So, armed with the definitions of vectors and matrices, let's specify the real problem.

As input, we have an image, w pixels wide and h pixels high. Such an image has a class. You can think of it as a label attached to the image: whether it shows a car, a bird and so on. In the real world an image may depict both a car and a bird sitting on it, but this solution will ignore that and assign the image to one class only. The real class of the image will be named y.

In the case of this solution image will be represented as a matrix:

Image representation: a matrix X ∈ ℝ^((w · h) × 3), one row per pixel, one column per color channel.

This is because the knowledge about the exact width and height is unnecessary, so the w and h dimensions are flattened to avoid operating on three-dimensional entities.
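In numpy this flattening is a single reshape; here is a sketch with assumed sizes:

```python
import numpy as np

h, w = 32, 32
image = np.random.randint(0, 256, size=(h, w, 3), dtype=np.uint8)

# Collapse the spatial dimensions: each row of the result is one pixel's RGB triple.
flat = image.reshape(h * w, 3)
print(flat.shape)  # (1024, 3)

# The pixel at row i, column j ends up at flattened index i * w + j.
assert (flat[5 * w + 7] == image[5, 7]).all()
```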

The goal is to create a classifier — a function c which takes such matrix as an input and attaches a class to it (which is unknown):

The classifier function: c(X) = ŷ.

Of course we'd like to make this classifier accurate, that is, the predicted ŷ should equal y (the real class) as often as possible.
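Accuracy here is simply the fraction of samples where ŷ equals y; a minimal helper (names are mine):

```python
import numpy as np

def accuracy(y_true, y_pred):
    """Fraction of samples where the predicted class equals the real class."""
    y_true = np.asarray(y_true)
    y_pred = np.asarray(y_pred)
    return float(np.mean(y_true == y_pred))

print(accuracy([1, 2, 3, 4], [1, 2, 0, 4]))  # 0.75
```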

In machine learning we need a set to learn from — it is a part of creating the classifier. It is called a training set and it’ll be noted as:

Training set definition: T = {(X₍₁₎, y₍₁₎), (X₍₂₎, y₍₂₎), …, (X₍ₙ₎, y₍ₙ₎)}.

The training set contains n data samples, and the subscript (i) denotes the i-th data sample of this set. It is a set, so the ordering of samples is irrelevant. The numbering is completely arbitrary, but we need some fixed numbering for our further goals.

So, tl;dr is:

Having a training set T and an unknown sample D of a class y, construct a classifier c which is as accurate as possible.

In further blog posts such a classifier will be constructed, step by step. The heart of this classifier will be a so-called neural network. To increase accuracy, certain data mining and machine learning techniques will be applied, too.

Technology

The solution will come in the form of a computer program. It will be written in the Python programming language with the help of the numpy, scikit-learn and scipy libraries.

The first library is specialized in performing vector and matrix computations; it utilizes the full power of the machine's processor to do so. It is a well-established standard for writing machine learning and data mining solutions, on par with sophisticated mathematical applications like MATLAB or Mathematica. It is smaller than the latter two, but gets the job done just as well.

The second one builds on numpy and is a collection of algorithms common in the world of machine learning and data mining, so I don't need to write everything from scratch. It also has handy built-in tools like error functions, gradient computations and so on, which will be extremely useful while implementing the final solution.

The last one is a library which has many common statistical functions built in, plus some optimization enhancements like sparse matrices. It is included for my convenience and to not reinvent the wheel.

Dataset

The dataset I'd like to work with is the well-known CIFAR10 dataset. It consists of 32 × 32 pixel images: 50000 images in the training set and 10000 images in the test set. You'll learn later about the differences between a training set, a validation set and a test set.

Images are classified to ten possible classes: airplane, automobile, bird, cat, deer, dog, frog, horse, ship and truck.

Classical data mining techniques like the k-nearest neighbors algorithm or the k-means algorithm get around 30% [EDIT: wrong percentage] accuracy on this dataset without preprocessing. I'd like to improve this bound to at least 75%.

Possible problems

There are certain problems that both data mining and machine learning solutions share. One is overfitting, where the classifier is biased too heavily towards the training set: it fits the training samples so closely that it makes wrong decisions on the test set. On the other hand there is underfitting, where the solution is not trained enough: it misses important information about the model and is too weak to classify test set images properly.

Example taken from the sklearn package: underfitted on the left, a well-trained solution in the middle, an overfitted solution on the right.
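One practical way to spot these failure modes is to compare training and validation scores; here is a crude rule-of-thumb sketch (the thresholds are illustrative, not canonical):

```python
def diagnose(train_score, val_score, gap_threshold=0.1, low_threshold=0.5):
    """Rough heuristic: a big train/validation gap hints at overfitting,
    while two low scores hint at underfitting."""
    if train_score - val_score > gap_threshold:
        return "possible overfitting"
    if train_score < low_threshold and val_score < low_threshold:
        return "possible underfitting"
    return "looks reasonable"

print(diagnose(0.99, 0.62))  # possible overfitting
```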

There is also a performance problem: machine learning and data mining solutions are computationally expensive and often need a lot of memory and/or processing time to construct a valid solution. In the case of machine learning there is the problem of when to stop learning. The bigger the dataset, the worse this problem gets. While creating the real solution I'll try (I hope!) some techniques which reduce the training set, like PCA-based solutions or some flavors of linear regression.

Solution Roadmap

Here are general steps I’d like to do:

  • First, try to apply some naive classifier, like decision trees, or perform grouping and try to mimic the classification problem that way. This will let me see where the problems with the classes lie.
  • Try some preprocessing: dimensionality reduction (PCA, kPCA, LLE and so on), normalization/standardization and the like.
  • Experiment with learning techniques: maybe SVM, changing the error functions and so on.
  • Try to find a way to visualize the work of the neural network, but this is optional.

The Code

You can follow my progress through the code here on GitHub. I'll try to update the repository as often as possible. The blog posts will also be based on the results constructed there.


Marcin Grzywaczewski

React.js evangelist. Writer (4 books, Frontend-friendly Rails being the newest one). Dreams to make software more reliable and just _better_.