Machine Learning for Dummies: classify the object in an image


Hello, everyone. In the last few years machine learning has exploded in popularity, and now everyone wants to know the basics and how to use it in their everyday routine. Of course, you can start your path from scratch: learn some theory about probability and statistics, read some books about neural networks and fuzzy sets. You can also decide which language you want to use for your machine learning tasks (R, Python, C++ or Octave/Matlab) and read materials on how to implement the algorithms.

But generally, machine learning is a stack of algorithms that lets you find the answer that correctly corresponds to a question. And if you have a set of data labeled with the right answers, the problem becomes quite easy (haha, I'm saying that just to get you into reading).

Today we will talk about the Python side of machine learning. Don't get me wrong: Python recently became my personal favorite, though I used Octave for the tasks in the fantastically good Stanford course on Machine Learning, and I've heard people frequently use C++ and R for the same problems.

Machine learning in Python can look different. If you're familiar with the theory, you may want to implement the algorithms by hand, and then you actually need only NumPy to simplify the work with matrices and vectors. Although from here on we will talk only about high-level ML frameworks, you should know that NumPy arrays are used in each and every ML framework, so you just can't ignore it.

Next, there are SciPy and Scikit-learn. These libraries have been recommended by many, and there's even a book published by O'Reilly about ML with Scikit-learn. It is good, and if you have time you totally should check it out, but with those tools you still have to implement a lot of stuff yourself to get things done. Who wants to write a shitton of code in 2k16, am I right? So let's head to TensorFlow, a great high-level open-source framework from the Google Brain team that Google itself uses in production in many places. If you ask any ML or Data Science engineer, they will tell you to stick to it. Check out their tutorial for MNIST (classifying handwritten digits in images).

TensorFlow is great. And if you're heading to really WORK in this industry as an engineer, you totally should stick to it. But if you are here, then you're probably just looking for a FAST way to implement a small feature in your great media mobile project, like in the image below. Maybe you are just a web designer who wants to show people magic (then you should see this).

So now we get to the framework I wanted to tell you about in the first place (also, if you are thinking right now, "When will he finally show me the code for the classifier?", don't leave: the moment is close).


This is an open-source framework that works on top of TensorFlow (you need both installed).

TFlearn is the framework you'll want to use for your neural network stuff. It lets you describe the layers of the network, rather than the tf.Variables and other init stuff that distracts your mind.

from tflearn.layers.core import input_data, dropout, fully_connected
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.estimator import regression

network = input_data(shape=[None, 32, 32, 3])  # input layer
network = conv_2d(network, 32, 3, activation='relu')  # hidden conv
network = conv_2d(network, 32, 3, activation='relu')  # hidden conv
network = max_pool_2d(network, 2)  # pooling
network = conv_2d(network, 64, 3, activation='relu')  # hidden conv
network = conv_2d(network, 64, 3, activation='relu')  # hidden conv
network = max_pool_2d(network, 2)  # pooling
network = conv_2d(network, 128, 3, activation='relu')  # hidden conv
network = conv_2d(network, 128, 3, activation='relu')  # hidden conv
network = max_pool_2d(network, 2)  # pooling
network = fully_connected(network, 1024, activation='relu')  # fc
network = dropout(network, 0.5)  # throw some stuff out of the nn
network = fully_connected(network, 10, activation='softmax')  # output
network = regression(network, optimizer='adam', loss='categorical_crossentropy', learning_rate=0.01)

That was all the architecture for our model. Only 14 lines of code, and we have already implemented a convolutional neural network with a lot of hidden layers. This will let you train a model to classify objects from the CIFAR10 dataset.

What do you need to know from the above? In the first layer we let the user feed an image into the network. A 32x32 image. A color image. What does that mean? It means that every 32x32 color image has 3 color channels: red, green and blue. Because of that we have NOT ONE 32x32 matrix of brightness levels, but three. So the shape of the input is 32, 32, 3.
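To make that concrete, here is a quick NumPy sketch (NumPy is the only assumption here) of what one CIFAR10-sized image looks like as an array, and why the input shape starts with None:

```python
import numpy as np

# A single 32x32 RGB image: height x width x 3 color channels
image = np.zeros((32, 32, 3), dtype=np.float32)
print(image.shape)   # (32, 32, 3)

# The network's input shape is [None, 32, 32, 3]: the leading None
# stands for "any number of images", i.e. the batch dimension.
batch = np.stack([image, image])  # a batch of two images
print(batch.shape)   # (2, 32, 32, 3)
```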

After the input layer, we have 9 hidden convolution and pooling layers. You don't have to understand what they do to use this model, and since the explanation is quite long, read about it in the Machine Learning is Fun series.
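If you want at least a taste of it, here is a minimal NumPy sketch (not TFlearn's implementation, just an illustration) of what max_pool_2d(network, 2) does to one channel: it slides a 2x2 window over the matrix, keeps the maximum in each window, and so halves both spatial dimensions:

```python
import numpy as np

def max_pool_2x2(x):
    """2x2 max pooling with stride 2 on a 2D array (one channel)."""
    h, w = x.shape
    # Group rows and columns into 2x2 blocks, take the max of each block
    return x.reshape(h // 2, 2, w // 2, 2).max(axis=(1, 3))

x = np.array([[1, 2, 5, 6],
              [3, 4, 7, 8],
              [9, 1, 2, 3],
              [4, 5, 6, 7]])
print(max_pool_2x2(x))
# [[4 8]
#  [9 7]]
```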

In the end we just connect all the neurons in the network and apply a regression to output.

On every iteration the specified optimizer will minimize the loss function, and in the end you will have a result. You can read about both topics in the Stanford course.
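To get a feel for what "minimize the loss" means, here is a toy gradient-descent step on a one-parameter model, in plain Python. (This is a sketch of the general idea, not what 'adam' actually does; Adam adds per-parameter adaptive learning rates on top of this.)

```python
# Toy example: find w so that w * 2 == 10,
# i.e. minimize loss = (w * 2 - 10) ** 2
w = 0.0
learning_rate = 0.01
for step in range(1000):
    prediction = w * 2
    gradient = 2 * (prediction - 10) * 2  # d(loss)/dw
    w -= learning_rate * gradient         # the "optimizer" step
print(round(w, 3))  # 5.0
```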

For every prediction the model will return a list of 10 elements (the size of the last 'fully_connected' layer, see?) holding the probability of each class.

Now pause and try to think: what would you have to do if you had a script to train a model on CIFAR10, but needed a model for CIFAR100? You see, ML is easy (no).

The position of the highest value in the output list indicates the class of the object in the image. If you are planning to classify more than one object per image, look at binary crossentropy (loss='binary_crossentropy') instead of loss='categorical_crossentropy' in regression.

You can use this method to get just the index of the image class:
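In NumPy terms it boils down to np.argmax, which returns the index of the largest value (the same call is used in the full script near the end; the numbers below are a made-up example prediction):

```python
import numpy as np

# A hypothetical prediction: 10 class probabilities from softmax
prediction = [0.01, 0.02, 0.85, 0.01, 0.03, 0.02, 0.02, 0.01, 0.02, 0.01]
class_index = np.argmax(prediction)  # index of the highest score
print(class_index)  # 2
```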


I WILL ELABORATE ON THIS LATER IN THE ARTICLE, DON'T PANIC! For now, let's talk about preparing the data to be fed into the network, and about the training process of the model.

So, you have pictures, right? Raw pictures might be a bad thing for the network to learn from. You need to fix them up. TFlearn has instruments to do that.

There are 'data_preprocessing' and 'data_augmentation' parameters to do what you need. Just put this in the code before you start working with the network.

from tflearn.data_preprocessing import ImagePreprocessing
from tflearn.data_augmentation import ImageAugmentation

img_prep = ImagePreprocessing()
img_aug = ImageAugmentation()

Now you can use them in the network to fix the images. Change the input layer of your network to this:

network = input_data(shape=[None, 32, 32, 3], data_preprocessing=img_prep, data_augmentation=img_aug)

Okay, so we are fixing the input for the network in real time. We have also already specified what output we need from the network. What's next? Training, of course.

You see, a network without training is just a pile of matrices with random numbers inside. You need to train it. After training your network will have the right numbers inside and will be smart, but right now it's dumb. And since it is dumb, the training process will be dumb too: neurons apply some function to the input ('relu' in the code; read about it in the TFlearn docs or on Wikipedia if you're curious) over and over again until you tell them to stop. Because of this, training takes time. Not minutes. On an average MacBook Air this network will occupy the laptop for 6–24 hours, depending on the 'n_epoch' constant (what is an epoch? That's out of scope for this article, and I believe you already saw some links above that lead to good materials about ML).

Let's train this network… No. First, I forgot (intentionally) to show you how to load the dataset. TFlearn has a special method for this public dataset, so it takes only a few lines of code. You should put this code at the top of the w̶o̶r̶l̶d̶ script. Even before the prep and aug.

from tflearn.datasets import cifar10
from tflearn.data_utils import shuffle, to_categorical

(X, Y), (X_test, Y_test) = cifar10.load_data()
X, Y = shuffle(X, Y)
Y = to_categorical(Y, 10)
Y_test = to_categorical(Y_test, 10)

Usually, people train a model on 75% of the dataset; the other 25% goes to testing the model. In the code above, and in other ML-themed docs, X always means the actual data, while Y means the labels (rephrased: the right answers to the data in X).
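To see what to_categorical does with those labels, here is the same transformation written out in NumPy (a sketch, not TFlearn's actual implementation): an integer label like 3 becomes a vector of 10 zeros with a 1 in position 3, which is what the network's 10-element softmax output is compared against.

```python
import numpy as np

def to_categorical_sketch(labels, num_classes):
    """Turn integer labels into one-hot vectors."""
    one_hot = np.zeros((len(labels), num_classes))
    one_hot[np.arange(len(labels)), labels] = 1
    return one_hot

print(to_categorical_sketch([3, 0], 10))
# [[0. 0. 0. 1. 0. 0. 0. 0. 0. 0.]
#  [1. 0. 0. 0. 0. 0. 0. 0. 0. 0.]]
```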


import tflearn

model = tflearn.DNN(network, tensorboard_verbose=3)  # init model
model.fit(X, Y, n_epoch=85, shuffle=True, validation_set=(X_test, Y_test), show_metric=True, batch_size=96, run_id='cifar10_cnn')
model.save('myfavoritenetwork.tfl')

The first line in this code snippet initializes the model. 'tensorboard_verbose' lets you get really verbose logs of the process; later, after training, you can load them into the TensorBoard tool and see some dope plots.

The second line starts the training process. If you are in a hurry, you can lower 'n_epoch' to 40–45. The folder with the logs will have the same name as 'run_id'. The model will be saved under the filename passed to 'save' ('myfavoritenetwork.tfl'). Don't go hunting for 'fit' and 'save' in the layer docs; they are methods of the tflearn.DNN model class.

The model's accuracy is 90% on the training data and 84% on the test data.

So now you have trained model. How to use it?

You need to repeat your network architecture (just copy-paste it from the previous script) and then load your model file:

model = tflearn.DNN(network)
model.load('myfavoritenetwork.tfl')

Now just grab a picture, prepare it and give it to the model:

import numpy as np
import scipy.ndimage
import scipy.misc

image = scipy.ndimage.imread("path/to/image", mode='RGB')
image_a = scipy.misc.imresize(image, (32, 32), interp='bicubic').astype(np.float32, casting='unsafe')
prediction = model.predict([image_a])
result = np.argmax(prediction)

In the result variable you will have the index of the image's class. If it's == 1, then your class is 'automobile'. See the available classes on the dataset's page.
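For convenience, here are the CIFAR10 class names in index order (you can double-check them against the dataset's page), so you can map the result index straight to a readable label:

```python
# CIFAR10 class names, in index order
classes = ['airplane', 'automobile', 'bird', 'cat', 'deer',
           'dog', 'frog', 'horse', 'ship', 'truck']

result = 1  # whatever np.argmax returned for your image
print(classes[result])  # automobile
```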


Finally, before linking the full scripts for training and using the model, I want to say: if you see errors or mistakes in the text above, please send them to me using Medium private notes.

I decided to write this tutorial after a few weeks of exploring the wonderful world of machine learning. I'm as much of a newbie as you; I just started looking for information a bit earlier. Please share the information you find in the process of learning about ML with others, and let's make this topic easier to study. And of course, if you liked my article, share it too.

Script to train model:

Script to use model:

What to read next

See ya!