Everything you need to know to master Convolutional Neural Networks

Look at the photo below:

Image for post
Image for post
Courtesy of Pix2PixHD

That is not a real photo. You can open the image in a new tab and zoom into the image. Do you see the mosaics?

The picture is actually generated by a program called Artificial Intelligence. Doesn’t it feel realistic? It’s great, isn’t it?

It has been only 7 years since the technology was brought to the public by Alex Krizhevsky and friends via the ImageNet competition. This competition is an annual Computer Vision competition to categorize pictures into 1000 different classes. From Alaskan Malamutes to toilet paper. Alex and friends built something called AlexNet, and it won the competition with a large margin between it and second place.

This technology is called a Convolutional Neural Network. It’s a sub-branch of Deep Neural Networks which performs exceptionally well in processing images.

Image for post
Image for post
Courtesy of ImageNet

The image above is the error rate produced by the software that won the competition several years back. In 2016, it is actually better than human performance which was around 5%.

The introduction of Deep Learning into this field is actually game breaking more than game-changing.

Convolutional Neural Network Architecture

Image for post
Image for post

Convolutional Neural Networks perform better than other Deep Neural Network architectures because of their unique process. Instead of looking at the image one pixel at a time, CNNs group several pixels together (an example 3×3 pixel like in the image above) so they can understand a temporal pattern.

In another way, CNNs can “see” group of pixels forming a line or curve. Because of the deep nature of Deep Neural Networks, in the next level they will see not the group of pixels, but groups of lines and curves forming some shapes. And so on until they form a complete picture.

Image for post
Image for post
Deep Convolutional Neural Network by Mynepalli

There are many things you need to learn if you want to understand CNNs, from the very basic things, like a kernel, pooling layers, and so on. But nowadays, you can just dive and use many open source projects for this technology.

This is actually true because of the technology called Transfer Learning.

Transfer Learning

As an example, say you are working in a Train Management company, and want to assess whether your trains are on time or not. And you don’t want to add another workforce just for this task.

You can just reuse an ImageNet Convolutional Neural Network model, maybe ResNet (the 2015 winner) and re-train the network with the images of your train fleets. And you will do just fine.

There are two main competitive edges when you use Transfer Learning.

  1. Needs fewer images to perform well than training from scratch. ImageNet competition has around 1 million images to train with. Using transfer learning, you can use only 1000 or even 100 images and perform well because it is already trained with those 1 million images.
  2. Needs less time to achieve good performance. To be as good as ImageNet, you will need to train the network for days, and that doesn’t count the time needed to alter the network if it doesn’t work well. Using transfer learning, you will only need several hours or even minutes to finish training for some tasks. A lot of time saved.

Image Classification to Image Generation

Challenge accepted!

Generative Adversarial Network comes to the scene.

Image for post
Image for post
CycleGAN by Jun-Yan Zhu

This technology can generate pictures using some inputs.

It can generate a realistic photo given a painting in a type called CycleGAN which I give you in the photo above. In another use case, it also can generate a picture of a bag given some sketches. It can even generate a higher resolution photo given a low-res photo.

Image for post
Image for post
Super Resolution Generative Adversarial Network

Amazing, aren’t they?

Of course. And you can start learning to build them now. But how?

Convolutional Neural Network Tutorial

Let’s put aside mastering it for now.

After browsing for several days, I found this project which is really suitable for you to start with.

Aerial Cactus Identification

Pretty simple, eh?

You will be given 17,500 images to work with and need to label 4,000 images that have not been labeled. Your score is 1 or 100% if all the 4,000 images are correctly labeled by your program.

Image for post
Image for post

The images are pretty much like what you see above. A photo of a region that may or may not contains a group of columnar cacti. The photos are 32×32 pixels. And it shows cacti in different directions since they are aerial photos.

So what do you need?

Convolutional Neural Network with Python

  1. Tensorflow, the most popular Deep Learning library. Built by engineers at Google and has the biggest contributor base and most fans. Because the community is so big, you can easily find the solution to your problem. It has Keras as the high-level abstraction wrapper, that is so favorable for a newbie.
  2. Pytorch. My favorite Deep Learning library. Built purely on Python and following the pros and cons of Python. Python developers will be extremely familiar with this library. It has another library called FastAI which gives the abstraction Keras has for Tensorflow.
  3. MXNet. The Deep Learning library by Apache.
  4. Theano. Predecessor of Tensorflow
  5. CNTK. Microsoft also has his own Deep Learning library.

For this tutorial, let’s use my favorite one, Pytorch, complemented by its abstraction, FastAI.

Before starting, you need to install Python. Go to the Python website and download what you need. You need to make sure that you install version 3.6+, or it may not be supported by the libraries you will use.

Now, open your command line or terminal and install these things

pip install numpy 
pip install pandas
pip install jupyter

Numpy will be used to store the inputted images. And pandas to work with CSV files. Jupyter notebook is what you need to interactively code with Python.

Then, go to the Pytorch website and download what you need. You might need the CUDA version to fasten your training speed. But make sure that you have version 1.0+ for Pytorch.

After that, install torchvision and FastAI:

pip install torchvision 
pip install fastai

Run Jupyter with the command jupyter notebook and it will open a browser window.

Image for post
Image for post

Now, you are ready to go.

Prepare the Data

import numpy as np
import pandas as pd
from pathlib import Path
from fastai import *
from fastai.vision import *
import torch
%matplotlib inline

Numpy and Pandas are always needed for everything you want to do. FastAI and Torch are your Deep Learning Library. Matplotlib Inline will be used to show charts.

Now, download data files from the competition website.

Extract the zip data file and put them inside the jupyter notebook folder.

Let’s say you named your notebook Cacti. Your folder structure would be like this:

Image for post
Image for post

Train folder contains all the images for your training step.

Test folder contains all the images for submission.

Train CSV file contains the training data; mapping the image name with the column has_cactus which gives a value of 1 if it has cactus or 0 otherwise.

Sample Submission CSV file contains all the formatting for submission that you need to do. The file names stated there are equal to all files inside Test folder.

train_df = pd.read_csv("train.csv")

Load the Train CSV file to a data frame.

data_folder = Path(".") 
train_img = ImageList.from_df(train_df, path=data_folder, folder='train')

Create a load generator using the ImageList from_df method to map train_df data frame with images inside the train folder.

Data Augmentation

You will need this technique a lot if you happen to have very little data to work with.

transformations = get_transforms(do_flip=True, flip_vert=True, max_rotate=10.0, max_zoom=1.1, max_lighting=0.2, max_warp=0.2, p_affine=0.75, p_lighting=0.75)

FastAI gives you a nice transformation method to do all of this, called get_transform. You can do a flip vertically, horizontally, rotate, zoom, add lighting/brightness, and warp the image.

You can play with the parameter I stated above to find out how it will look. Or you can open the documentation and read about it in detail.

Training Preparation

Of course, apply the transformation to your image list:

test_df = pd.read_csv("sample_submission.csv") 
test_img = ImageList.from_df(test_df, path=data_folder, folder='test')
train_img = train_img
.transform(transformations, size=128)
.databunch(path='.', bs=64, device=torch.device('cuda:0'))

For the training step, you need to split some of your training data into a small portion called validation data. You can’t touch these data because they will be your validation tool. When your Convolutional Neural Network performs well on validation data, it will likely perform well on the test data that will be submitted.

FastAI has the convenient method called split_by_rand_pct to split a portion of your data into validation data.

It also has the method databunch to perform batch processing. I used 64 as the batch because that is what my GPU limits. If you don’t have GPU, emit the device parameter.

Then, the normalize method is called to normalize your images because you will use a pre-trained network. imagenet_stats will normalize the images according to how the pre-trained network was trained for the ImageNet competition.

Adding the test data to the training image list makes it easy to predict later on without more pre-processing. Remember, these images will not be trained on and will not go to your validation. You just want to pre-process the data in the same way with the training images.

The parameter size in transform method will be used to scale up or down the input to match with the neural network you will use. The network I will use is called DenseNet, which won Best Paper Award at ImageNet 2017, and it needs images with 128×128 pixels in size.

learn = cnn_learner(train_img, models.densenet161, metrics=[error_rate, accuracy])

You are done preparing your training data. Now, create a training method with cnn_learner. As I said before, I will use DenseNet as the pre-trained network. You can use another network offered in TorchVision.

The One-Cycle Technique

Image for post
Image for post

A bigger learning rate makes the training steps faster, but it is prone to overstepping the boundaries. This makes it possible for the error to go out of control like the picture above. While a smaller learning rate makes the training steps slower, but it will not go out of control.

So, choosing the right learning rate is really important. Make it big enough without going out of control.

It is easier said than done.

So, there came a person called Leslie Smith who create a technique called the 1-cycle policy.

Intuition wise, you might want to find / brute force several learning rates and find one with nearly minimal error but have some space to improve. Let’s try it out in our code.


It will print something like this:

Image for post
Image for post

The minimum should be 10 ⁻¹. So, I think we can use something smaller than that but not too small. Maybe 3 * 10 ⁻ ² is a good choice. Let’s try it!

lr = 3e-02 
learn.fit_one_cycle(5, slice(lr))

Train for several steps (I choose 5, not too big and not too small), and let’s see the result.

Image for post
Image for post

Wait, what!?

Our simple solution gives us 100% accuracy for our validation split! It is actually effective. And it only needs six minutes to train. What a stroke of luck! In real life, you will do several iterations just to find out which algorithms do better than the others.

I am eager to submit! Haha. Let’s predict the test folder and submit the result.

preds,_ = learn.get_preds(ds_type=DatasetType.Test) test_df.has_cactus = preds.numpy()[:, 0]

Because you have already put the test images in the training image list, you will not need to pre-process and load your test images.

test_df.to_csv('submission.csv', index=False)

This line will create a CSV file containing the images name and has a cactus column for all 4,000 test images.

When I tried to submit, I actually just realized that you need to submit the CSV via a Kaggle kernel. I missed that.

Image for post
Image for post
Courtesy of Kaggle

But, luckily, the kernel is actually the same as your jupyter notebook. You can just copy paste all the things you have built in your notebook and submit there.

And BAM!

Image for post
Image for post

Good Lord! I get 0.9999 for the public score. That’s really good. But, of course, I want to get a perfect score if my first attempt is like that.

So, I did several tweaks in the network and once more, BAM!

Image for post
Image for post

I did it! So can you. It’s actually not that hard.

(BTW, this rank was taken on April 13th, so I might drop my rank right by now…)

What I Learned

Alas, because many people get a perfect score on this, I think the admin needs to create another test set for submission. A harder one maybe.

Whatever the reason, there is no barrier for you to try this. You can try this right now and get good results.

Image for post
Image for post
Photo by Mario Mrad on Unsplash

Final Thoughts

Just get started, pick a good Convolutional Neural Network project, and get good data.

Good luck!

This article is originally published on my blog at thedatamage.


This is no longer updated.

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store