First Dive In Deep Learning or Transfer Learning: Aerial Cactus Identification

Determine whether an image contains columnar cactus.

Published in

Analytics Vidhya

10 min readAug 26, 2019

Enough of tabular data, now it was time to mess around with image data. As usual, I was exploring some of the past competitions on Kaggle but wait a second, the majority of them are mainly concerned with image data. How to solve this? Well the answer was simple — I had to grasp the basics of Deep Learning!

For starters I started with a course Kaggle has to offer, you can get it here. After some brainstorming, I was ready for my first mission! But well it had to be simple and basic one as I was still a newbie and then I encountered this dataset.

Aerial Cactus Identification dataset consisted of 17500 images with a label depicting whether the image has cactus or not. So what are we waiting for let’s start coding!

Import Relevant Libraries

import cv2
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
import os
from tqdm import tqdm, tqdm_notebook
from keras.models import Sequential
from keras.layers import Activation, Dropout, Flatten, Dense
from keras.applications import VGG16
from keras.optimizers import Adam

Don’t worry if you don’t understand some of the terms for now, by the end of this article I will make sure every topic is being covered.

Let’s load the data

train_df = pd.read_csv('../input/aerial-cactus-
identification/train.csv')

Well, we have loaded the data in our dataframe, let’s have a look at the structure of train_df.

It’s not what we had expected right? We wanted to see some visuals.

Why not have them, here is the code :

im = cv2.imread("../input/aerial-cactus-identification/train/train/01e30c0ba6e91343a12d2126fcafc0dd.jpg")
plt.imshow(im)

I know the is image is very blurred, but had to resize it for display.

Fun Part Begins!

We have our data loaded in variables and now we have to create a CNN(Convolutional Neural Network), sounds so complex but trust it will be a cakewalk once you will get it as the same happened with me.

Let’s split the word CNN into two parts: Convolution and Neural Network

That’s it! Just have a look at the image and you will get it straight in your head. For those who are slightly slow in grasping like me here is the explanation: What we are doing is multiplying the whole 5X5 matrix with a 3X3 matrix and converting it in another layer which is of lower dimension than the previous one and this chain goes on! This is extremely useful in large matrices.

Now coming to our later part which was Neural Network so here it is :

Hell yeah! It’s nothing but applying convolution again and again or in more technical terms adding layer one above others just like a stack and making a network among different nodes.

Now that we are somewhat familiar with the term CNN, what are we waiting for let’s create one! But you know what, I am going to give you all a bonus. What if besides from making a new model all on ourselves we also use a ‘pretrained’ model which was already trained for a very large dataset!

This was a beauty which I discovered while learning about Deep Learning or DL , which is if you are a lazy creature(just like me) who don’t like to mess around and get in trouble of actually making a new model but what to somewhat use other’s model for our own purpose this is the right place and the concept is named as “Transfer Learning”

I will walk you through the concept of Transfer Learning while coding itself and you are going to love it, so let’s start to code!

vgg16_net = VGG16(weights='imagenet', 
                  include_top=False, 
                  input_shape=(32, 32, 3))

As mentioned earlier, the name of our pretrained model is VGG16 (sounds like a movie title!)It is actually a network consisting of 16 layers. Here is it’s structure

Instead of coding this manually it’s better to use one, right?

But what if this model was created for some other purpose, I mean we want this model to classify if a picture has a cactus in it or not but we don’t even know what this model does. To solve this problem what we can do is transfer and concat this pretrained model to our self-made model and that is why it is called as Transfer Learning!

Syntax of our pretrained model:

So in the code mentioned above, there were 3 parameters (weight, include, input_shape). So let’s clear it order wise:

Weight: It’s basically a blessing or a curse, I mean it’s somewhat a bias value which will multiply the input with a certain number and pass it to the next layer so if the node is of more value than naturally, its weight would be higher.
include_top = False: As mentioned in the theory above we are going to concat out the pretrained model in our new model and for that, we need to remove it’s (vgg16) top layer which was supposed to predict the output. It’s very important for you to understand this, so here is a video link.
input_shape: You can know this just by the command img.shape

Heading forward :

vgg16_net.trainable = True
model = Sequential()     #Starting Point
model.add(vgg16_net)
model.add(Flatten())
model.add(Dense(256))
model.add(Activation('relu'))
model.add(Dropout(0.5))
model.add(Dense(1))
model.add(Activation('sigmoid'))

So many new terminologies, let’s understand them one by one.
Overall this is the concatenation I was talking about, let’s start from the ‘Starting Point’, we have created a model which will be an object of Sequential(), Sequential is nothing but helps us to stack as many layers we want just like a sandwich!

Layers of our model

Now that we have our model ready let’s start adding some layers :
1st layer: Our pretrained model
2nd layer: Flatten(), what on earth does this mean?

Flatten() in a nutshell, for those who still struggling here is the official definition :

Flatten is the function that converts the pooled feature map to a single column that is passed to the fully connected layer. Dense adds the fully connected layer to the neural network.

3rd Layer: Dense(256), A dense layer represents a matrix-vector multiplication in other words what it actually means is during the training process there may happen that the previous values in the matrix were inaccurate so it might need to change it, this process is also called ‘Backpropagation’. Meaning of syntax will generate an output of 256 nodes.

3rd Layer (activation function) :

Working of Activation Function: Specifically, these functions do the sum of products of inputs(X) and their corresponding weights(W) and apply an Activation function f(x) to it to get the output of that layer and feed it as an input to the next layer.

What is Relu?

Relu or (rectified linear unit (ReLU)) is widely used to remove all the negatively affected nodes from our model. As seen from the figure it returns 0 if the weight is less than 0 and as we read in the theory just above, all the input is multiplied by its weight and if weight is 0 means that input has no significance.

4th Layer (Dropout) :

As the figure suggests it is a technique to reduce overfitting and improve regularization by dropping some random nodes during the training process.

Drop(0.2) means that the current layer will drop 20 percent of nodes randomly from the previously added layer.

4th Layer: Well I am not going deep inside this layer as already mentioned working of Dense layer, in a nutshell, this layer will give the output a single node which will have our answer, now what will be the form of that answer will be decided by our activation function or sigmoid.

4th Layer (activation function) :

Conventionally besides diagram is the typical example of the sigmoid function.However, a more simple and practical approach would be it would simply give the probability of your how likely is your output.

Well, congratulations you have reached the end of all layers. Now you have a basic idea of all the layers, activation function, and pretrained model or transfer learning as well (that’s quite a few things for a beginner, you are awesome!).

Compiling our model

model.compile(loss='binary_crossentropy',
              optimizer=Adam(lr=1e-5), 
              metrics=['accuracy'])

We are ready to compile our model. This will create a Python object which will build the CNN. This is done by building the computation graph in the correct format based on the Keras backend we are using.

Compile function consists of 3 arguments (loss, optimizer, and metrics).

Loss: As the name suggests this function means how well our algorithms is near the actual solution, if predictions deviate too much from actual results, loss function would cough up a very large number. Binary cross-entropy is used especially for binary classification where our main task is to predict Yes/No.

Binary Cross Entropy Math. ŷ is the predicted value and y is our actual value.

Optimizer: Optimizing functions are majorly used for minimizing or maximizing the loss function also referred to as E(x). Here we have used Adam( Adaptive Moment Estimation). Adam works well in practice and compares favorably to other adaptive learning-method algorithms as it converges very fast and the learning speed of the Model is quiet Fast and efficient. I referred to this link for a better understanding.

Let’s move on :

Now that we have created our model, let’s train it. Before that, we need to separate labels with features in other words images with their labels. Here’s the code

X_tr = []
Y_tr = []
imges = train_df['id'].values
for img_id in tqdm_notebook(imges):
    X_tr.append(cv2.imread(train_dir + img_id))    
    Y_tr.append(train_df[train_df['id'] == img_id]['has_cactus'].values[0])  #Cooking our food!
X_tr = np.asarray(X_tr)
X_tr = X_tr.astype('float32')
X_tr /= 255
Y_tr = np.asarray(Y_tr)

Whoa! So many gibberish words. Let’s understand it step-wise.

Understanding from the starting point, what we have done is got the list of all the images name in a list named ‘imges’. Here is the output

The model can’t be trained just by the image name right?

We need to retrieve the image and how do we do that? We can simply iterate through our list named imges.

You may have noticed a new term there known as ‘tqdm_notebook’. Its main functionality to give a fancy progress bar indicating how much the process is being completed just like this :

What we are doing inside the iteration is pretty much straightforward, just concatenating our image directory with the name of the image and storing and reading it though cv2.imread() function. Also, storing values or labels in Y_tr (whether an image has cactus or not).

So now have collected our raw food for our model, it’s just that our model likes cooked food so let’s cook it and make it edible for our model. Here we go!

For starters, we need to convert our images to an array and for that, we are using np.asarray() functionality. After we are done with it let’s come to our main meal. What I mean is we can’t just give our model a list of a random numbers we need to preprocess it.

For that we have converted elements of our list into float type and divided it by 255 as the range of all the pixels are between 0–255 so our list we will standardize and ready to be served!!

Training Phase Begins :

batch_size = 32
nb_epoch = 10
model.fit(X_tr, Y_tr,
              batch_size=batch_size,
              epochs=nb_epoch,
              validation_split=0.1,
              shuffle=True,
              verbose=2)

Though it’s a single command there is a lot of things to understand and keep in mind while training a model. Just imagine how our mind works every second so smoothly and efficiently(Not mine though : D, just joking!). Let's start understanding this beauty.

To understand what is batch_size and epoch, just remember a relation between them. So the number of an epochs is how many times our whole data is going to pass through the network(forward processing and backpropagation as well). Naturally, this would take lots of our CPU or rather GPU resource. For that purpose, we have batch_size to split the data into batches and then pass it further.

Training is finished and we got a pretty good accuracy too, I hope we are not overfitting!

Testing Phase rather Final Phase!

So finally we have reached the end of this article. I hope you have understood each and every step we have done so far. For this phase we are just going to repeat the process we did for preprocessing of the training dataset(Remember that raw food and cooked foodstuff?)

X_tst = []
Test_imgs = []
for img_id in tqdm_notebook(os.listdir(test_dir)):
    X_tst.append(cv2.imread(test_dir + img_id))     
    Test_imgs.append(img_id)
X_tst = np.asarray(X_tst)
X_tst = X_tst.astype('float32')
X_tst /= 255

There we go :

test_predictions = model.predict(X_tst)

This is what we got as prediction, but according to competition, we need to have a binary classification.

No need to worry we can just map its value and replace all the value with 0 having prediction less than 0.75 and 1 for above!

sub_df = pd.DataFrame(test_predictions, columns=['has_cactus'])
sub_df['has_cactus'] = sub_df['has_cactus'].apply(lambda x: 1 if x > 0.75 else 0)

Better right? Now the last step which is to replace the id with the image names and submit it to check our results!

#Preprocessing stuff for valid submission in kaggle
sub_df['id'] = ''
cols = sub_df.columns.tolist()
cols = cols[-1:] + cols[:-1]
sub_df=sub_df[cols]
for i, img in enumerate(Test_imgs):
    sub_df.set_value(i,'id',img)
sub_df.head()

Fingers crossed, let’s check the result!

sub_df.to_csv('submission.csv',index=False)

Well, that’s it for now, I hope you understood every single line of this code. Also thanks to some of the awesome kernels and notebooks on Kaggle for providing guidance to a learner. As usual, I am always open to suggestions and improvements.