Functioning of CNN with custom dataset.

Ajay John Alex · Published in Analytics Vidhya · 9 min read · Jan 3, 2020

You will find many articles and videos on YouTube about the functioning of CNNs. The idea behind writing this article is to move away from that norm and share some additional information along with the existing material.
So, in this attempt, the functioning of a Convolutional Neural Network on a custom dataset is explained. The article is written as a set of questions and answers to cover the related topics and common questions on the subject.

You can use any language, Python or R, and go for any library like TensorFlow, TFlearn or Keras; it actually doesn't matter as long as you are clear about the concept.

The purpose of this article is to show how you can create your own data and apply a CNN to it using TFlearn. I ran this code on Google Colab.

By definition : TFlearn is a modular and transparent deep learning library built on top of Tensorflow. It was designed to provide a higher-level API to TensorFlow in order to facilitate and speed-up experimentations, while remaining fully transparent and compatible with it.

Q. Why CNN?

A. A CNN is a Convolutional Neural Network, usually used for image recognition. The practical benefit is that having fewer parameters greatly reduces the time it takes to learn, as well as the amount of data required to train the model. Instead of a fully connected network with a weight for every pixel, a CNN has just enough weights to look at a small patch of the image.
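To make that saving concrete, here is a quick back-of-the-envelope comparison (the layer sizes are hypothetical and chosen only for illustration; the 50*50*3 input matches the IMG_SIZE used later in this article):

H, W, C = 50, 50, 3              # a 50x50 RGB image, as used later on

# Fully connected: every input pixel gets its own weight to each of 512 units.
fc_weights = H * W * C * 512     # 3,840,000 weights

# Convolutional: 64 filters, each looking only at a 3x3 patch across 3 channels.
conv_weights = 3 * 3 * C * 64    # 1,728 weights, reused at every image position

print(fc_weights, conv_weights)  # 3840000 1728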

Q. But then why go for custom data?

A. There are thousands of articles on the MNIST dataset, but with such preprocessed datasets you never actually learn how to extract new images and build a dataset on your own: resizing the images, ordering the images and labelling them.
Install google_images_download to download custom images of our choice. Enter this into the cmd:

! pip install google_images_download

CONVOLUTIONAL NEURAL NETWORK

CONVOLUTION LAYER

This layer helps us detect the features in an image. A small filter, here 2*2, moves across the input image at a stride of 1; at each position the filter is multiplied element-wise with the patch of the image beneath it and the products are summed, producing one value of the output image (the feature map).
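As a minimal sketch of that multiply-and-sum (a hypothetical 2*2 filter over a tiny NumPy array, not a filter from this article's model):

import numpy as np

image = np.array([[1, 2, 0, 1],
                  [0, 1, 2, 0],
                  [1, 0, 1, 2],
                  [2, 1, 0, 1]])
kernel = np.array([[1, 0],
                   [0, 1]])            # a made-up 2*2 filter

out = np.zeros((3, 3))                 # 4 - 2 + 1 = 3 positions per axis at stride 1
for r in range(3):
    for c in range(3):
        # multiply the 2*2 patch element-wise with the filter and sum
        out[r, c] = np.sum(image[r:r+2, c:c+2] * kernel)
print(out)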

Q. But what would these filters do?
A. Each of these filters is actually a feature detector, and every filter detects a different feature. To understand this a bit better: if your image was a "CAT", then maybe one feature-detector filter detects the eyes, another the nose, another the ears and so on.
In the same way, each filter searches for and detects one feature, and we get a feature map. Finally, after applying different filters, we have a collection of feature maps that makes up our convolutional layer.
As for understanding the feature-detection process, this video by Andrew Ng is the best you will find.

Q. Why is ReLU used as an activation function?
A. We go for ReLU as the activation function to increase the non-linearity. Then the question of why non-linearity matters comes to mind.
ReLU is the Rectified Linear Unit and is defined as y = max(0, x), where x is the input to a neuron.

Images themselves are highly non-linear, but convolution is a linear operation and its output risks being too linear, so in order to restore the non-linearity we use ReLU. Now what do you mean by non-linearity? Well, when the transition from one pixel to another happens, there is non-linearity because of colour, shapes, borders and the different elements.

Suppose we have a feature map from one filter, in black and white. After applying ReLU we have only non-negative values, i.e. all black colouration is removed. Before ReLU, the pixel transition in the feature map from, say, a black-coloured area to a white area is gradual and linear: first black, then dark grey, then grey, then white. On applying ReLU we get a sharp contrast in colour instead, and hence increased non-linearity.
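As a tiny illustration of y = max(0, x) on a made-up feature map, every negative value (the "black" regions) is clamped to zero while positive responses pass through unchanged:

import numpy as np

feature_map = np.array([[-3.1,  0.5],
                        [ 2.0, -0.7]])   # made-up post-convolution values

activated = np.maximum(0, feature_map)   # negatives become 0
print(activated)
# [[0.   0.5]
#  [2.   0. ]]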

POOLING LAYER

The pooling layer is used to find the maximum within each region of the matrix. The usual stride is 2 and the usual filter size is 2.

Q. But what does this max pooling do?
A. Max pooling takes the maximum in each pool. This step comes after the convolution layer, and in convolution we detect the features. So let's take an example to get a better understanding. If the image was of a cat, then maybe one of the features detected by the convolution layer could be the eyes. Now these eyes can be located at any position in the image; some images may have just the face of a cat, some might have the entire body, some maybe a side view and so on, yet our CNN should identify all of them as 'CATS'. So what pooling does is help identify the features even if they are slightly distorted. And with a 2*2 filter we reduce the size and parameters by 75%, which also helps prevent overfitting.

Q. How does it achieve the aim of handling distortion in features?
A. A filter of size 2*2 moves with a stride of 2. It scans and takes the maximum value from each group of 2*2, ensuring that the main feature from every group is kept and the spatial distortion is handled. As an intuitive example, suppose the number 9 represents the ears of a cat and is located at the 2nd row, 1st column; now if the image was distorted and the 9 happened to move up or to the right, then after pooling we would still have that feature preserved. Don't take this as a literal explanation but as an intuitive example to understand the concept of pooling, as in the sketch below.
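Here is a toy 2*2 max pool with stride 2 on made-up values (note how the 4*4 input shrinks to 2*2, the 75% reduction mentioned above, and how the 9 survives wherever it sits inside its pool):

import numpy as np

fmap = np.array([[1, 3, 2, 1],
                 [9, 4, 1, 0],     # the 9 (our 'cat ear') at 2nd row, 1st column
                 [1, 2, 6, 5],
                 [0, 1, 3, 8]])

pooled = np.zeros((2, 2))
for r in range(2):
    for c in range(2):
        # keep only the strongest response in each 2*2 block
        pooled[r, c] = fmap[2*r:2*r+2, 2*c:2*c+2].max()
print(pooled)
# [[9. 2.]
#  [2. 8.]]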

For curious minds…
Q. One interesting doubt that might come up is why go for max pooling and not another type of pooling, such as average pooling?
A. Please refer to this research paper by Dominik Scherer, Andreas Müller and Sven Behnke. It's just a 10-page research paper that explains this topic in depth.
Also check this site for a fun experience of CNN functionality.

CODE:

import tflearn
from tflearn.layers.core import input_data, fully_connected, dropout
from tflearn.layers.conv import conv_2d, max_pool_2d
from tflearn.layers.estimator import regression
import numpy as np
import matplotlib.pyplot as plt
import cv2
import os
from random import shuffle
from google_images_download import google_images_download
from PIL import Image

In the code below:
Keywords: names of the objects whose images you want to download.
Limit: number of images you want to download at once.
Print_urls: print the URL of every image being downloaded.

The limit was kept at 100 here and we got 94 images, because some images get corrupted during download. Refer to this page for better clarification on the various parameters and for examples.

# getting random images of Forest Fire and Natural Vegetation in
response = google_images_download.googleimagesdownload()
arguments = {"keywords":"Forest Fire,Natural Vegetation","limit":100,"print_urls":False}

path = response.download(arguments)
print(path)

# got 94 images of forest_fire and 94 images of natural vegetation

The code below ensures that the downloaded images are not corrupted, as without this step they later create a lot of problems when resizing and converting the images.

# removing corrupted images

FOREST_FIRE_DIR = '/content/downloads/Forest Fire'
NATURAL_VEG_DIR = '/content/downloads/Natural Vegetation'

url_list = [FOREST_FIRE_DIR, NATURAL_VEG_DIR]
for i in url_list:
    for image in os.listdir(i):
        try:
            with Image.open(i + "/" + image) as im:
                pass
        except:
            print(i + "/" + image)
            os.remove(i + "/" + image)

Now we rename the existing images. This is done to add labels to the two groups of images on which we will perform the classification using CNN. The labelling part is explained later on.

# renaming the files
for i in url_list:
    for num, image in enumerate(os.listdir(i)):
        if i == '/content/downloads/Forest Fire':
            os.rename(i + "/" + image, i + "/" + "forest_fire." + str(num) + ".jpg")
        else:
            os.rename(i + "/" + image, i + "/" + "natural_veg." + str(num) + ".jpg")

If you are not using Google Colab you can skip these lines of code.
Google Colab creates checkpoints that often bring problems, so this code resolves that issue.

# removing corrupted images
FOREST_FIRE_DIR = '/content/downloads/Forest Fire'
NATURAL_VEG_DIR = '/content/downloads/Natural Vegetation'
url_list = [FOREST_FIRE_DIR, NATURAL_VEG_DIR]
for i in url_list:
    for image in os.listdir(i):
        try:
            with Image.open(i + "/" + image) as im:
                # print(i+"/"+image)
                pass
        except:
            print(i + "/" + image)
            if image == '.ipynb_checkpoints':
                pass
            else:
                os.remove(i + "/" + image)

# getting the count of the no of images available under each category
print("forest fire image count: " + str(len(os.listdir(FOREST_FIRE_DIR))))
print("natural vegetation image count: " + str(len(os.listdir(NATURAL_VEG_DIR))))

We label the images as [1,0] if the name starts with forest_fire, else [0,1].
Here the earlier renaming of the images helps.

from tqdm import tqdm

def label_img(img):
    # file names look like "forest_fire.0.jpg", so the part before
    # the first dot is the class name
    word_label = img.split('.')[0]
    if word_label == "forest_fire":
        return [1, 0]
    else:
        return [0, 1]

We now need a train set and a test set from the existing dataset.
I'll break down what is happening in these lines of code.
The steps are the same for both sets.

1. Reading the images from the files :

train_url = [TRAIN_DIR_Fire, TRAIN_DIR_Nature]
for i in train_url:
    for image in tqdm(os.listdir(i)):
        label = label_img(image)
        path = os.path.join(i, image)

2. Here we read the image and resize it to IMG_SIZE; this image size is defined later on.
3. Then each image and its label are appended as numpy arrays, one pair at a time, to the training data.
4. The data is shuffled to change the order of the images.

        else:
            image = cv2.resize(cv2.imread(path), (IMG_SIZE, IMG_SIZE))
            training_data.append([np.array(image), np.array(label)])

shuffle(training_data)
np.save('training_data.npy', training_data)

'''TRAIN SET'''
def create_train_set():
    training_data = []
    TRAIN_DIR_Fire = '/content/downloads/Forest Fire'
    TRAIN_DIR_Nature = '/content/downloads/Natural Vegetation'

    train_url = [TRAIN_DIR_Fire, TRAIN_DIR_Nature]
    for i in train_url:
        for image in tqdm(os.listdir(i)):
            label = label_img(image)
            path = os.path.join(i, image)
            if path in ["/content/downloads_img/ForestFire/.ipynb_checkpoints"]:
                pass
            else:
                image = cv2.resize(cv2.imread(path), (IMG_SIZE, IMG_SIZE))
                training_data.append([np.array(image), np.array(label)])

    shuffle(training_data)
    np.save('training_data.npy', training_data)
    return training_data

'''TEST SET'''
def create_test_set():
    testing_data = []
    TEST_DIR_Fire = '/content/test/Forest_Fire'
    TEST_DIR_Nature = '/content/test/Natural_Vegetation'

    test_url = [TEST_DIR_Fire, TEST_DIR_Nature]
    for i in test_url:
        for image in tqdm(os.listdir(i)):
            label = label_img(image)
            path = os.path.join(i, image)
            if path in ["/content/downloads_img/Forest Fire/.ipynb_checkpoints"]:
                pass
            else:
                image = cv2.resize(cv2.imread(path), (IMG_SIZE, IMG_SIZE))
                testing_data.append([np.array(image), np.array(label)])

    np.save('testing_data.npy', testing_data)
    return testing_data

Here we declare the image size, learning rate and number of epochs; feel free to experiment with these. We then create the train and test sets.

The reason why this article focuses on a custom dataset is that in most examples CNN is applied to the MNIST or Fashion MNIST dataset. The problem there is that all the preprocessing we did above is already done and ready for us, so we gain no experience of handling a real-life project. In real-life projects we need to:
1. Extract custom data.
2. Clean the images and separate the different images into folders.
3. Resize and rename them.
4. Label the images.
5. Convert the images to numpy arrays.

All of these steps are done for us in those existing datasets.

IMG_SIZE = 50
learning_rate = 1e-3
N_EPOCH = 5

MODEL_NAME = "fireVSnature-{}-{}-{}.model".format(learning_rate,'6-conv-basic',N_EPOCH)
train_data = create_train_set()
test_data = create_test_set()

We build our CNN using TFlearn in this piece of code. We have two convolution + max-pool layer pairs followed by two fully connected layers; the optimiser used is 'adam' and the metric for classification is 'accuracy'.

convnet = input_data([None, IMG_SIZE, IMG_SIZE, 3], name='inputs')

convnet = conv_2d(convnet, 64, 3, activation='relu')
convnet = max_pool_2d(convnet, 2)

convnet = conv_2d(convnet, 32, 3, activation='relu')
convnet = max_pool_2d(convnet, 2)

convnet = fully_connected(convnet, 512, activation='relu')

convnet = fully_connected(convnet, 2, activation='sigmoid')
convnet = regression(convnet, optimizer='adam', name='targets',
                     learning_rate=learning_rate,
                     loss='binary_crossentropy', metric='accuracy')
model = tflearn.DNN(convnet, tensorboard_dir='log')

Getting the images and labels from the test and train data.

X = np.array( [ i[0] for i in train_data ])
y = np.array([i[1] for i in train_data])

test_X = np.array( [ i[0] for i in test_data ])
test_y = np.array([i[1] for i in test_data])

Fitting the model:

model.fit({'inputs': X}, {'targets': y}, n_epoch=N_EPOCH,
          validation_set=({'inputs': test_X}, {'targets': test_y}),
          show_metric=True, snapshot_step=10,
          run_id=MODEL_NAME, batch_size=10)
model.save(MODEL_NAME)

Predicting the classification and visualising the results. If you have as few images as I did (fewer than 100), your accuracy won't be high.

%matplotlib inline
import matplotlib.image as mpimg

fig = plt.figure(figsize=(10, 6))
for num, image in enumerate(test_data):
    y_plot = fig.add_subplot(3, 6, num + 1)
    # the first output unit corresponds to the [1,0] (forest fire) label
    model_out = model.predict([image[0]])[0][0]
    if model_out >= 0.5:
        label = "FOREST_FIRE"
    else:
        label = "NATURAL_VEG"
    y_plot.imshow(image[0])
    plt.title(label)
    y_plot.axis('off')
plt.show()

For the complete code, refer to this page.

If there are any queries regarding this article, please add them in the comments section. I would love to answer them as soon as possible, and I will make the necessary changes to the article accordingly.
