Basic Image Classifier Project

Sampurn Anand
Feb 19 · 10 min read

The aim of this project is to:

  • Create a labelled dataset of images of five Avengers: Captain America, Iron Man, Black Widow, Hulk and Thor.
  • Train a classifier that can classify an unseen image with reasonable accuracy.

An image classifier involves four main data-handling steps:

  1. Data Collection
  2. Data Preprocessing
  3. Feature Extraction
  4. Model Training

Let's go through the implementation step by step, looking at the methods available for each step and adopting the one best suited to the project.

Note: This project uses Google Colab, but any comparable environment will work.

Step : 1 — Data Collection

In this project we need to collect data in the form of images. The images can be obtained by manually scraping individual (static and/or dynamic) websites. But since a large number of images is needed, many websites would have to be scraped. Instead of visiting different websites manually, one can scrape Google Images or Bing Images.

There are many methods for scraping images. Three popular ones are:

  1. Using Python and web scraping tools to automate the download of a certain type of image directly from Google. This approach often violates the websites' terms of service, so websites, and Google itself, actively block such crawlers. That is why the scraping code needs to be updated regularly.
  2. Using Chrome browser extensions such as Fatkun. These extensions are far more stable than the previous method. However, the requirement for this project is that the images be scraped from the internet.
  3. Using Python tools such as Bing Image Downloader to export the required images directly to a directory.

In this project, for the sake of convenience, Bing Image Downloader is used.

First, we install the downloader and import the required library:

!pip install bing-image-downloader 
from bing_image_downloader import downloader

Now it is used to scrape images with the following line of code:

downloader.download('Captain America Chris Evans', output_dir='./drive/MyDrive/datasets/collection', limit=400, adult_filter_off=False, force_replace=False, timeout=6000)

Similarly, images for Iron Man, Thor, Hulk and Black Widow are scraped.
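The repeated calls can be wrapped in a loop. The sketch below is only an illustration: the `QUERIES` list and the `download_all` helper are hypothetical names, and the query strings are taken from the folder names that appear later in the article (the downloader saves each query into a sub-folder named after the query).

```python
# Query strings for the five characters, taken from the folder names used
# later in the article. QUERIES and download_all are illustrative names.
QUERIES = [
    "Captain America Chris Evans",
    "Iron Man Tony Stark",
    "Black Widow Scarlett Johansson",
    "Hulk Mark Ruffalo",
    "Thor Chris Hemsworth",
]


def download_all(queries, output_dir, limit=400):
    """Download up to `limit` images per query into output_dir/<query>/."""
    # Imported here so the list above can be inspected even when the
    # package is not installed.
    from bing_image_downloader import downloader

    for query in queries:
        downloader.download(
            query,
            limit=limit,
            output_dir=output_dir,
            adult_filter_off=False,
            force_replace=False,
            timeout=6000,
        )


# Example call (needs network access and, in Colab, a mounted Drive):
# download_all(QUERIES, "./drive/MyDrive/datasets/collection")
```

Keeping the queries in one list also makes it easy to check that every class is requested with the same limit.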

Step : 2 — Data Preprocessing

This step is very important for image classification. It improves the overall efficiency of the algorithm.

It is required in this project because, after data collection, many irrelevant images from comic books turned out to have been collected. Such images consume more system resources and can cause the classifier to make mistakes, so they should not be used directly.

In this project, OpenCV and a technique called Haar cascades are used for data cleaning. They detect whether a face and two eyes are clearly visible. If they are, the image is kept; otherwise it is discarded. Most of the data cleaning is done with Python code, but some has to be done manually. Manual checking is needed to remove unwanted faces. For example, the Iron Man folder might contain faces of other characters, which reduces the performance of the model.

Steps for Data Cleaning :

  1. Faces with 2 eyes are extracted from the raw images using a Haar cascade.
  2. Photos with two or more faces are discarded manually, as are blurred photos and other unwanted images.

Every image has line and edge features. A Haar cascade slides a window over the image and uses these features to detect where the eyes and the full face are.

For example, the eye region tends to be darker than the area below it. Haar cascades use masks of this kind to detect such areas.
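To make the "dark above, bright below" idea concrete, here is a minimal NumPy sketch of a single two-rectangle Haar-like feature computed with an integral image. It is a toy illustration, not OpenCV's implementation; the function names and the 8x8 patch are made up for the example.

```python
import numpy as np


def integral_image(gray):
    """Cumulative sums, padded with a zero row/column to simplify lookups."""
    ii = np.zeros((gray.shape[0] + 1, gray.shape[1] + 1), dtype=np.int64)
    ii[1:, 1:] = gray.astype(np.int64).cumsum(axis=0).cumsum(axis=1)
    return ii


def rect_sum(ii, x, y, w, h):
    """Sum of the pixels in the w-by-h rectangle whose top-left corner is (x, y)."""
    return ii[y + h, x + w] - ii[y, x + w] - ii[y + h, x] + ii[y, x]


def two_rect_vertical_feature(ii, x, y, w, h):
    """Brightness of the lower half minus brightness of the upper half (h even)."""
    half = h // 2
    top = rect_sum(ii, x, y, w, half)
    bottom = rect_sum(ii, x, y + half, w, half)
    return bottom - top


# Toy patch: a dark band (like the eyes) above a bright band (like the cheeks).
patch = np.full((8, 8), 200, dtype=np.uint8)
patch[:4, :] = 50

ii = integral_image(patch)
feature_value = two_rect_vertical_feature(ii, 0, 0, 8, 8)
print(feature_value)  # large positive value: dark on top, bright below
```

A cascade evaluates many such features at many window positions and scales, and the integral image makes each rectangle sum cost only four lookups.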

OpenCV has ready-made APIs to detect faces, eyes and so on. The 17 XML files that these APIs need were uploaded manually so that the Haar cascade functions can detect the various features.

Now, let's import the required libraries and load the face and eye cascades:

import numpy as np
import cv2
import matplotlib
from matplotlib import pyplot as plt
%matplotlib inline

face_cascade = cv2.CascadeClassifier("./drive/MyDrive/Colab Notebooks/opencv/haarcascades/haarcascade_frontalface_default.xml")
eye_cascade = cv2.CascadeClassifier("./drive/MyDrive/Colab Notebooks/opencv/haarcascades/haarcascade_eye.xml")

Let’s have a trial run to see if the functions are working properly or not:

img = cv2.imread('/content/drive/My Drive/datasets/collection/Thor Chris Hemsworth/Image_8.jpg')
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
plt.imshow(gray, cmap='gray')

face_cascade = cv2.CascadeClassifier('./drive/MyDrive/Colab Notebooks/opencv/haarcascades/haarcascade_frontalface_default.xml')
eye_cascade = cv2.CascadeClassifier('./drive/MyDrive/Colab Notebooks/opencv/haarcascades/haarcascade_eye.xml')
faces = face_cascade.detectMultiScale(gray)

These lines check whether the image contains a face and two eyes. If no face is detected, detectMultiScale returns an empty result, so any later code that indexes the first face will fail with an error. If a face and eyes are detected, the output looks similar to the one shown below:

Image with Face and two Eyes

Now that the functions are working properly, let's write code to check all the images in the dataset. For each image that meets the requirements, detection runs on a grayscale copy and the face region is cropped out. The cropped images are saved in a separate folder for future use.

def get_cropped_image_if_2_eyes(image_path):
    img = cv2.imread(image_path)
    if img is None:
        return None  # unreadable file
    gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, 1.3, 5)
    for (x, y, w, h) in faces:
        roi_gray = gray[y:y+h, x:x+w]
        roi_color = img[y:y+h, x:x+w]
        eyes = eye_cascade.detectMultiScale(roi_gray)
        # Keep the first face in which at least two eyes are detected
        if len(eyes) >= 2:
            return roi_color
    return None

path_to_data = "./drive/My Drive/datasets/collection/"
path_to_cr_data = "./drive/My Drive/datasets/cropped/"

import os
img_dirs = []
for entry in os.scandir(path_to_data):
    if entry.is_dir():
        img_dirs.append(entry.path)  # assumed: collect one folder per character

import shutil
# assumed: start from an empty output folder on every run
if os.path.exists(path_to_cr_data):
    shutil.rmtree(path_to_cr_data)
os.mkdir(path_to_cr_data)

cropped_image_dirs = []
celebrity_file_names_dict = {}

for img_dir in img_dirs:
    count = 1
    celebrity_name = img_dir.split('/')[-1]

    celebrity_file_names_dict[celebrity_name] = []

    for entry in os.scandir(img_dir):
        roi_color = get_cropped_image_if_2_eyes(entry.path)
        if roi_color is not None:
            cropped_folder = path_to_cr_data + celebrity_name
            if not os.path.exists(cropped_folder):
                # assumed: create the folder and remember it for later steps
                os.makedirs(cropped_folder)
                cropped_image_dirs.append(cropped_folder)
                print("Generating cropped images in folder: ", cropped_folder)  # confirms the code is running

            cropped_file_name = celebrity_name + str(count) + ".png"  # every image is saved as png
            cropped_file_path = cropped_folder + "/" + cropped_file_name

            cv2.imwrite(cropped_file_path, roi_color)
            celebrity_file_names_dict[celebrity_name].append(cropped_file_path)
            count += 1

Step: 3 — Feature Extraction using Wavelet Transform

This step matters because colored images cause many errors in feature extraction. A colored image can contain a wide variety of shades and colors, which makes it difficult for the classifier to identify.

To avoid those errors, the images are transformed into black and white, with different contrasts for different areas. Wavelet transformation allows the important features of an image to be extracted. In general, in a wavelet-transformed image the area of the eyes is differentiated from the forehead, the nose is distinct, and so on.

In the image processing literature, wavelet transforms are often reported as an effective way of extracting features, so wavelet transformation is used in this project.

Given an input image, the code performs the wavelet transformation using PyWavelets (the pywt library) and returns a new image, which is the wavelet transform. The main code draws on concepts from signal processing: the time domain, the frequency domain and Fourier transformation. A few of these concepts are explained briefly below:

Any signal, such as an audio signal, can be represented in two kinds of domain, and an image can also be treated as a signal. An image can be represented in the spatial domain (x and y) or in the frequency domain. An audio signal can be represented in the time domain or in the frequency domain.

Fourier transformation takes a complex signal and returns the basic signals that make it up. For example, consider a dish such as Dosa. If Dosa is reverse engineered, the basic ingredients are obtained: water, rice flour, urad dal and maybe more.

The same applies to a complex signal in which different instruments are playing and there is also noise. How do noise cancellation devices actually cancel the noise? This can be done with Fourier transformation, because it can separate the voice from the noise. It splits the signal into its different frequencies, and frequency filters can then suppress or amplify some of them. In some audio devices, treble or bass can be increased in the same way. All of this is possible because of Fourier transformation.
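The noise example can be tried out in a few lines of NumPy. This is only a sketch with made-up numbers (a 5 Hz "voice", a 200 Hz "noise", a 1000 Hz sampling rate and a 50 Hz cut-off are all assumptions chosen for the illustration):

```python
import numpy as np

fs = 1000                                   # samples per second (assumed)
t = np.arange(fs) / fs                      # one second of samples
tone = np.sin(2 * np.pi * 5 * t)            # 5 Hz "voice"
hiss = 0.5 * np.sin(2 * np.pi * 200 * t)    # 200 Hz "noise"
mixed = tone + hiss                         # what the microphone records

# Fourier transform: which frequencies is the mixed signal made of?
spectrum = np.fft.rfft(mixed)
freqs = np.fft.rfftfreq(mixed.size, d=1 / fs)

# The two strongest peaks are the "ingredients" of the mixed signal.
peak_freqs = np.sort(freqs[np.argsort(np.abs(spectrum))[-2:]])
print(peak_freqs)

# Frequency filter: suppress everything above 50 Hz, then transform back.
filtered_spectrum = spectrum.copy()
filtered_spectrum[freqs > 50] = 0
cleaned = np.fft.irfft(filtered_spectrum, n=mixed.size)
```

After the filter, `cleaned` is essentially the original 5 Hz tone, which is the "separate, then suppress" idea described above.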

Wavelet transformation is similar to Fourier transformation, and it amplifies certain features of the image.

For the further steps, the input will be a color image vertically stacked with its wavelet-transformed image. The code for the wavelet transform is as follows:

import numpy as np
import pywt
import cv2

def w2d(img, mode='haar', level=1):
    imArray = img
    # Datatype conversions
    # convert to grayscale (note: cv2.imread returns channels in BGR order)
    imArray = cv2.cvtColor(imArray, cv2.COLOR_RGB2GRAY)
    # convert to float and scale to [0, 1]
    imArray = np.float32(imArray)
    imArray /= 255
    # compute coefficients
    coeffs = pywt.wavedec2(imArray, mode, level=level)

    # Process coefficients: copy them into a list, then zero the
    # approximation (low-frequency) band so only the detail bands remain
    coeffs_H = list(coeffs)
    coeffs_H[0] *= 0

    # reconstruction
    imArray_H = pywt.waverec2(coeffs_H, mode)
    imArray_H *= 255
    imArray_H = np.uint8(imArray_H)
    return imArray_H
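
To see what zeroing the first coefficient block (`coeffs_H[0] *= 0`) does, here is a hand-written, one-level Haar transform in plain NumPy. It is a sketch for intuition only: the function names are hypothetical, and the sign and ordering conventions are my own and may differ from the ones pywt uses.

```python
import numpy as np


def haar_dwt2(img):
    """One level of an orthonormal 2-D Haar transform on 2x2 blocks."""
    a, b = img[0::2, 0::2], img[0::2, 1::2]
    c, d = img[1::2, 0::2], img[1::2, 1::2]
    LL = (a + b + c + d) / 2   # approximation (local average)
    LH = (a + b - c - d) / 2   # top vs bottom difference
    HL = (a - b + c - d) / 2   # left vs right difference
    HH = (a - b - c + d) / 2   # diagonal difference
    return LL, LH, HL, HH


def haar_idwt2(LL, LH, HL, HH):
    """Inverse of haar_dwt2."""
    h, w = LL.shape
    out = np.empty((2 * h, 2 * w), dtype=float)
    out[0::2, 0::2] = (LL + LH + HL + HH) / 2
    out[0::2, 1::2] = (LL + LH - HL - HH) / 2
    out[1::2, 0::2] = (LL - LH + HL - HH) / 2
    out[1::2, 1::2] = (LL - LH - HL + HH) / 2
    return out


# Toy image: flat on the left, with a vertical edge between the last two columns.
img = np.array([[10, 10, 10, 90]] * 4, dtype=float)

LL, LH, HL, HH = haar_dwt2(img)
roundtrip = haar_idwt2(LL, LH, HL, HH)                    # recovers the image
detail_only = haar_idwt2(np.zeros_like(LL), LH, HL, HH)   # approximation zeroed
print(detail_only)
```

With the approximation removed, the flat region reconstructs to zero and only the edge survives. That is why the wavelet image highlights outlines such as eyes and nose rather than overall brightness.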

Let's assign a number (or key) to each of the 5 characters.

class_dict = {}
count = 0
for celebrity_name in celebrity_file_names_dict.keys():
    class_dict[celebrity_name] = count
    count = count + 1

{'Black Widow Scarlett Johansson': 2,
 'Captain America Chris Evans': 0,
 'Hulk Mark Ruffalo': 3,
 'Iron Man Tony Stark': 1,
 'Thor Chris Hemsworth': 4}

Next, create a dictionary that maps each character to the paths of all of its cropped images:

celebrity_file_names_dict = {}
for img_dir in cropped_image_dirs:
    celebrity_name = img_dir.split('/')[-1]
    file_list = []
    for entry in os.scandir(img_dir):
        file_list.append(entry.path)  # assumed: record the path of each cropped image
    celebrity_file_names_dict[celebrity_name] = file_list

Now, let's build the feature matrix X and the label list y, where each colored image is stacked vertically with its wavelet-transformed version for future use.

X, y = [], []
for celebrity_name, training_files in celebrity_file_names_dict.items():
    for training_image in training_files:
        img = cv2.imread(training_image)
        if img is None:
            continue  # skip files that could not be read
        scalled_raw_img = cv2.resize(img, (32, 32))  # resizing with OpenCV, as images may have different sizes
        img_har = w2d(img, 'db1', 5)  # getting the wavelet-transformed image
        scalled_img_har = cv2.resize(img_har, (32, 32))  # resizing the wavelet-transformed image
        combined_img = np.vstack((scalled_raw_img.reshape(32*32*3, 1), scalled_img_har.reshape(32*32, 1)))  # vertically stacking both images
        # assumed: store the feature vector and its class number
        X.append(combined_img)
        y.append(class_dict[celebrity_name])

# Each sample has 32*32*3 + 32*32 = 4096 values
X = np.array(X).reshape(len(X), 4096).astype(float)
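
The number 4096 in the reshape comes from 32*32*3 values for the color image plus 32*32 values for the wavelet image. A quick check with random stand-in arrays (not real images; the variable names here are made up for the example):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-ins for one resized color crop and its resized wavelet image.
raw = rng.integers(0, 256, size=(32, 32, 3), dtype=np.uint8)
wavelet = rng.integers(0, 256, size=(32, 32), dtype=np.uint8)

# Flatten both into columns and stack them on top of each other.
combined = np.vstack((raw.reshape(32 * 32 * 3, 1), wavelet.reshape(32 * 32, 1)))
print(combined.shape)  # 3072 + 1024 rows, 1 column

# Two samples become a 2 x 4096 feature matrix, one row per image.
X_demo = np.array([combined, combined]).reshape(2, 4096).astype(float)
print(X_demo.shape)
```

The first 3072 entries of each row are the color pixels and the last 1024 are the wavelet pixels.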

Step 4 — Model Training: Using SVM with heuristic finetuning

In this project, an SVM is first used to train the main model.

Then other models are tested using GridSearchCV to decide which model is the best fit for the project.

GridSearchCV is used for hyperparameter tuning. It helps decide which model performs best.

The candidate models are defined as follows for comparison:

  1. SVM with C values of 1, 10, 100 and 1000, and kernels rbf and linear.
  2. Random Forest with the number of estimators (decision trees) set to 1, 5 and 10.
  3. Logistic Regression with C values of 1, 5 and 10.

Finally, the best model is stored in saved_model.pkl, and the class dictionary is also saved.

Code for training SVM:

from sklearn.svm import SVC
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.metrics import classification_report
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A pipeline scales the data before it reaches the classifier.
pipe = Pipeline([('scaler', StandardScaler()), ('svc', SVC(kernel='rbf', C=10))])
pipe.fit(X_train, y_train)
pipe.score(X_test, y_test)

This gave a score of 0.9770992366412213.

Now, let’s get a complete classification report.

print(classification_report(y_test, pipe.predict(X_test)))

This gave the following Output:

Classification Report using SVM

Training and testing the other models as mentioned earlier:

from sklearn import svm
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline
from sklearn.model_selection import GridSearchCV
model_params = {
    'svm': {
        'model': svm.SVC(gamma='auto', probability=True),
        'params': {
            'svc__C': [1, 10, 100, 1000],
            'svc__kernel': ['rbf', 'linear']
        }
    },
    'random_forest': {
        'model': RandomForestClassifier(),
        'params': {
            'randomforestclassifier__n_estimators': [1, 5, 10]
        }
    },
    'logistic_regression': {
        'model': LogisticRegression(solver='liblinear', multi_class='auto'),
        'params': {
            'logisticregression__C': [1, 5, 10]
        }
    }
}

scores = []
best_estimators = {}
import pandas as pd
for algo, mp in model_params.items():
    pipe = make_pipeline(StandardScaler(), mp['model'])
    # cv=5 => the model is evaluated on 5 folds and the scores are averaged
    clf = GridSearchCV(pipe, mp['params'], cv=5, return_train_score=False)
    clf.fit(X_train, y_train)
    # Scores are appended and a data frame is created from them
    scores.append({
        'model': algo,
        'best_score': clf.best_score_,
        'best_params': clf.best_params_
    })
    best_estimators[algo] = clf.best_estimator_

df = pd.DataFrame(scores, columns=['model', 'best_score', 'best_params'])

The Final report of scores obtained is as follows:

Scores of the three tested models

These are cross-validation scores on the training data. Now, let's get the scores for the models on the testing data:
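
The snippet for this step is not shown here, so the following is only a sketch of how it could look. It uses a small synthetic dataset and two stand-in pipelines (`demo_estimators` and the `X_demo`/`y_demo` data are assumptions for the example); in the notebook, the same loop would run over `best_estimators` with `X_test` and `y_test`.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Synthetic stand-in data; the article's X and y would be used instead.
X_demo, y_demo = make_classification(
    n_samples=200, n_features=20, n_informative=5, n_classes=3, random_state=0
)
X_tr, X_te, y_tr, y_te = train_test_split(X_demo, y_demo, random_state=0)

# Stand-ins for the fitted pipelines stored in best_estimators.
demo_estimators = {
    "svm": make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10)).fit(X_tr, y_tr),
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ).fit(X_tr, y_tr),
}

# score() returns the mean accuracy on data the model has not seen.
test_scores = {name: est.score(X_te, y_te) for name, est in demo_estimators.items()}
for name, score in test_scores.items():
    print(name, score)
```

Scoring on held-out data is what makes the comparison fair, since the grid search has already seen the training folds.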


The scores obtained were as follows:

SVM: 0.9770992366412213
Random Forest: 0.9389312977099237
Logistic Regression: 0.9847328244274809

Logistic Regression scores slightly higher on the testing data, but SVM performs well on both the training data and the testing data, so SVM will be used in this project.

best_clf = best_estimators['svm']

Now drawing the Confusion Matrix:

from sklearn.metrics import confusion_matrix
cm = confusion_matrix(y_test, best_clf.predict(X_test))
import seaborn as sn
plt.figure(figsize=(10, 7))
sn.heatmap(cm, annot=True)

Confusion Matrix
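
A confusion matrix can also be read numerically. In scikit-learn's layout, rows are the true classes and columns are the predicted classes, so the diagonal holds the correct predictions. The small matrix below is made up for illustration and is not the project's result:

```python
import numpy as np

# Made-up 3-class confusion matrix: rows = true class, columns = predicted class.
cm_demo = np.array([
    [8, 2, 0],
    [1, 9, 0],
    [0, 1, 9],
])

# Recall: of the images that truly belong to a class, how many were found?
recall = cm_demo.diagonal() / cm_demo.sum(axis=1)
# Precision: of the images predicted as a class, how many were right?
precision = cm_demo.diagonal() / cm_demo.sum(axis=0)
# Accuracy: correct predictions over all predictions.
accuracy = cm_demo.trace() / cm_demo.sum()

print(recall, precision, accuracy)
```

Large off-diagonal values point to pairs of characters that the model confuses with each other.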

Save the model to a .pkl file for future use, for example in web apps.

!pip install joblib
import joblib
# Save the model as a pickle in a file
joblib.dump(best_clf, 'saved_model.pkl')
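
For later use, the saved file can be loaded back with `joblib.load`. The sketch below trains a tiny stand-in model on random data (`demo_clf`, `X_demo` and `y_demo` are assumptions for the example, not the project's model) and writes to a temporary folder so that nothing is overwritten:

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Random stand-in data with the same layout as the article's features:
# 4096 values per image and 5 classes.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(40, 4096))
y_demo = np.repeat(np.arange(5), 8)
demo_clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=10)).fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp:
    model_path = os.path.join(tmp, "saved_model.pkl")
    joblib.dump(demo_clf, model_path)      # save
    restored = joblib.load(model_path)     # load it back, e.g. in a web app

# The restored pipeline gives the same predictions as the original one.
same_predictions = bool(
    np.array_equal(demo_clf.predict(X_demo), restored.predict(X_demo))
)
print(same_predictions)
```

Because the scaler is part of the pipeline, it is saved and restored together with the classifier.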

Saving the Dictionary as Json file for future use:

import json
with open("class_dictionary.json", "w") as f:
    f.write(json.dumps(class_dict))  # assumed: write the class dictionary as JSON
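
When the JSON file is read back later, the mapping usually needs to be reversed so that a predicted number can be turned into a name. A minimal sketch, using the class numbers shown earlier and a temporary folder (`class_dict_demo` and `number_to_name` are illustrative names):

```python
import json
import os
import tempfile

# Same name -> number mapping as shown earlier in the article.
class_dict_demo = {
    "Captain America Chris Evans": 0,
    "Iron Man Tony Stark": 1,
    "Black Widow Scarlett Johansson": 2,
    "Hulk Mark Ruffalo": 3,
    "Thor Chris Hemsworth": 4,
}

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "class_dictionary.json")
    with open(path, "w") as f:
        json.dump(class_dict_demo, f)
    with open(path) as f:
        loaded = json.load(f)

# Reverse the mapping: class number -> character name.
number_to_name = {number: name for name, number in loaded.items()}
predicted_name = number_to_name[4]
print(predicted_name)
```

A web app would call the model's `predict`, take the returned number and look it up in `number_to_name`.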

A model has now been built successfully and can be used further to make websites.

The link to the complete code is: Github

Thank You for spending your valuable time in reading this article. Do let me know your views and suggestions in comments.

Nerd For Tech


Sampurn Anand

Written by

A Pre Final Year Student at NITT with interest in Machine Learning and many other hobbies...
