Keras CNN Tutorial: Classifying Images Made Easy

PrathamModi
Oct 16, 2023

So you want to build a dog breed classifier? Cool, you’ve come to the right place. In this tutorial, we’re going to use convolutional neural networks (CNNs) to classify images of 120 different breeds of dogs. By the end, you’ll have built and trained your own CNN that can identify almost any breed of dog.

CNNs are one of the most powerful tools in deep learning for image classification. They’re inspired by how our own visual system works and can detect patterns to distinguish between thousands of classes. We’ll be using a dataset of over 10,000 images of dogs from Kaggle to train our network.

This tutorial assumes you have a basic understanding of deep learning and Python. If you're interested in learning more about how CNNs work in detail, do check out this blog of mine! (A shameless plug.)

But don’t worry, we’ll go step-by-step through building and training the network. And of course, there will be lots of pictures of cute dogs along the way! So grab your laptop, your favorite caffeinated beverage, and let’s get started building our dog breed classifier.

Introduction to Image Classification and CNNs

So you want to teach a computer to recognize different dog breeds. This is known as image classification, and we’ll use a convolutional neural network or CNN to do it.

A CNN is a type of deep learning model inspired by the human visual cortex. It scans an image, learns the patterns, and uses those patterns to classify the image. To build our CNN, we’ll need a dataset of labeled images. Luckily, Kaggle’s dataset has over 10,000 images for training and another 10,000+ for testing across the 120 breeds.

We’ll split this into training, validation, and test sets. The training set teaches our model, the validation set tweaks it, and the test set evaluates it. We’ll use data augmentation, like flipping and rotating, to increase our data.
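If you want to experiment with augmentation yourself, a minimal sketch using Keras preprocessing layers might look like the following. We won’t wire it into the pipeline below, and the specific layers and factors are just illustrative choices.

import tensorflow as tf

# A minimal augmentation sketch (illustrative only, not used later in this tutorial):
# random horizontal flips and small rotations applied to batches of image tensors.
augment = tf.keras.Sequential([
    tf.keras.layers.RandomFlip("horizontal"),  # mirror images left/right at random
    tf.keras.layers.RandomRotation(0.1),       # rotate by up to ~10% of a full turn
])

# Usage: augmented_images = augment(images, training=True)
# where `images` is a batch shaped (batch, height, width, channels)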

Then we design our CNN. We add a fully connected output layer for our 120 breeds and compile the model with the Adam optimizer and a categorical cross-entropy loss function.

Finally, we train for at least 10–15 epochs, watching our validation accuracy. We want it above 95% before testing. If we did it right, our CNN can take an image of any dog and predict its breed with high accuracy!

Pretty cool, right? With some data, and the right training, you’ll have a puppy prediction pro in no time. Let’s get started!

Dog Breed Image Dataset

To get started, you’ll need to gather images of different dog breeds. For this tutorial, we’ll use a dataset of 10,000+ images representing 120 breeds.

Head to Kaggle and download the Multiclass Dog Breed Dataset.

Now you have the dataset ready for building and training a CNN! The large number of images and breeds will allow your model to learn distinctive features of each breed.

With some tweaking of hyperparameters, this dataset can be used to build a highly accurate model. In the next section, we’ll cover how to build the CNN architecture. Stay tuned!

Features of the dataset

Some information about the data:

  • The data is images, so deep learning / transfer learning is the best fit.
  • There are 120 different breeds.
  • There are 10,000+ images in the training set (these have labels).
  • There are 10,000+ images in the test set (these don’t have labels, since that’s what we want to predict).

Imports

import tensorflow as tf
import tensorflow_hub as hub
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
%matplotlib inline

from sklearn.model_selection import train_test_split

Since we are going to train the model on 10,000+ images, it is recommended to use a GPU. If you don’t own one, that’s alright! Go to Google Colab. It provides free monthly usage of GPUs and TPUs! Neat stuff, huh?

# To check if Tensorflow is using the GPU or not
print(tf.__version__)
print("GPU", "Available" if tf.config.list_physical_devices('GPU') else "Not Found")

1. Getting the Data Ready

Load the dataset

pd.set_option('display.max_columns', None)  # Show all columns
pd.set_option('display.max_rows', None) # Show all rows

labels_csv = pd.read_csv("./data/labels.csv")
labels_csv.describe()
There are 10,222 unique images across 120 different breeds.

Breed Distribution

# Plot the bar chart
ax = labels_csv["breed"].value_counts().plot.bar(figsize=(20, 10))

# Calculate the average value
average_value = labels_csv["breed"].value_counts().mean()

# Add the average line
ax.axhline(average_value, color='red', linestyle='--', label='Average')

# Customize the plot as desired
plt.xlabel('Breed')
plt.ylabel('Count')
plt.title('Breed Distribution')
plt.legend();

Getting images and their labels

  • A list of filepaths to training images
  • An array of all labels
  • An array of all unique labels

from IPython.display import Image

# Display the first image in the training set
for image in labels_csv["id"].head(1):
    display(Image(filename="./data/train/"+image+".jpg", width=300, height=200))

First dog image in the training dataset

Get Filenames to all Images

# Get the filenames of all images using a similar for loop, and store them in a list
filenames = []
for image_id in labels_csv["id"]:
    filenames.append("./data/train/"+image_id+".jpg")

filenames

Let’s check whether we’ve acquired filenames for all the images.

# Check whether number of filenames matches number of actual image files
import os
if len(os.listdir("./data/train/")) == len(filenames):
    print("Filenames match actual amount of files!")
else:
    print("Filenames do not match actual amount of files, check the target directory.")

So that checks out; we have all the image locations.

Labels

Now that we’ve got our image filepaths together, let’s get the labels.

We’ll take them from labels_csv and turn them into a NumPy array.

labels = labels_csv["breed"].to_numpy() # converts directly into numpy array
# labels = np.array(labels)
labels

Checking if there is any missing data: Labels or images

With unstructured data it’s not as simple as checking for missing values, so we just compare the number of labels to the number of images to see whether anything is missing.

if len(filenames) == len(labels):
    print("No missing Data")
else:
    print("There is missing data !")

Convert Breed Into Numbers

Turn one label into an array of booleans:

So for each of the 10,222 labels there will be 1 True value and 119 False values.

# Create an array of all unique breed names (the "array of all unique labels" from earlier)
unique_breeds = np.unique(labels)

print(labels[0])
labels[0] == unique_breeds # use the comparison operator to create a boolean array

1 True value, 119 False

Turn EACH label into this boolean array

# Turn EACH label into this kind of boolean array

boolean_labels = []

for label in labels:
    boolean_labels.append(label == unique_breeds)

Split into Training and Validation Sets:

# Setup X & y variables
X = filenames
y = boolean_labels

Since we’re working with 10,000+ images, it’s a good idea to work with a portion of them to make sure things are working before training on them all.

This is because computing with 10,000+ images could take a fairly long time. And our goal when working through machine learning projects is to reduce the time between experiments.

Let’s start experimenting with 1000 and increase it as we need.

# Set number of images to use for experimenting
NUM_IMAGES = 1000

# Import train_test_split from Scikit-Learn
from sklearn.model_selection import train_test_split

# Split them into training and validation using NUM_IMAGES
X_train, X_val, y_train, y_val = train_test_split(X[:NUM_IMAGES],
                                                  y[:NUM_IMAGES],
                                                  test_size=0.2,
                                                  random_state=42)

len(X_train), len(y_train), len(X_val), len(y_val)
Train and Validation split for 1000 images

Preprocessing Images: Turning Images to Tensors

To preprocess our images into Tensors we’re going to write a function which does a few things:

1. Takes an image filepath as input.

2. Uses TensorFlow to read the file and save it to a variable, `image`.

3. Turns our `image` (a JPEG file) into a Tensor.

4. Normalizes our `image`.

5. Resizes the image to be of shape (224, 224).

6. Returns the modified image.

# Convert Images to numpy arrays

from matplotlib.pyplot import imread
image = imread(filenames[42])
image.shape

1. Turn the filepath into a Tensor of type string:

tensor = tf.io.read_file(filenames[20])
tensor

2. Turn the image into a numerical Tensor, which will have values 0–255 for each RGB channel of every pixel:

tensor = tf.image.decode_jpeg(tensor, channels=3)
tensor

3. Normalization

Convert these 0–255 values for each RGB channel into the 0–1 range, which helps optimization:

tf.image.convert_image_dtype(tensor, tf.float32)

Ok, now let’s build that function we were talking about.

IMG_SIZE = 224

def preprocess_image(image_path, img_size=IMG_SIZE):
    """
    Takes an image file path and turns it into a Tensor.
    """
    # 1. Read the image file
    image = tf.io.read_file(image_path)
    # 2. Turn it into a numerical Tensor with 3 colour channels (RGB)
    image = tf.image.decode_jpeg(image, channels=3)
    # 3. Normalization: convert the 0-255 values to 0-1 for each channel
    image = tf.image.convert_image_dtype(image, tf.float32)
    # 4. Resize to our desired size: (224, 224)
    image = tf.image.resize(image, size=[img_size, img_size])
    # 5. Return the processed image
    return image

Turning Data into Batches

Working with 10,000+ images at once isn’t feasible memory-wise, so we turn the data into smaller batches.

A batch size of 32 is the usual default.

To use TensorFlow’s data pipeline effectively, each element must be a tuple of:

(image, label)

# Create a simple function to return a tuple (image, label)
def get_image_label(image_path, label):
    """
    Takes an image file path and the associated label,
    processes the image and returns a tuple of (image, label).
    """
    image = preprocess_image(image_path)
    return image, label

Now create the actual batching function:

# Create a function to make batches:
BATCH_SIZE = 32

def create_batches(X, y=None, batch_size=BATCH_SIZE, valid_data=False, test_data=False):
    """
    Creates batches of data out of image (X) and label (y) pairs.
    Shuffles the data if it's training data but doesn't shuffle it if it's validation data.
    Also accepts test data as input (no labels).
    """
    if test_data:  # test data has no labels
        print("Creating Test Data Batches...")
        data_whole = tf.data.Dataset.from_tensor_slices((tf.constant(X),))
        data_batch = data_whole.map(preprocess_image).batch(batch_size)
        return data_batch

    elif valid_data:  # validation data: no shuffling required
        print("Creating Valid Data Batches...")
        data_whole = tf.data.Dataset.from_tensor_slices((tf.constant(X), tf.constant(y)))
        data_batch = data_whole.map(get_image_label).batch(batch_size)
        return data_batch

    else:  # training data: shuffling also required
        print("Creating Training Data Batches...")
        data_whole = tf.data.Dataset.from_tensor_slices((tf.constant(X), tf.constant(y)))
        data_batch = data_whole.shuffle(buffer_size=len(X)).map(get_image_label).batch(batch_size)
        return data_batch

Creating training and validation Batches

# Creating training and validation Sets
train_data = create_batches(X_train, y_train)
val_data = create_batches(X_val, y_val, valid_data=True)

Visualising Data Batches

# Create a function for viewing images in a data batch
def show_25_images(images, labels):
    """
    Displays 25 images from a data batch.
    """
    # Setup the figure
    plt.figure(figsize=(10, 10))
    # Loop through 25 images
    for i in range(25):
        # Create subplots (5 rows, 5 columns)
        ax = plt.subplot(5, 5, i+1)
        # Display an image
        plt.imshow(images[i])
        # Add the image label as the title
        plt.title(unique_breeds[labels[i].argmax()])
        # Turn grid lines off
        plt.axis("off")

Unbatch the data to visualise it

1. Training images

train_images, train_labels = next(train_data.as_numpy_iterator())
show_25_images(train_images, train_labels)

2. Validation images

val_images, val_labels = next(val_data.as_numpy_iterator())
show_25_images(val_images, val_labels)

Building a model

Before building a model, we need to define a few things:

  • The input shape (Shape of images)
  • The output shape (Shape of labels)
  • The URL of the model we want to use from Tensorflow Hub

We are going to be using MobileNetV2, available on TensorFlow Hub.

# Shape of our images
INPUT_SHAPE = [None, IMG_SIZE, IMG_SIZE, 3] # Batch, height, width, color channels

# Shape of our labels
OUTPUT_SHAPE = len(unique_breeds)

# Setup a model URL
MODEL_URL = "https://tfhub.dev/google/imagenet/mobilenet_v2_130_224/classification/5"

Now we are going to create a function that:

  • Takes in the input shape, output shape, and the model we’ve chosen as parameters
  • Defines the layers in a Keras model in a sequential fashion
  • Compiles the model
  • Builds the model
  • Returns the model

All these steps follow the Keras documentation.

def create_model(input_shape=INPUT_SHAPE, output_shape=OUTPUT_SHAPE, model_url=MODEL_URL):
    print("Building a model with the URL:", model_url)

    # Setup the Keras layers: instantiate the model
    model = tf.keras.Sequential([
        hub.KerasLayer(model_url),                  # Layer 1: input layer (pre-trained MobileNetV2)
        tf.keras.layers.Dense(units=output_shape,   # Layer 2: output layer
                              activation="softmax") # softmax for multiclass classification (sigmoid for binary)
    ])

    # Compile the model
    model.compile(
        loss=tf.keras.losses.CategoricalCrossentropy(),
        optimizer=tf.keras.optimizers.Adam(),
        metrics=["accuracy"]
    )

    # Build the model
    model.build(input_shape)

    return model


model = create_model()
model.summary()

Creating Callbacks:

Callbacks are helper functions a model can use during training to do things such as save a model’s progress, check a model’s progress, or stop training early if the model stops improving.

The two callbacks we’re going to add are a TensorBoard callback and an Early Stopping callback.

1. Tensorboard Callback

# Load Tensorboard NB extension:
%load_ext tensorboard

Create the callback function:

import datetime

# Create a function to build a TensorBoard callback
def create_tensorboard_callback():
    # Create a log directory for storing TensorBoard logs
    logdir = os.path.join("./logs",
                          # Make it so the logs get tracked whenever we run an experiment
                          datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
    return tf.keras.callbacks.TensorBoard(logdir)

2. Early Stopping Callback

Early stopping helps prevent overfitting by stopping a model when a certain evaluation metric stops improving. If a model trains for too long, it can get so good at finding patterns in a particular dataset that it isn’t able to use those patterns on data it hasn’t seen before (it doesn’t generalize).

It’s basically like saying to our model, “keep finding patterns until the quality of those patterns starts to go down.”
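The `train_model()` function below expects an `early_stopping` callback to exist, so here’s a minimal version. Monitoring `val_accuracy` with a patience of 3 epochs is my assumption, mirroring the full-model callback we create later.

# Early Stopping callback: stop training if validation accuracy doesn't improve
# for 3 consecutive epochs (metric and patience are assumptions, see above)
early_stopping = tf.keras.callbacks.EarlyStopping(monitor="val_accuracy",
                                                  patience=3)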

Training on a Subset Only:

Training on a subset of the data (say, 1,000 images) is much better than training on the entire dataset every time; it saves a lot of time while we experiment.

NUM_EPOCHS = 15
# Number of chances (epochs) our model has to improve

Let’s create a function to train a model. It will:

  • Create a model using `create_model()`
  • Setup a TensorBoard callback using `create_tensorboard_callback()` (we do this here so it creates a log directory of the current date and time).
  • Call the `fit()` function on our model, passing it the training data, validation data, number of epochs to train for and the callbacks we’d like to use.
  • Return the fitted model.

# Build a function to train and return a trained model

def train_model():
    """
    Creates a model, trains it and returns the trained version.
    """
    # Create a model
    model = create_model()

    # Create a new TensorBoard session every time we train a model
    tensorboard = create_tensorboard_callback()

    # Fit the model
    model.fit(x=train_data,
              epochs=NUM_EPOCHS,
              validation_data=val_data,
              validation_freq=1, # check against the validation set after every epoch, hence 1
              callbacks=[tensorboard, early_stopping])

    # Return the fitted model
    return model


model = train_model()

Overfitting

We can see that our model is performing far better on the training set (near 100% accuracy) than on the validation set (around 63%).
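One way to see this gap is to plot the accuracy curves. This is a small sketch of my own: it assumes you keep the `History` object that `model.fit()` returns (for example, by having `train_model()` return it alongside the model).

# Sketch: plot training vs validation accuracy from a Keras History object.
# `history` is assumed to be the object returned by model.fit() inside train_model().
def plot_history(history):
    plt.plot(history.history["accuracy"], label="training accuracy")
    plt.plot(history.history["val_accuracy"], label="validation accuracy")
    plt.xlabel("Epoch")
    plt.ylabel("Accuracy")
    plt.title("Training vs validation accuracy")
    plt.legend();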

Checking the TensorBoard logs:

%tensorboard will access the logs directory and visualize its contents.

%tensorboard --logdir=./logs

Make predictions on the validation data

This is data (like the test set) that our model has not seen during training.

predictions = model.predict(val_data, verbose=1)
predictions.shape
200 images, 120 unique breeds of dogs

def print_predict(predictions):
    for i in range(len(predictions)):
        print("\n\n", i)
        # the max probability value predicted by the model
        print(f"Max value (probability of prediction): {np.max(predictions[i])}")
        # because we used softmax activation in our model, this will be (close to) 1
        print(f"Sum: {np.sum(predictions[i])}")
        # the index where the max value in predictions[i] occurs
        print(f"Max index: {np.argmax(predictions[i])}")
        # the predicted label
        print(f"Predicted label: {unique_breeds[np.argmax(predictions[i])]}")


print_predict(predictions=predictions)

Create a function to get the predicted label:

def get_pred_label(prediction_probabilities):
    """
    Turns an array of prediction probabilities into a label.
    """
    return unique_breeds[np.argmax(prediction_probabilities)]

pred_label = get_pred_label(predictions[0])
pred_label
val_data

<BatchDataset element_spec=(TensorSpec(shape=(None, 224, 224, 3), dtype=tf.float32, name=None), TensorSpec(shape=(None, 120), dtype=tf.bool, name=None))>

From the output we can see that each batch contains two tensors: the images, with shape (224, 224, 3), and the boolean label arrays, one entry per unique breed (120).

Create a function to unbatch a batched dataset

Since our val_data is batched, the ground-truth labels are locked away inside those batches. Now that we have our predictions, we need to unbatch the data in order to check how well our model actually did.

# Create a function to unbatch a batched dataset
def unbatchify(data):
    """
    Takes a batched dataset of (image, label) Tensors and returns separate arrays
    of images and labels.
    """
    images = []
    labels = []
    # Loop through the unbatched data
    for image, label in data.unbatch().as_numpy_iterator():
        images.append(image)
        labels.append(unique_breeds[np.argmax(label)])
    return images, labels


# Use the function
val_images, val_labels = unbatchify(val_data)
val_images[0], val_labels[0]

Visualization

Now we have:

  • Prediction Labels
  • Validation Labels: Truth Labels
  • Validation Images

More specifically, we want to be able to view an image, its predicted label and its actual label (the true label).

We’ll write a function that will:

  • Take an array of prediction probabilities, an array of truth labels, an array of images and an integer (the index of the sample to plot).
  • Convert the prediction probabilities to a predicted label.
  • Plot the predicted label, its predicted probability, the truth label and the target image on a single plot.

def plot_pred(prediction_probabilities, labels, images, n=1):
    """
    View the prediction, ground truth label and image for sample n.
    """
    pred_prob, true_label, image = prediction_probabilities[n], labels[n], images[n]

    # Get the predicted label
    pred_label = get_pred_label(pred_prob)

    # Plot the image & remove ticks
    plt.imshow(image)
    plt.xticks([])
    plt.yticks([])

    # Change the color of the title depending on if the prediction is right or wrong
    if pred_label == true_label:
        color = "green"
    else:
        color = "red"

    plt.title("{} {:2.0f}% ({})".format(pred_label,
                                        np.max(pred_prob)*100,
                                        true_label), color=color)


# View an example prediction, original image and truth label
plot_pred(prediction_probabilities=predictions,
          labels=val_labels,
          images=val_images)
def plot_pred_conf(prediction_probabilities, labels, n=1):
    """
    Plots the top 10 highest prediction confidences along with
    the truth label for sample n.
    """
    pred_prob, true_label = prediction_probabilities[n], labels[n]

    # Get the predicted label
    pred_label = get_pred_label(pred_prob)

    # Find the indexes of the top 10 prediction confidences
    top_10_pred_indexes = pred_prob.argsort()[-10:][::-1]
    # Find the top 10 prediction confidence values
    top_10_pred_values = pred_prob[top_10_pred_indexes]
    # Find the top 10 prediction labels
    top_10_pred_labels = unique_breeds[top_10_pred_indexes]

    # Setup the plot
    top_plot = plt.bar(np.arange(len(top_10_pred_labels)), top_10_pred_values, color="grey")
    plt.xticks(np.arange(len(top_10_pred_labels)), labels=top_10_pred_labels, rotation="vertical")

    # Colour the bar of the true label green if it appears in the top 10
    if np.isin(true_label, top_10_pred_labels):
        top_plot[np.argmax(top_10_pred_labels == true_label)].set_color("green")


plot_pred_conf(prediction_probabilities=predictions,
               labels=val_labels,
               n=1)

Let’s check a few predictions and their different values

i_multiplier = 0
num_rows = 3
num_cols = 2
num_images = num_rows * num_cols
plt.figure(figsize=(5*2*num_cols, 5*num_rows))
for i in range(num_images):
    plt.subplot(num_rows, 2*num_cols, 2*i+1)
    plot_pred(prediction_probabilities=predictions,
              labels=val_labels,
              images=val_images,
              n=i+i_multiplier)
    plt.subplot(num_rows, 2*num_cols, 2*i+2)
    plot_pred_conf(prediction_probabilities=predictions,
                   labels=val_labels,
                   n=i+i_multiplier)
plt.tight_layout(h_pad=1.0)
plt.show()

Saving and Loading the Model

def save_model(model, suffix=None):
    """
    Saves a given model in the models directory and appends a suffix (str)
    for clarity and reuse.
    """
    # Create a model directory path with the current time
    modeldir = os.path.join("./models",
                            datetime.datetime.now().strftime("%Y%m%d-%H%M%S"))
    model_path = modeldir + "-" + suffix + ".h5" # save format of the model
    print(f"Saving model to: {model_path}...")
    model.save(model_path)
    return model_path

def load_model(model_path):
    """
    Loads a saved model from a specified path.
    """
    print(f"Loading saved model from: {model_path}")
    model = tf.keras.models.load_model(model_path,
                                       custom_objects={"KerasLayer": hub.KerasLayer})
    return model

save_model(model, suffix="1000-images-Adam")

Training the model on the full dataset

# Turn the full training data into a data batch
full_data = create_batches(X, y)

# Instantiate a new model for training on the full dataset
full_model = create_model()

# Create full model callbacks

# TensorBoard callback
full_model_tensorboard = create_tensorboard_callback()

# Early stopping callback
# Note: No validation set when training on all the data, therefore can't monitor validation accuracy
full_model_early_stopping = tf.keras.callbacks.EarlyStopping(monitor="accuracy",
                                                             patience=3)

%tensorboard --logdir=./logs

Note: since running the cell below will train the model on all of the data (10,000+ images), it may take a fairly long time to get started and finish. However, thanks to our `full_model_early_stopping` callback, it’ll stop before it goes on for too long.

Remember, the first epoch is always the longest as data gets loaded into memory. After it’s there, it’ll speed up.

# Fit the full model to the full training data
full_model.fit(x=full_data,
               epochs=NUM_EPOCHS,
               callbacks=[full_model_tensorboard,
                          full_model_early_stopping])

# Save model to file
save_model(full_model, suffix="all-images-Adam")

# Load in the full model
loaded_full_model = load_model('./models/20230629-162637-all-images-Adam.h5')

Make predictions on the Test Data Set

# Load test image filenames (since we're using os.listdir(), these already have .jpg)
test_path = "./data/test/"
test_filenames = [test_path + fname for fname in os.listdir(test_path)]

# Create test data batch
test_data = create_batches(test_filenames, test_data=True)

# Make predictions on test data batch using the loaded full model
test_predictions = loaded_full_model.predict(test_data,
                                             verbose=1)

Preparing for submission on Kaggle

The required submission format is described on Kaggle itself.

# Create pandas DataFrame with empty columns
preds_df = pd.DataFrame(columns=["id"] + list(unique_breeds))
preds_df.head()

# Append test image ID's to predictions DataFrame
preds_df["id"] = [os.path.splitext(path)[0] for path in os.listdir(test_path)]

# Add the prediction probabilities to each dog breed column
preds_df[list(unique_breeds)] = test_predictions

preds_df.to_csv("./full_submission_1_mobilienetV2_adam.csv",
                index=False)

Now, to evaluate our model, we’ll submit this file to Kaggle.

Congratulations!

You now have a CNN that can classify new images of dog breeds! You can use this model to build a dog breed classifier app, identify dogs in shelter photos to aid adoption, and more. The possibilities are endless!
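As a starting point for something like a classifier app, here’s a small sketch of my own (using the functions defined above) of how you might predict the breed of a single custom image. The path "my_dog.jpg" is just a placeholder.

# Sketch: predict the breed of one custom image using the loaded model.
# "my_dog.jpg" is a placeholder path — swap in your own image.
custom_image = preprocess_image("my_dog.jpg")        # (224, 224, 3) tensor
custom_batch = tf.expand_dims(custom_image, axis=0)  # add a batch dimension -> (1, 224, 224, 3)
custom_pred = loaded_full_model.predict(custom_batch)
print(get_pred_label(custom_pred[0]))                # e.g. "golden_retriever"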

Conclusion

So there you have it, a deep dive into building an image classifier for dog breeds using CNNs. You now know how to assemble a dataset, build a CNN model, train it, and evaluate its performance. Pretty cool that with neural networks you were able to create a model to classify 120 breeds with over 90% accuracy. Now you’ve got a new machine learning skill under your belt and a project you can build on. Maybe next you’ll collect even more data and images to improve the model further, or apply what you’ve learned to another multi-class image classification challenge. The possibilities are endless. What will you build next?
