Is It a Cat or a Dog? — Python Machine Learning Tutorial For Image Recognition

Shelly Leal
Published in WWCode Python · 6 min read · Aug 26, 2022
Image by Somyot Sutprattanatawin

Machine learning is an incredible tool, and its techniques can be used to develop and evaluate advanced neural networks for image classification. But how do you start? Here is a fun example: teaching a computer to tell the difference between photos of cats and dogs.

First, let’s quickly understand how a Convolutional Neural Network (CNN) works. A neural network is a group of connections made to imitate the neurons in the human brain; instead of cells, its units are called nodes. The nodes are interconnected and organized into layers, and each connection is assigned a weight (as you can see in the figure below) that multiplies the data it receives from other nodes.

Source: Convolutional Neural Networks for Beginners

Now, when we are talking about images, the input to the neural network is defined by the pixels. The larger the image, the more nodes are necessary. For an RGB image with 3 color channels, say 64x64 pixels, a single node in the first layer would need 64x64x3 = 12,288 weights. Working with many larger images requires a level of computing power that a plain, fully connected neural network cannot handle.
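To get a feel for how quickly this grows, here is a quick back-of-the-envelope sketch (not part of the original tutorial, just the same arithmetic applied to a few image sizes):

# how many inputs a single fully connected node would need per image size
for side in (64, 200, 1024):
    inputs_per_node = side * side * 3  # width x height x 3 RGB channels
    print(f'{side}x{side} RGB image -> {inputs_per_node} inputs per node')
# 64x64 RGB image -> 12288 inputs per node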

This is where the convolutional neural network comes in. What if we could filter large image inputs down to a smaller map of data, so that we can process all of this information with the resources we have?

By using the mathematical operation called convolution, this is possible: the images are reduced in a way that valuable features are not lost during the process. You can read more details about the theory here.
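If you want to see the operation itself, here is a minimal sketch of a single-channel convolution in plain NumPy (illustrative values, not from the article): a 3x3 filter slides over a 6x6 input and produces a smaller 4x4 feature map.

import numpy as np
# 6x6 single-channel "image" and a 3x3 vertical-edge filter (illustrative values)
image = np.arange(36, dtype=float).reshape(6, 6)
kernel = np.array([[1, 0, -1],
                   [1, 0, -1],
                   [1, 0, -1]], dtype=float)
h, w = image.shape
kh, kw = kernel.shape
feature_map = np.zeros((h - kh + 1, w - kw + 1))
# slide the filter over the image and sum the element-wise products
for i in range(feature_map.shape[0]):
    for j in range(feature_map.shape[1]):
        feature_map[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
print(image.shape, '->', feature_map.shape)  # (6, 6) -> (4, 4)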

Now let’s get to the fun, practical part. This entire development is based on this nice tutorial by Jason Brownlee on Machine Learning Mastery. Some changes were made to adapt the code to the Kaggle environment and to update functions that are now deprecated.

First, grab the Dogs vs. Cats dataset from Kaggle through Data > Download All. If you are new to this environment, I recommend checking the documentation first to get familiar with Kaggle Notebooks. If you are already used to programming in Python with other tools, such as Google Colab or Jupyter Notebook, this will feel very intuitive.

Import Libraries and Preview the Photos

from matplotlib import pyplot
from matplotlib.image import imread
# location of the dataset in your directory
folder = '../input/d/biaiscience/dogs-vs-cats/train/train/'
# plot the first three dog images
for i in range(3):
    # define subplot (3x3 grid, position i+1)
    pyplot.subplot(330 + 1 + i)
    # filename starts with 'dog.' (use 'cat.' for the cat images)
    filename = folder + 'dog.' + str(i) + '.jpg'
    # load image pixels
    image = imread(filename)
    # plot raw pixel data
    pyplot.imshow(image)
# show the figure
pyplot.show()

To preview the cat images, just repeat the process, only this time replacing the filename variable with:

filename = folder + 'cat.' + str(i) + '.jpg'

You will see the results below (so adorable!):

Results of the image preview from the train directory

Break the Dataset into Subdirectories

The first step is to organize the train and test directories for your project. We set aside 25% of the images to be used later for validation of the model. Here is how it works:

# organize dataset into a useful structure
from os import makedirs
from os import listdir
from shutil import copyfile
from random import seed
from random import random
# create directories
dataset_home = 'dataset_dogs_vs_cats/'
subdirs = ['train/', 'test/']
for subdir in subdirs:
    # create label subdirectories
    labeldirs = ['dogs/', 'cats/']
    for labldir in labeldirs:
        newdir = dataset_home + subdir + labldir
        makedirs(newdir, exist_ok=True)
# seed random number generator
seed(1)
# define ratio of pictures to use for validation
val_ratio = 0.25
# copy training dataset images into subdirectories
# (same Kaggle input path used for the preview above)
src_directory = '../input/d/biaiscience/dogs-vs-cats/train/train/'
for file in listdir(src_directory):
    src = src_directory + file
    dst_dir = 'train/'
    if random() < val_ratio:
        dst_dir = 'test/'
    if file.startswith('cat'):
        dst = dataset_home + dst_dir + 'cats/' + file
        copyfile(src, dst)
    elif file.startswith('dog'):
        dst = dataset_home + dst_dir + 'dogs/' + file
        copyfile(src, dst)
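As a quick sanity check (not in the original tutorial), you can count how many files ended up in each subdirectory; with val_ratio = 0.25, roughly a quarter of the images should land in test/. This assumes dataset_home from the block above is still defined.

from os import listdir
# count the files copied into each train/test label subdirectory
for subdir in ['train/', 'test/']:
    for labeldir in ['dogs/', 'cats/']:
        path = dataset_home + subdir + labeldir
        print(path, len(listdir(path)))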

Define the Model

Now we develop a CNN model to work with our Dogs vs. Cats dataset. The idea is to define one VGG (Visual Geometry Group) style block: a convolutional layer with a depth of 32 and 3x3 filters using the ReLU activation function, followed by a max pooling layer. For this example we are using only one block, but it is possible to add more VGG blocks with different depths (e.g. 64, 128) right after the first block described below.

from tensorflow.keras.optimizers import SGD
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Conv2D
from tensorflow.keras.layers import MaxPooling2D
from tensorflow.keras.layers import Flatten
from tensorflow.keras.layers import Dense

# define cnn model
def define_model():
    model = Sequential()
    # block 1: 32 filters of 3x3 with ReLU activation, followed by max pooling
    model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', padding='same', input_shape=(200, 200, 3)))
    model.add(MaxPooling2D((2, 2)))
    # classifier head
    model.add(Flatten())
    model.add(Dense(128, activation='relu', kernel_initializer='he_uniform'))
    model.add(Dense(1, activation='sigmoid'))
    # compile model
    opt = SGD(learning_rate=0.001, momentum=0.9)
    model.compile(optimizer=opt, loss='binary_crossentropy', metrics=['accuracy'])
    return model

model = define_model()
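For reference, here is one way the extra VGG blocks mentioned above could be stacked, with depths 64 and 128 after the first block. This is only a sketch of the deeper variant; the results reported in this article come from the one-block model defined above.

# sketch of a deeper variant: three VGG-style blocks (32, 64, 128 filters)
def define_three_block_model():
    model = Sequential()
    model.add(Conv2D(32, (3, 3), activation='relu', kernel_initializer='he_uniform', padding='same', input_shape=(200, 200, 3)))
    model.add(MaxPooling2D((2, 2)))
    model.add(Conv2D(64, (3, 3), activation='relu', kernel_initializer='he_uniform', padding='same'))
    model.add(MaxPooling2D((2, 2)))
    model.add(Conv2D(128, (3, 3), activation='relu', kernel_initializer='he_uniform', padding='same'))
    model.add(MaxPooling2D((2, 2)))
    model.add(Flatten())
    model.add(Dense(128, activation='relu', kernel_initializer='he_uniform'))
    model.add(Dense(1, activation='sigmoid'))
    opt = SGD(learning_rate=0.001, momentum=0.9)
    model.compile(optimizer=opt, loss='binary_crossentropy', metrics=['accuracy'])
    return model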

Scale Images and Prepare Iterators

The data needs to be prepared and scaled before you start training the model. We do this by scaling the pixel values to the range 0 to 1, using rescale=1.0/255.0 in the ImageDataGenerator class from Keras. After that, we use the flow_from_directory() function to load the images from the train and test directories with a target size of 200x200 pixels.

# create data generator
from tensorflow.keras.preprocessing.image import ImageDataGenerator
datagen = ImageDataGenerator(rescale=1.0/255.0)
# prepare iterators
train_it = datagen.flow_from_directory('dataset_dogs_vs_cats/train/',
                                       class_mode='binary', batch_size=64, target_size=(200, 200))
test_it = datagen.flow_from_directory('dataset_dogs_vs_cats/test/',
                                      class_mode='binary', batch_size=64, target_size=(200, 200))

You will get the following output:

> Found 18697 images belonging to 2 classes.
> Found 6303 images belonging to 2 classes.

Fit and Evaluate Model

After setting up the iterators with the images from the train and test directories, the model can be fit. Below, I used a total of 20 epochs, which is the number of complete passes the model makes over the training data. It can be increased to improve accuracy; however, it is important to make sure that the model still performs well on images that were not used during the training step.

# fit model
history = model.fit(train_it, steps_per_epoch=len(train_it),
                    validation_data=test_it, validation_steps=len(test_it), epochs=20, verbose=0)

After that, you can check the accuracy percentage of the model:

# evaluate model
_, acc = model.evaluate(test_it, steps=len(test_it), verbose=0)
print('> %.3f%%' % (acc * 100.0))

If you use only the one-block model described above, you will get the following output:

> 72.553%
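If you want to see how training progressed over the 20 epochs, the history object returned by fit() can be plotted with matplotlib (a quick sketch, not part of the original walkthrough):

# plot loss and accuracy curves for the train and test sets
pyplot.subplot(211)
pyplot.title('Cross Entropy Loss')
pyplot.plot(history.history['loss'], color='blue', label='train')
pyplot.plot(history.history['val_loss'], color='orange', label='test')
pyplot.legend()
pyplot.subplot(212)
pyplot.title('Classification Accuracy')
pyplot.plot(history.history['accuracy'], color='blue', label='train')
pyplot.plot(history.history['val_accuracy'], color='orange', label='test')
pyplot.legend()
pyplot.tight_layout()
pyplot.show()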

Test Prediction with Sample Test Images

Now we can check two examples of the model's predictions on sample images from the test directory.

File dog.2163.jpg for the first test
from tensorflow.keras.preprocessing.image import load_img
from tensorflow.keras.preprocessing.image import img_to_array
# load and prepare the image
img = load_img('dataset_dogs_vs_cats/test/dogs/dog.2163.jpg', target_size=(200, 200))
img = img_to_array(img)
img = img.reshape(1, 200, 200, 3)
# scale the pixels the same way as during training (to the range 0-1)
img = img.astype('float32')
img = img / 255.0
# make the prediction
result = model.predict(img)
result

You get the following output:

>> array([[1.]], dtype=float32)

This is correct, since “1” means the model predicted that the photo is of a dog.

Next, let’s repeat the test but with a cat image:

File cat.6428.jpg for the second test
img = load_img('dataset_dogs_vs_cats/test/cats/cat.6428.jpg', target_size=(200, 200))
img = img_to_array(img)
img = img.reshape(1, 200, 200, 3)
# scale the pixels the same way as during training (to the range 0-1)
img = img.astype('float32')
img = img / 255.0
result = model.predict(img)
result

You will get the following output:

>> array([[0.]], dtype=float32)

This is correct, since “0” means the model predicted that the photo is of a cat.
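If you prefer a readable label instead of a raw array, a small helper like the one below works (hypothetical, not in the original article). It assumes flow_from_directory's default alphabetical class order, which maps cats to 0 and dogs to 1 (you can confirm this with train_it.class_indices).

# hypothetical helper: map the sigmoid output to a class name, assuming
# the generator's alphabetical class order (cats -> 0, dogs -> 1)
def label_for(prediction):
    return 'dog' if prediction[0][0] >= 0.5 else 'cat'

print(label_for(result))  # prints 'cat' for the second test image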

I hope you enjoyed this tutorial. Feel free to explore my other articles about tech careers, or contact me through LinkedIn.

Nice coding to you all!
