What About a 6-Week Machine Learning Project? Beginner-Friendly Cat vs Dog Classification Problem (Week 6)

Rohith Vazhathody
Published in Analytics Vidhya
6 min read, Sep 2, 2020

Prediction

It has been six long weeks since I started working on my Cat vs Dog Classification project. I learned a lot of new things while completing it. Along the way I went through the documentation several times, asked many questions, searched for answers, and reran any code I had doubts about.

It all began with searching for the data set, which I got from Kaggle.com. My next step was uploading it to Google Drive and mounting the drive in Google Colab. Once that was done, I was ready to code and work with the data set.

The following steps were performed:

  1. Finding the image labels.
  2. Creation of a validation set.
  3. Turning data into batches.
  4. Selecting and building a model.
  5. Creating Callbacks.
  6. Training our model on the subsets of data.
  7. Predicting and evaluating the model.
  8. Training the model on the full data set.
  9. Saving and loading the model (link to previous article).

Once all these steps were complete, I was left with the task of making predictions on the test data set and on custom images.

Predictions on Test Data

After training the model on the full data set and loading the saved model, we were ready to test the model with the test data.

To make predictions on the test data set, we needed to:

  • Get the test image filenames.
  • Convert the filenames into data batches using create_data_batches(), setting the test_data parameter to True because the test data set has no labels.
  • Make a predictions array by passing the test batches to the predict() method of our model.

  • Get the test image filenames

We can get the file paths by copying the path of the folder, joining it with each filename, and collecting the results in a list. This had to be done for both cats and dogs.

Cat

# Loading the test cat image filenames
import os

test_cat_path = 'drive/My Drive/CatVsDog/test_set/test_set/cats/'
test_cat_filenames = [test_cat_path + fname for fname in os.listdir(test_cat_path)]
test_cat_filenames[:5]
['drive/My Drive/CatVsDog/test_set/test_set/cats/cat.4585.jpg',
'drive/My Drive/CatVsDog/test_set/test_set/cats/cat.4592.jpg',
'drive/My Drive/CatVsDog/test_set/test_set/cats/cat.4632.jpg',
'drive/My Drive/CatVsDog/test_set/test_set/cats/cat.4580.jpg',
'drive/My Drive/CatVsDog/test_set/test_set/cats/cat.4575.jpg']

Dog

# Loading the test dog image filenames
test_dog_path = 'drive/My Drive/CatVsDog/test_set/test_set/dogs/'
test_dog_filenames = [test_dog_path + fname for fname in os.listdir(test_dog_path)]
test_dog_filenames[:5]
['drive/My Drive/CatVsDog/test_set/test_set/dogs/dog.4651.jpg',
'drive/My Drive/CatVsDog/test_set/test_set/dogs/dog.4158.jpg',
'drive/My Drive/CatVsDog/test_set/test_set/dogs/dog.4426.jpg',
'drive/My Drive/CatVsDog/test_set/test_set/dogs/dog.4462.jpg',
'drive/My Drive/CatVsDog/test_set/test_set/dogs/dog.4434.jpg']
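
Because the same listing step is repeated for each folder, it could be wrapped in a small helper. The sketch below is not from the original notebook: the name list_image_paths and the extension filter are assumptions. It sorts the result because os.listdir() returns entries in arbitrary order.

```python
import os

def list_image_paths(folder, extensions=(".jpg", ".jpeg", ".png")):
    """Return sorted full paths of the image files directly inside `folder`."""
    paths = []
    for fname in os.listdir(folder):
        full_path = os.path.join(folder, fname)
        # Skip sub-folders and files that are not images (e.g. hidden system files).
        if os.path.isfile(full_path) and fname.lower().endswith(extensions):
            paths.append(full_path)
    return sorted(paths)
```

With a helper like this, the two listings above would become list_image_paths(test_cat_path) and list_image_paths(test_dog_path).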

  • Making use of create_data_batches()

Here, we converted both the cat and dog filenames into data batches and set test_data=True, since we are dealing with test data that has no labels.

Cat

# Create test cat data batches
test_cat_data = create_data_batches(test_cat_filenames, test_data=True)
creating test data batches.....

Dog

# Create test dog data batches
test_dog_data = create_data_batches(test_dog_filenames, test_data=True)
creating test data batches.....
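
create_data_batches() comes from an earlier article in this series and is not reproduced here. As a framework-free illustration of the idea only (it is not the TensorFlow implementation used in this project), the sketch below splits a list of filenames into fixed-size batches. The helper name make_batches, the batch size of 32 and the 1,000 placeholder filenames are all assumptions.

```python
def make_batches(items, batch_size=32):
    """Split a list into consecutive chunks of at most `batch_size` items."""
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]

# Hypothetical filenames standing in for the real test images.
filenames = [f"cat.{n}.jpg" for n in range(1000)]
batches = make_batches(filenames)

print(len(batches))      # number of batches -> 32
print(len(batches[-1]))  # size of the final, partial batch -> 8
```

Under those assumptions the model would see 32 batches, the last one holding only the 8 leftover images.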

  • Make predictions on the test data

Now we are ready to use the predict() method to make predictions on the cat and dog test data. After making predictions, we save the predicted values in a .csv file so they can be used later.

Cat

# Make predictions on test cat data batch using the loaded full model
test_cat_predictions = loaded_full_cat_model.predict(test_cat_data, verbose=1)
32/32 [==============================] - 21s 671ms/step

# Save the cat predictions to a .csv file
import numpy as np
np.savetxt("drive/My Drive/CatVsDog/pred_cat_array.csv", test_cat_predictions, delimiter=",")

Dog

# Make predictions on test dog data batch using the loaded full model
test_dog_predictions = loaded_full_dog_model.predict(test_dog_data, verbose=1)
32/32 [==============================] - 909s 28s/step

# Save the dog predictions to a .csv file
import numpy as np
np.savetxt("drive/My Drive/CatVsDog/pred_dog_array.csv", test_dog_predictions, delimiter=",")
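
To use the saved values later, the .csv file has to be read back into an array. The sketch below shows one way this round trip might look with np.savetxt() and np.loadtxt(). The prediction values, the two-column shape and the temporary file location are made up for illustration and are not the project's real output.

```python
import os
import tempfile

import numpy as np

# Hypothetical prediction array: one row per image, one column per class.
predictions = np.array([[0.98, 0.02],
                        [0.10, 0.90],
                        [0.55, 0.45]])

# Write to a temporary folder here; the article writes to Google Drive instead.
csv_path = os.path.join(tempfile.mkdtemp(), "pred_array.csv")
np.savetxt(csv_path, predictions, delimiter=",")

# ndmin=2 keeps the result two-dimensional even for a single row or column.
reloaded = np.loadtxt(csv_path, delimiter=",", ndmin=2)

print(reloaded.shape)
print(np.allclose(predictions, reloaded))
```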

Now we have prediction arrays for both cats and dogs. The next thing left in this project was to test our model on custom images and see how good it really was.

Custom Image Prediction

To make predictions on some random images, we can download images from the internet, or simply take a photo of a nearby dog or cat, and upload them to Google Drive.

After uploading the images to Google Drive, we need to do the following, as we did earlier with the test data:

  • Getting the file paths

As discussed earlier, the paths can be obtained by copying the path of the image folder.

# Get the custom path
import os
custom_path = "drive/My Drive/CatVsDog/CustomCatAndDogImage/"
custom_image_path = [custom_path + fname for fname in os.listdir(custom_path)]
custom_image_path
custom_image_path_length = len(custom_image_path)

Here I saved the length of the list, as it is needed later when we plot the custom images with their prediction probabilities and labels.

  • Create data batches

Here our handy create_data_batches() function is used again, with the test_data parameter set to True.

# Turn the file paths into data batches.
custom_data = create_data_batches(custom_image_path, test_data=True)
custom_data
creating test data batches.....
<BatchDataset shapes: (None, 224, 224, 3), types: tf.float32>

  • Make predictions with both the cat and dog models

Here every custom image is passed through both the cat model and the dog model. The models need to work out which images show a cat and which show a dog, without us feeding only cat images to the cat model and only dog images to the dog model.

custom_prediction_1 = loaded_full_cat_model.predict(custom_data)
custom_prediction_2 = loaded_full_dog_model.predict(custom_data)

  • Plot the image, label and prediction probability

The first time I plotted the images with their labels and probabilities, I got tricked: every image was labelled cat, even a dog image for which the cat probability was low. So I printed out the maximum probabilities and figured out how to label the images correctly. I used an if statement: if an image has a higher prediction probability from the cat model, it is labelled cat; otherwise it is labelled dog.

With this code, I got the expected labels, with prediction probabilities above 90% for each image.

Then I tried a prediction on another image that contained both a cat and a dog. This time I got only cat or only dog, based on the comparison above. So I checked the probabilities for that image again and observed that both models predicted it with a probability greater than 75%.

So I added another if statement: if both models predict an image with a probability greater than 75%, the image is taken to contain both a cat and a dog, and we print both labels with their respective prediction probabilities.

import numpy as np
import matplotlib.pyplot as plt

# Collect the custom images from the batched dataset
custom_images = []
for image in custom_data.unbatch().as_numpy_iterator():
    custom_images.append(image)

plt.figure(figsize=(20, 20))
for i, image in enumerate(custom_images):
    plt.subplot(1, custom_image_path_length, i+1)
    plt.xticks([])
    plt.yticks([])
    # Highest probability from each model for this image
    cat_prob = np.max(custom_prediction_1[i])
    dog_prob = np.max(custom_prediction_2[i])
    if cat_prob > 0.75 and dog_prob > 0.75:
        # Both models are confident: label the image as cat and dog
        plt.title("{} {:2.0f}% {:2.0f}% {}".format("cat", cat_prob*100, dog_prob*100, "dog"), color="green")
    elif cat_prob > dog_prob:
        plt.title("{} {:2.0f}% {}".format("cat", cat_prob*100, "cat"), color="green")
    else:
        plt.title("{} {:2.0f}% {}".format("dog", dog_prob*100, "dog"), color="green")
    plt.imshow(image)

Prediction result.
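
The labelling rule is easier to check when it is separated from the plotting. The function below is a small sketch of that rule on plain numbers. The name label_image and the sample scores are hypothetical; only the 0.75 threshold and the order of the comparisons come from the code above.

```python
def label_image(cat_prob, dog_prob, both_threshold=0.75):
    """Combine the two models' confidence scores into a single label."""
    # Both models are confident: treat the image as containing both animals.
    if cat_prob > both_threshold and dog_prob > both_threshold:
        return "cat and dog"
    # Otherwise the more confident model wins; ties fall through to "dog",
    # matching the if/elif/else order used in the plotting code.
    if cat_prob > dog_prob:
        return "cat"
    return "dog"

# Hypothetical scores, not taken from the article's results.
for cat_p, dog_p in [(0.95, 0.10), (0.20, 0.92), (0.80, 0.85)]:
    print(cat_p, dog_p, "->", label_image(cat_p, dog_p))
```

Keeping the rule in one function also makes it simple to try a different threshold without touching the plot.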

My GitHub repo: Link.

Here, 9 out of 10 images were predicted correctly. The only wrong prediction was for an image containing both a cat and a dog, in which the cat was partly hidden in the dog's arms, so the model predicted only dog. This can be taken as a follow-up for this project: we can train the model further with more images like this.

Now I have completed all the work I had planned, and I am very happy that I was able to finish the whole project on time. I would like to thank my machine learning instructor on Udemy, Daniel Bourke, for suggesting an idea that not only enhanced my knowledge but also gave me the confidence that I could get this job done. Next, I will be looking for another type of project, since I can learn new things from it as well, and there is no pause button on learning new things!
