Rock Paper Scissors Image Classifier Using Torchvision and a CNN

Xijian Lim
Published in The Startup
4 min read · Jul 17, 2019

I’ve been exploring PyTorch’s frameworks over the past year and noticed that a lot of the answers to questions have to be unearthed in the nooks and crannies of forums.

I’m writing this post to hopefully help other poor souls who have a hard time with their ETL and testing trained models.

Too often, the articles and code being shared deal only with the training steps, ending with the ubiquitous “my model’s prediction score is 98%!”. That’s great and all, but everyone getting into ML and Deep Learning has questions like:

  1. How do we test a single sample to see if it REALLY works?
  2. Can we see the actual prediction?
  3. How can we load our own datasets?

Because like every layman, the proof REALLY is in the pudding and seeing is believing.

So hopefully this article helps people understand the practical side of things and simplifies torch’s neat packages, which suffer from pretty confusing documentation.

The first thing we’re going to need is to get our training datasets. Simply download and unzip the files from:

and unzip them into a folder. For simplicity, just call the folder data. You should have 3 folders stuffed with images.

Now with torch, the basic idea is to use their helpful data loaders. Luckily, torchvision has a great data loader that will: 1) take all folders within a directory, 2) transform the image files in each subfolder and 3) label everything for you (as tensor integers) according to the folder each image is in. So to summarise: the premise of the loader is that your files are saved in their respective class folders.

Tip: resize your images! The funny thing is that the “transforms” package has a pretty unintuitive list of functions. “Resize” in this setup sets the image height and “CenterCrop” sets the image width.

For simplicity, I’ve kept the train and test sets the same; otherwise I’d have to split the RPS images into two folders, or find a function to shuffle and split them.

Also, I’ve resized the images to a 28 × 28 format. You can see how an image is presented after being converted into a tensor:

Pretty cool.

The “shape” of the image tensor is explained this way:

This is a layman’s explanation of the actual data you’ll be handling: each image is now represented by an array of numbers. There are 3 “dimensions” (channels) for each image, each consisting of 28 by 28 pixels; the 3 channels are the colour codes for red, green and blue.
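A quick sketch of poking at one of those tensors — the `img` below is a random stand-in with the same 3 × 28 × 28 shape the loader produces:

```python
import torch

img = torch.rand(3, 28, 28)   # (channels, height, width)
print(img.shape)              # torch.Size([3, 28, 28])

# Index the first dimension to pull out a single colour channel:
red, green, blue = img[0], img[1], img[2]
print(red.shape)              # torch.Size([28, 28])
```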

If you’d like to convert the images to grayscale (replicated across 3 channels), simply uncomment “transforms.Grayscale(num_output_channels=3),”

Now the next part hopefully helps users calculate the components of their CNN more easily:

### calculating padding and layer output sizes
height = 28
width = 28
kernel_size = 5
padding = (kernel_size - 1) // 2  # "same" padding: 2
stride = 1
pool_size = 2  # max-pool window

# output after 1st convolution: (H - K + 2P) / S + 1
out1 = (height - kernel_size + 2 * padding) // stride + 1  # = 28

output_pool1 = out1 // pool_size  # = 14

# output after 2nd convolution
out2 = (output_pool1 - kernel_size + 2 * padding) // stride + 1  # = 14

output_pool2 = out2 // pool_size  # = 7

To help break it down, think of our model as having a cybernetic eye: it looks at 5 pixels at a time, moving from left to right by a “stride” of 1. “Padding” helps the eye fill in values when it gets close to the border edges of the image.

From there, the inputs are “pooled” to form a quick aggregate of the values (either average or max).
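Those numbers can be checked directly. Below is a minimal sketch — the channel counts are stand-ins of my own, not the article’s exact model — of two convolution + max-pool stages, tracking the 28 → 14 → 7 shrinkage from the calculation above:

```python
import torch
import torch.nn as nn

conv1 = nn.Conv2d(3, 16, kernel_size=5, stride=1, padding=2)   # 28 -> 28
conv2 = nn.Conv2d(16, 32, kernel_size=5, stride=1, padding=2)  # 14 -> 14
pool = nn.MaxPool2d(2)                                         # halves height and width

x = torch.rand(1, 3, 28, 28)   # one fake 28 x 28 RGB image (batch of 1)
x = pool(conv1(x))
print(x.shape)                 # torch.Size([1, 16, 14, 14])
x = pool(conv2(x))
print(x.shape)                 # torch.Size([1, 32, 7, 7])
```

That final 32 × 7 × 7 block is what you’d flatten before feeding the fully connected layers.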

For a better understanding, see:

Now for the fun bit, actual testing!

The biggest tip I can give to others is that torch is built on the premise of data loaders loading batch after batch of data. If you want to test a single sample, the best way to do it is to wrap it with the “Variable” function; otherwise you have to test a whole “batch” from the test_loader.
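As a sketch of what that single-sample test looks like — the model below is a trivial stand-in, not the article’s trained CNN; and in current PyTorch versions the `Variable` wrapper is no longer needed, so `unsqueeze(0)` is the step that fakes the batch dimension:

```python
import torch
import torch.nn as nn

# Stand-in model: flatten the 3 x 28 x 28 image into one linear layer, 3 classes.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 28 * 28, 3))
model.eval()

sample = torch.rand(3, 28, 28)           # one image, no batch dimension yet
with torch.no_grad():
    logits = model(sample.unsqueeze(0))  # unsqueeze(0) -> shape (1, 3, 28, 28)
print(logits.shape)                      # torch.Size([1, 3])
prediction = logits.argmax(dim=1).item() # index of the winning class
```

Mapping `prediction` back to “rock”, “paper” or “scissors” is just a lookup in the loader’s `class_to_idx` dictionary, inverted.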

Hooray!

So download the code, datasets and mess around.

Have a try at taking a picture of your own and testing it after performing the necessary reshaping.

For the full code, see:

Shout out to:

for the ideas and support.
