Rock Paper Scissors Image Classifier Using Torchvision and a CNN

Xijian Lim
Published in The Startup
4 min read · Jul 17, 2019

I’ve been exploring PyTorch’s frameworks over the past year and noticed that a lot of the answers to questions have to be unearthed in the nooks and crannies of forums.

I’m writing this post to hopefully help other poor souls who have a hard time with their ETL and testing trained models.

Too often, the articles and code being shared deal only with the training steps, ending with the ubiquitous “my model’s prediction score is 98%!”. That’s great and all, but everyone getting into ML and Deep Learning has questions like:

  1. How do we test a single sample to see if it REALLY works?
  2. Can we see the actual prediction?
  3. How can we load our own datasets?

Because like every layman, the proof REALLY is in the pudding and seeing is believing.

So hopefully this article helps people understand the practical side of things and simplifies torch’s neat packages, which suffer from pretty confusing documentation.

The first thing we’re going to need is to get our training datasets. Simply download and unzip the files from:

and unzip them into a folder. For simplicity, just call the folder data. You should have 3 folders stuffed with images.

Now with torch, the basic idea is to use their helpful data loaders. Luckily, torchvision has a great data loader that will: 1) take all folders within a directory, 2) transform the image files in each subfolder and 3) label everything for you (as tensor integers) according to the folder each image is in. So to summarise: the premise of the loader is that your files are saved in their respective class folders.

Tip: resize your images! The funny thing is that the “transforms” package has a pretty unintuitive list of functions. “Resize” in this setup sets the image height and “CenterCrop” sets the image width.

For simplicity, I’ve kept the train and test sets the same; otherwise I’d have to split the RPS images into two folders, or find a function to shuffle and split them.

Also, I’ve resized the images to a 28 × 28 format. You can see how an image is presented after being converted into a tensor:

Pretty cool.

The “shape” of the image tensor is explained this way:

This is a layman’s explanation of the actual data you’ll be handling: each image is now represented by an array of numbers. There are 3 “dimensions” (channels) for each image, each consisting of 28 by 28 pixels; the 3 channels are the colour codes for red, green and blue.
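A quick sketch of poking at one of those tensors — the `img` below is a random stand-in with the same 3 × 28 × 28 shape the loader produces:

```python
import torch

img = torch.rand(3, 28, 28)   # (channels, height, width)
print(img.shape)              # torch.Size([3, 28, 28])

# Index the first dimension to pull out a single colour channel:
red, green, blue = img[0], img[1], img[2]
print(red.shape)              # torch.Size([28, 28])
```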

If you’d like to convert the images to grayscale (replicated across 3 channels), simply uncomment “transforms.Grayscale(num_output_channels=3),”

Now the next part hopefully helps users calculate the components of their CNN more easily:

### calculating padding and layer output sizes
height = 28
width = 28
kernel_size = 5
padding = (kernel_size - 1) // 2  # "same" padding: 2
stride = 1
pool_size = 2  # max-pool window

# output after 1st convolution: (H - K + 2P) / S + 1
out1 = (height - kernel_size + 2 * padding) // stride + 1  # = 28

output_pool1 = out1 // pool_size  # = 14

# output after 2nd convolution
out2 = (output_pool1 - kernel_size + 2 * padding) // stride + 1  # = 14

output_pool2 = out2 // pool_size  # = 7

To help break it down, think of our model as having a cybernetic eye: it looks at 5 pixels at a time, moving from left to right by a “stride” of 1. “Padding” helps the eye fill in values when it gets close to the border edges of the image.

From there, the inputs are “pooled” to form a quick aggregate of the values (either average or max).
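Those numbers can be checked directly. Below is a minimal sketch — the channel counts are stand-ins of my own, not the article’s exact model — of two convolution + max-pool stages, tracking the 28 → 14 → 7 shrinkage from the calculation above:

```python
import torch
import torch.nn as nn

conv1 = nn.Conv2d(3, 16, kernel_size=5, stride=1, padding=2)   # 28 -> 28
conv2 = nn.Conv2d(16, 32, kernel_size=5, stride=1, padding=2)  # 14 -> 14
pool = nn.MaxPool2d(2)                                         # halves height and width

x = torch.rand(1, 3, 28, 28)   # one fake 28 x 28 RGB image (batch of 1)
x = pool(conv1(x))
print(x.shape)                 # torch.Size([1, 16, 14, 14])
x = pool(conv2(x))
print(x.shape)                 # torch.Size([1, 32, 7, 7])
```

That final 32 × 7 × 7 block is what you’d flatten before feeding the fully connected layers.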

For a better understanding, see:

Now for the fun bit, actual testing!

The biggest tip I can give to others is that torch is built on the premise of data loaders loading batch after batch of data. If you want to test a single sample, the best way to do it is to wrap it with the “Variable” function; otherwise you have to test a whole “batch” from the test_loader.
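As a sketch of what that single-sample test looks like — the model below is a trivial stand-in, not the article’s trained CNN; and in current PyTorch versions the `Variable` wrapper is no longer needed, so `unsqueeze(0)` is the step that fakes the batch dimension:

```python
import torch
import torch.nn as nn

# Stand-in model: flatten the 3 x 28 x 28 image into one linear layer, 3 classes.
model = nn.Sequential(nn.Flatten(), nn.Linear(3 * 28 * 28, 3))
model.eval()

sample = torch.rand(3, 28, 28)           # one image, no batch dimension yet
with torch.no_grad():
    logits = model(sample.unsqueeze(0))  # unsqueeze(0) -> shape (1, 3, 28, 28)
print(logits.shape)                      # torch.Size([1, 3])
prediction = logits.argmax(dim=1).item() # index of the winning class
```

Mapping `prediction` back to “rock”, “paper” or “scissors” is just a lookup in the loader’s `class_to_idx` dictionary, inverted.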

Hooray!

So download the code, datasets and mess around.

Have a try at taking a picture of your own and testing it after performing the necessary reshaping.

For the full code, see:

Shout out to:

for the ideas and support.
