Teaching a computer the difference between a tiger and a bicycle using neural networks

Aldfd
Published in The Startup
8 min read · Jun 5, 2020

Prelude

It’s a foggy morning. You’ve forgotten your glasses inside, but there’s no time to go back and get them. You head down to where your bike is locked, not noticing that some hooligan has surreptitiously replaced it with a tiger. After a quick trip to the hospital, you resolve never to confuse a bike for a tiger ever again. Luckily for you, with a little TensorFlow and a little PIL, you can teach your computer to tell the difference between bikes and tigers (or lions, sharks, really anything those hooligans might try to slip by).

The technique we’ll use to accomplish this is a neural network. We’ll scrape data from Google Images, specifically pictures of bikes and tigers, process the pictures with PIL, and use them to train a TensorFlow neural network.

Background

A neural network, as its name might suggest, is a technique for making computers learn from data, modeled on how we think the brain might learn from data. The classic use case for a neural network is teaching a computer to recognize hand-drawn digits. Though it might seem blindingly obvious to us, it’s not at all clear from the outset how we might teach a computer to recognize one pattern as a 3 and another as a 4. For a proper explanation of the mathematical intuition, I’d recommend 3Blue1Brown’s great 4-part series on the topic (https://www.youtube.com/playlist?list=PLZHQObOWTQDNU6R1_67000Dx_ZCJB-3p). Like most machine learning techniques, neural networks use lots of training data to try to “learn”. In the classic example, we might feed a computer lots and lots of photos of hand-drawn digits. In our case, we need to find lots of pictures of bikes and tigers and give them to our model to work its magic.

Web scraping

First things first: we need some data. We’ll be using a program to automatically pull and download around 170 pictures each of bikes and tigers from Google Images. Code for this extraction is courtesy of

from selenium import webdriver
import requests
import os
import io
import hashlib
from bs4 import BeautifulSoup
from PIL import Image
import time


###Image scraping with python code goes here####

# Gets 170 images of tigers from Google Images and saves them to a folder called
# tiger, inside a folder called image, in our working directory
search_and_download(search_term="tiger", driver_path=DRIVER_PATH, number_images=170)
# Same for bikes
search_and_download(search_term="bike", driver_path=DRIVER_PATH, number_images=170)

To get our data, we pass a search term and a number of images into the download function. After it runs, we should have a folder named tiger with 170 tiger JPGs and a folder named bike with 170 bike JPGs, randomly named and formatted. Note that how many images we can scrape with this algorithm before it gets stuck depends on the search term. For these terms we were able to extract around 170. Playing with the “sleep between interactions” argument in the reference code can help increase this number. Google doesn’t like it when we extract JPEGs too quickly, so deliberately increasing our processing time can help us get more data without setting off any alerts. Now that we have raw image data, our next step is to process these images with the PIL module to get them into a usable form.
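The reference scraper’s internals are not reproduced here, so the following is only a sketch of the pausing idea, not its actual code. The names `fetch_throttled`, `fetch`, and `pause_seconds` are hypothetical; `fetch` stands for whatever function downloads one image.

```python
import time


def fetch_throttled(urls, fetch, pause_seconds=1.0, sleep=time.sleep):
    """Call fetch(url) for each URL, pausing between consecutive requests."""
    results = []
    for index, url in enumerate(urls):
        if index > 0:
            sleep(pause_seconds)  # wait before every request except the first
        try:
            results.append(fetch(url))
        except Exception as error:  # skip a failed download and keep going
            print(f"skipping {url}: {error}")
    return results


# Stand-in "download" that just upper-cases the URL, with no real pause.
print(fetch_throttled(["a", "b", "c"], lambda url: url.upper(), pause_seconds=0))
```

A longer pause trades run time for a lower chance of being cut off, which is the same trade-off described above.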

PIL image processing

Our objective here is to convert our images into usable bundles of data. Our neural network model wants two inputs: a feature array and a label array. The feature array is essentially an array of numbers describing each pixel of an image. For color data this means 3 numbers per pixel, corresponding to the RGB values. For grayscale, which we’ll be using, it takes only 1 number per pixel: the brightness. The label array is just a 1-dimensional array holding each image’s label, a number that corresponds to that image’s category. In our case, tiger images get label 1 and bike images get label 0. Before we can build these arrays, we have to convert all our images to a consistent, processable form. We start by making them all grayscale. To convert a jpg image to grayscale we run:

from PIL import Image

img = Image.open(jpg)   # jpg is the path to an image file
img = img.convert('L')  # 'L' mode is 8-bit grayscale
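To make the difference between color and grayscale feature arrays concrete, here is a small sketch. It uses a made-up 4x3 solid red image in place of a scraped photo, and the variable names are for illustration only.

```python
import numpy as np
from PIL import Image

# A tiny made-up 4x3 solid red image stands in for a scraped photo.
color_img = Image.new("RGB", (4, 3), color=(255, 0, 0))
gray_img = color_img.convert("L")

color_array = np.asarray(color_img)  # three numbers (R, G, B) per pixel
gray_array = np.asarray(gray_img)    # one brightness number per pixel

print(color_array.shape)  # (height, width, 3)
print(gray_array.shape)   # (height, width)

# One label per image: 1 for tiger, 0 for bike.
labels = np.asarray([1, 0, 1])
print(labels.shape)
```

Grayscale cuts the amount of data per image to a third, which is part of why it is a convenient starting point.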

Then we convert all images to a common shape and resolution, since the model expects the same amount of data, i.e. the same number of pixels, from each image. Here we can also adjust how much data our computer has to handle. A 1000x1000 image contains 1 million pixels; that’s 1 million pieces of data for our model to transform and process per image. We adjust the resolution to match our computational capacity, and playing around with the numbers is generally the easiest approach here. We can use PIL’s “size” attribute and its “crop” and “resize” functions to get all this done, as follows. First, to crop our image into a square:

size = img.size
dim = min(size)
img = img.crop((0, 0, dim, dim))

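This returns a square cropped image of dimension dim x dim, where dim is just the smaller of the image’s height and width. Note that the crop box starts at the top-left corner, so anything more than dim pixels to the right or down is cut off. Now to adjust the resolution: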

img=img.resize((130, 130))

This returns our image in 130x130 resolution.
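The crop step earlier keeps the top-left square of each photo. If the subject tends to sit in the middle of the frame, a centered crop may keep more of it. This is an alternative to the article’s crop, not what it uses, and `center_square_box` is a hypothetical helper name.

```python
def center_square_box(width, height):
    """Return the (left, upper, right, lower) box of the largest centered square."""
    dim = min(width, height)
    left = (width - dim) // 2
    upper = (height - dim) // 2
    return (left, upper, left + dim, upper + dim)


print(center_square_box(200, 100))
```

With PIL this would be used as `img = img.crop(center_square_box(*img.size))`, since `img.size` is a (width, height) pair.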

Finally, to turn our img into an array, we just run:

import numpy as np

img = np.asarray(img)

This returns a 130 by 130 array of grayscale values, one for each pixel. We combine the processing steps into one “image_processor” function:

def image_processor(jpg):
    img = Image.open(jpg)
    img = img.convert('L')  # converts to grayscale
    size = img.size
    dim = min(size)
    img = img.crop((0, 0, dim, dim))  # crops to square
    img = img.resize((130, 130))  # changes res to 130x130
    name = str(jpg)
    img.save(name)  # saves image to folder, overwriting the original file
    return img

This function returns the fully processed image and saves it back to its folder, replacing the original file.
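If you would rather keep the scraped originals untouched, a variant can skip the save and hand back the array directly. This is a sketch under that assumption; `image_to_array` and its `side` parameter are hypothetical names, and the processing steps mirror the ones described so far.

```python
import numpy as np
from PIL import Image


def image_to_array(path, side=130):
    """Load an image as a side x side grayscale array without modifying the file."""
    with Image.open(path) as img:
        img = img.convert("L")            # grayscale
        dim = min(img.size)
        img = img.crop((0, 0, dim, dim))  # top-left square, as in the text
        img = img.resize((side, side))
        return np.asarray(img)
```

Calling `image_to_array("some_photo.jpg")` on a real file would return a 130 by 130 array of values from 0 to 255, while the file on disk stays as it was.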

Data conversion

Now to turn our images into image arrays and label arrays. The following code processes each image with image_processor, converts the processed images into image arrays and label arrays, and splits the data into training and testing sets.

import os
import shutil
from os import walk

import numpy as np
from PIL import Image

# First we need the file paths to the tiger photos, the bike photos, and
# the image folder containing both
tiger_path = '<insert _tiger file path>'
bicycle_path = '<insert bike file path>'
image_path = '<insert image file path>'

# Gets the list of all file names in the tiger and bike folders respectively
_, _, tiger_img = next(walk(tiger_path))
_, _, bicycle_img = next(walk(bicycle_path))

dir_path = image_path + '/processed_images'

# First deletes the "processed_images" folder if present, then adds a new
# folder of that name
if os.path.exists(dir_path):
    shutil.rmtree(dir_path)

os.chdir(image_path)
os.makedirs("processed_images")

os.chdir(tiger_path)

# With the tiger photos folder as the working directory, processes every photo
# inside and saves it to the processed images folder under the name
# str(A) + "_image_tiger", where A is just the index of the photo
A = 0
for jpg in tiger_img:
    A = A + 1
    label = str(A) + "_image_tiger"
    print(label)
    img = image_processor(jpg)
    jpg_name = label + ".jpg"
    file_path = os.path.join(dir_path, jpg_name)
    print(file_path)
    img.save(file_path)

os.chdir(bicycle_path)

# Same for bikes
B = 0
for jpg in bicycle_img:
    B = B + 1
    label = str(B) + "_image_bicycle"
    img = image_processor(jpg)
    jpg_name = label + ".jpg"
    file_path = os.path.join(dir_path, jpg_name)
    img.save(file_path)

# Sets the processed images folder as the working directory, and sets the
# cutoff index between training and testing sets to .7 * total for bikes
# and tigers respectively
os.chdir(dir_path)
train_images = []
train_labels = []
test_images = []
test_labels = []

Bt = int(B * .7)
At = int(A * .7)

# Converts the processed training bike images to arrays, appends each array
# to train_images, and appends 0 to train_labels since bikes have label 0
for i in range(1, Bt):
    bike_img = str(i) + "_image_bicycle" + '.jpg'
    bike_img = Image.open(bike_img)
    bike_array = np.asarray(bike_img)
    train_labels.append(0)
    train_images.append(bike_array)
# Converts the processed training tiger images to arrays, appends each array
# to train_images, and appends 1 to train_labels since tigers have label 1
for i in range(1, At):
    tiger_img = str(i) + "_image_tiger" + '.jpg'
    tiger_img = Image.open(tiger_img)
    tiger_array = np.asarray(tiger_img)
    train_labels.append(1)
    train_images.append(tiger_array)

# Same for testing. Note that range(Bt, B) stops at B - 1, so the last
# processed image of each class is left unused.
for i in range(Bt, B):
    bike_img = str(i) + "_image_bicycle" + '.jpg'
    bike_img = Image.open(bike_img)
    bike_array = np.asarray(bike_img)
    test_labels.append(0)
    test_images.append(bike_array)

for i in range(At, A):
    tiger_img = str(i) + "_image_tiger" + '.jpg'
    tiger_img = Image.open(tiger_img)
    tiger_array = np.asarray(tiger_img)
    test_labels.append(1)
    test_images.append(tiger_array)

# Creates arrays from the lists of arrays
train_images = np.asarray(train_images)
train_labels = np.asarray(train_labels)
test_images = np.asarray(test_images)
test_labels = np.asarray(test_labels)

# Scales values down to [0, 1] to produce our final data
train_images = train_images / 255.0
test_images = test_images / 255.0

Now our data is fully processed and ready to be used to train a neural network.
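The split above takes the first 70% of each folder in file order. A common alternative, which this article does not use, is to shuffle before splitting so that the test set is a random sample. Below is a minimal sketch; `shuffled_split`, `train_fraction`, and `seed` are hypothetical names, and the demo data is made up.

```python
import numpy as np


def shuffled_split(images, labels, train_fraction=0.7, seed=0):
    """Shuffle images and labels together, then split into train and test parts."""
    images = np.asarray(images)
    labels = np.asarray(labels)
    order = np.random.default_rng(seed).permutation(len(images))
    cutoff = int(len(images) * train_fraction)
    train_idx, test_idx = order[:cutoff], order[cutoff:]
    return images[train_idx], labels[train_idx], images[test_idx], labels[test_idx]


# Made-up stand-in data: ten blank 130x130 "images", half labeled 1 and half 0.
demo_images = np.zeros((10, 130, 130))
demo_labels = np.array([1] * 5 + [0] * 5)
train_x, train_y, test_x, test_y = shuffled_split(demo_images, demo_labels)
print(train_x.shape, test_x.shape)
```

Fixing the seed keeps the split reproducible from run to run, so accuracy numbers can be compared.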

Fitting a basic neural network

The code below feeds our processed data into a neural network. We specify the shape of the input data, the number of classes, and the hyperparameters. Here those choices are the ‘relu’ and ‘softmax’ activation functions, the ‘adam’ optimizer, the ‘sparse_categorical_crossentropy’ loss, and the number of epochs. The number of epochs is just the number of times the model sees our data. The model is complex enough that it can keep learning new things from repeated passes over the same data (unlike regression, which gives the same result every time it is fed the same data). We’ll want to tune the number of epochs so that the model learns all it can from the data without being overfitted. We’ll see this in a second.

from tensorflow import keras

model = keras.Sequential([
    # input layer (1): Flatten changes shape from a 130x130 tensor to a
    # 16900 vector
    keras.layers.Flatten(input_shape=(130, 130)),
    # hidden layer (2): Dense means all connected, 2500 is the number of
    # neurons, which should be a fraction of the input layer's size (here ~1/7)
    keras.layers.Dense(2500, activation='relu'),
    # output layer (3): should have as many neurons as classes to predict
    keras.layers.Dense(2, activation='softmax')
])

# Compiles the model using the given hyperparameters
model.compile(optimizer='adam',
              loss='sparse_categorical_crossentropy',
              metrics=['accuracy'])

# Trains the model on our data, then checks it on the testing set
model.fit(train_images, train_labels, epochs=5)
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=1)
print('test_acc=', test_acc)
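To get a feel for how big this network is, we can count its trainable numbers by hand. A fully connected layer has one weight per input-neuron pair plus one bias per neuron. `dense_params` below is a hypothetical helper written for this count, not a Keras function.

```python
def dense_params(inputs, units):
    """Weights plus biases of a fully connected layer."""
    return inputs * units + units


input_size = 130 * 130                   # pixels per image after Flatten
hidden = dense_params(input_size, 2500)  # hidden layer
output = dense_params(2500, 2)           # output layer
print(input_size, hidden, output, hidden + output)
```

Almost all of the parameters sit in the hidden layer, and that count grows in step with the number of input pixels, so doubling the side length to 260 would roughly quadruple it.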

This model outputs:


Epoch 1/5
8/8 [==============================] - 1s 129ms/step - loss: 0.6762 - accuracy: 0.4828
Epoch 2/5
8/8 [==============================] - 1s 133ms/step - loss: 0.6070 - accuracy: 0.6681
Epoch 3/5
8/8 [==============================] - 1s 131ms/step - loss: 0.5521 - accuracy: 0.7500
Epoch 4/5
8/8 [==============================] - 1s 132ms/step - loss: 0.5142 - accuracy: 0.7543
Epoch 5/5
8/8 [==============================] - 1s 131ms/step - loss: 0.4701 - accuracy: 0.8190
4/4 [==============================] - 0s 12ms/step - loss: 0.5468 - accuracy: 0.6765
test_acc= 0.6764705777168274
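To read the loss column, it helps to see what softmax and sparse categorical cross-entropy compute. The sketch below is a simplified NumPy version for intuition, not Keras’s actual implementation, and the scores in it are made up.

```python
import numpy as np


def softmax(logits):
    """Turn raw output scores into probabilities that sum to 1 per row."""
    shifted = logits - np.max(logits, axis=-1, keepdims=True)  # numerical stability
    exp = np.exp(shifted)
    return exp / np.sum(exp, axis=-1, keepdims=True)


def sparse_categorical_crossentropy(probs, labels):
    """Mean of -log(probability assigned to the true class)."""
    picked = probs[np.arange(len(labels)), labels]
    return float(np.mean(-np.log(picked)))


# Made-up scores for three images; columns are (bike, tiger).
logits = np.array([[2.0, 0.0], [0.0, 0.0], [-1.0, 3.0]])
labels = np.array([0, 1, 1])
probs = softmax(logits)
print(probs.sum(axis=1))
print(sparse_categorical_crossentropy(probs, labels))

# A model that always answers 50/50 scores ln(2) on two classes.
coin_flip = np.full((4, 2), 0.5)
print(sparse_categorical_crossentropy(coin_flip, np.array([0, 1, 0, 1])))
```

Since ln(2) is about 0.693, the first-epoch loss of 0.6762 is barely better than a coin flip, and the later epochs are where the model starts to pull away from guessing.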

As we can see, the accuracy improves with every epoch we add, but we know it must peak at some point. We can iterate through epoch counts to see where we get the best testing accuracy by running the following:

import matplotlib.pyplot as plt

accuracy_test = []
accuracy_train = []

# Fits the model with epochs=i for i from 0 to 24, and appends the test and
# train accuracy after each pass to 2 lists. Note that the model is never
# rebuilt, so its weights carry over from one pass to the next.
for i in range(25):
    model.compile(optimizer='adam',
                  loss='sparse_categorical_crossentropy',
                  metrics=['accuracy'])
    model.fit(train_images, train_labels, epochs=i)
    test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=1)
    accuracy_test.append(test_acc)
    train_loss, train_acc = model.evaluate(train_images, train_labels, verbose=1)
    accuracy_train.append(train_acc)

# Plots training and testing accuracy per epoch
plt.plot(accuracy_test)
plt.plot(accuracy_train)
plt.title('accuracy v epoch')
plt.ylabel('accuracy')
plt.xlabel('epoch')
plt.legend(['Validation', 'Train'], loc='upper left')
plt.show()

Executing this code returns the following plot:

Accuracy on the testing set peaks at around .85 at 12 epochs, falls after that until 16, and is static from then on, since the model has fully memorized the training data.
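Rather than reading the peak off the plot by eye, we can pull it straight from the list of test accuracies. The helper `best_epoch` is a hypothetical name, and the numbers below are made up for illustration; they are not the results from the run above.

```python
def best_epoch(accuracies):
    """Return (index, accuracy) of the highest value; ties go to the earliest."""
    best_index = max(range(len(accuracies)), key=lambda i: accuracies[i])
    return best_index, accuracies[best_index]


# Made-up accuracies for illustration only.
example_test_accuracy = [0.50, 0.62, 0.71, 0.80, 0.78, 0.80, 0.74]
print(best_epoch(example_test_accuracy))
```

With the lists from the loop, this would be called as `best_epoch(accuracy_test)`.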

Can we do better? A naive approach might be just to increase the resolution. Higher detail = more info = better prediction, right? Unfortunately, it’s not so easy.

More pixels don’t always translate to more accuracy, but they do always translate to much higher computation times. A smarter way might be to try a more advanced model like a convolutional neural network, or to add layers and play around with other hyperparameters (hyperparameter tuning).
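As a starting point for that next step, here is what a small convolutional network could look like in Keras. This is an untested-on-our-data sketch: the layer sizes are arbitrary guesses rather than tuned values, `build_cnn` is a hypothetical helper name, and nothing here promises better accuracy than the dense model.

```python
import numpy as np
from tensorflow import keras


def build_cnn(side=130, classes=2):
    """A small convolutional classifier for side x side grayscale images."""
    return keras.Sequential([
        keras.layers.Input(shape=(side, side, 1)),  # one grayscale channel
        keras.layers.Conv2D(16, 3, activation='relu'),
        keras.layers.MaxPooling2D(),
        keras.layers.Conv2D(32, 3, activation='relu'),
        keras.layers.MaxPooling2D(),
        keras.layers.Flatten(),
        keras.layers.Dense(64, activation='relu'),
        keras.layers.Dense(classes, activation='softmax'),
    ])


cnn = build_cnn()
cnn.compile(optimizer='adam',
            loss='sparse_categorical_crossentropy',
            metrics=['accuracy'])

# Conv2D expects a channel axis, so (n, 130, 130) arrays become (n, 130, 130, 1).
demo_batch = np.zeros((4, 130, 130), dtype='float32')[..., np.newaxis]
print(cnn.predict(demo_batch, verbose=0).shape)
```

Training would follow the same pattern as before, for example `cnn.fit(train_images[..., np.newaxis], train_labels, epochs=5)`.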
