Creating a PyTorch Image Classifier

Anne Bonner
Dec 19, 2018 · 15 min read
Photo by Joshua Sortino on Unsplash

How on earth do I build an image classifier in PyTorch?

“Going forward, AI algorithms will be incorporated into more and more everyday applications. For example, you might want to include an image classifier in a smartphone app. To do this, you’d use a deep learning model trained on hundreds of thousands of images as part of the overall application architecture. A large part of software development in the future will be using these types of models as common parts of applications.”

-Udacity/Facebook AI PyTorch Deep Learning Final Project

This article will take you through the basics of creating an image classifier with PyTorch that can recognize different species of flowers. You can imagine using something like this in a phone app that tells you the name of the flower your camera is looking at. In practice, you would train this classifier, then export it for use in your application.

One of the most exciting parts of being involved in the Facebook AI PyTorch Scholarship Challenge has been the opportunity to build an image classifier for the final challenge. Given some basic guidelines, our goal is to build the most accurate classifier that we can by using the flower data set provided by Udacity.

I was new to both PyTorch and neural networks, so this was a huge challenge! I put this article together for anyone out there who’s brand new to all of this and looking for a place to begin. It’s up to you to take this information, improve on it, and make it your own! *Because my experience with PyTorch has been through the Udacity challenge, my code draws heavily on the code we learned throughout the challenge course. This code, in turn, draws heavily on the official PyTorch documentation.*

Udacity provides a good starting point with both the flower data set and a .json file that provides a handy way to apply category names, but much of the rest of the project is up to us. For the challenge course, we were provided with a folder of training images and a folder of validation images to use for our programs. It is up to us whether to use these images as a train and test set or whether we want to split one of our folders so that we can use “train,” “test,” and “validation” sets.
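If you do decide to carve a separate test set out of one folder, one common approach is to shuffle the image indices once and slice them. Here is a minimal sketch in plain Python. The 80/10/10 proportions, the seed, and the function name `split_indices` are my own assumptions for illustration, not part of the Udacity starter code. Index lists like these could then be handed to something like PyTorch's `SubsetRandomSampler`.

```python
import random

def split_indices(n_items, valid_frac=0.1, test_frac=0.1, seed=42):
    """Shuffle range(n_items) once and slice it into train/valid/test index lists."""
    indices = list(range(n_items))
    random.Random(seed).shuffle(indices)  # fixed seed keeps the split reproducible
    n_test = int(n_items * test_frac)
    n_valid = int(n_items * valid_frac)
    test_idx = indices[:n_test]
    valid_idx = indices[n_test:n_test + n_valid]
    train_idx = indices[n_test + n_valid:]
    return train_idx, valid_idx, test_idx

# 6552 is the size of the training folder used in this project
train_idx, valid_idx, test_idx = split_indices(6552)
print(len(train_idx), len(valid_idx), len(test_idx))
```

Because every index lands in exactly one list, no image can leak from the training set into the held-out sets.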

In this project, there is a separate folder for each of the 102 flower classes. Each flower is labeled as a number and each of the numbered directories holds a number of .jpg files.
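To get a feel for that layout before touching PyTorch, you can simply count the .jpg files in each numbered folder. The sketch below builds a tiny stand-in directory tree so it runs anywhere. The helper name `count_images_per_class` and the fake file names are hypothetical; with the real data you would point it at your own training folder instead.

```python
import tempfile
from pathlib import Path

def count_images_per_class(root):
    """Return {class_folder_name: number_of_jpg_files} for each subfolder of root."""
    root = Path(root)
    return {
        class_dir.name: sum(1 for _ in class_dir.glob('*.jpg'))
        for class_dir in sorted(root.iterdir())
        if class_dir.is_dir()
    }

# Build a tiny stand-in for the real data set so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    for class_id, n_images in [('1', 2), ('2', 3), ('102', 1)]:
        class_dir = Path(tmp) / 'train' / class_id
        class_dir.mkdir(parents=True)
        for i in range(n_images):
            (class_dir / 'image_{:05d}.jpg'.format(i)).touch()
    counts = count_images_per_class(Path(tmp) / 'train')

print(counts)
```

A quick count like this is an easy way to spot empty or badly unbalanced classes before training.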

Let’s get started!

Photo by Annie Spratt on Unsplash

Because this is a neural network using a larger dataset than my CPU could handle in any reasonable amount of time, I went ahead and set up my image classifier in Google Colab. Colab is truly awesome because it provides a free GPU! (If you’re new to Google Colab, check out this article!)

(Because I was using Colab, I needed to start by importing PyTorch. You don’t need to do this unless you’re using Colab.)

*** UPDATE! (01/29)*** Colab now supports native PyTorch!!! You shouldn’t need to run the code below, but I’m leaving it up just in case anyone is having any issues!

#Import PyTorch if using Google Colab
from os.path import exists
from wheel.pep425tags import get_abbr_impl, get_impl_ver, get_abi_tag
platform = '{}{}-{}'.format(get_abbr_impl(), get_impl_ver(), get_abi_tag())
cuda_output = !ldconfig -p|grep|sed -e 's/.*\.\([0-9]*\)\.\([0-9]*\)$/cu\1\2/'
accelerator = cuda_output[0] if exists('/dev/nvidia0') else 'cpu'
!pip install -q{accelerator}/torch-0.4.1-{platform}-linux_x86_64.whl torchvision
import torch

Then, after having some trouble with Pillow (it’s buggy in Colab!), I just went ahead and ran this:

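import PIL
# Print the installed Pillow version (older releases exposed it as PIL.PILLOW_VERSION)
print(PIL.__version__)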

If you get anything below 5.3.0, use the dropdown menu under “Runtime” to “Restart runtime” and run this cell again. You should be good to go!

You’ll want to be using GPU for this project, which is incredibly simple to set up on Colab. You just go to the “runtime” dropdown menu, select “change runtime type” and then select “GPU” in the hardware accelerator drop-down menu!

Then I like to run

# check if GPU is available
train_on_gpu = torch.cuda.is_available()
if not train_on_gpu:
    print('Bummer! Training on CPU ...')
else:
    print('You are good to go! Training on GPU ...')

just to make sure it’s working. Then I run

device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

to define the device.

After this, I imported the files. There are a ton of ways to do this, including mounting your Google Drive if you have your dataset stored there, which is actually really simple! Even though I didn’t wind up finding that to be the most useful solution, I’m including that below, just because it’s so easy and useful.

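from google.colab import drive
# The mount point is your choice; '/content/drive' is used here as an example
drive.mount('/content/drive')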

Run the cell and you’ll see a link. Click it, allow access, copy the code that pops up, paste it into the box, and hit enter. If you don’t see your drive in the side panel on the left, just hit “refresh” and it should show up.

It’s actually super easy!

However, if you’d rather download a shared zip file link (this wound up being easier and faster for this project), you can use:


For example:

!wget -cq
!unzip -qq

That will give you Udacity’s flower data set in seconds!

(If you’re uploading small files, you can upload them directly with a few lines of code. If you’d rather skip the code, go to the left side of the screen and click “upload files” to grab a local file.)

After loading the data, I imported the libraries I wanted to use:

# Import resources
%matplotlib inline
%config InlineBackend.figure_format = 'retina'
import os
import time
import json
import copy
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
import PIL
from PIL import Image
from collections import OrderedDict
import torch
from torch import nn, optim
from torch.optim import lr_scheduler
from torch.autograd import Variable
import torchvision
from torchvision import datasets, models, transforms
from torch.utils.data.sampler import SubsetRandomSampler
import torch.nn.functional as F

*Now here’s where I’m going to start to leave out specific choices that I’ve made in order to help you make your own decisions. Be creative! Enter your choices anywhere I’ve added “YOUR CHOICE” and have some fun!*

Next comes the data transformations! You want to make sure to use several different types of transformations on your training set in order to help your program learn as much as it can. You want to create a more robust model by training it on flipped, rotated, and cropped images. (The means and standard deviations are provided to normalize the image values before passing them to our network, but they can also be found by looking at the mean and standard deviation values of the different dimensions of the image tensors.) The official documentation is incredibly helpful here! For my image classifier, I kept it simple with:

data_dir = 'flower_data'
data_transforms = {
    'train': transforms.Compose([
        transforms.RandomRotation(YOUR CHOICE),
        # Random crop and flip for augmentation; 224 matches image_size used later
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406],
                             [0.229, 0.224, 0.225])
    ]),
    'valid': transforms.Compose([
        # No augmentation for validation; Resize(256) is a common choice, adjust as you like
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406],
                             [0.229, 0.224, 0.225])
    ])
}
# Load the datasets with ImageFolder
image_datasets = {x: datasets.ImageFolder(os.path.join(data_dir, x),
                                          data_transforms[x])
                  for x in ['train', 'valid']}
# Using the image datasets and the transforms, define the dataloaders
batch_size = YOUR CHOICE
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x], batch_size=batch_size,
                                              shuffle=True, num_workers=4)
               for x in ['train', 'valid']}
dataset_sizes = {x: len(image_datasets[x]) for x in ['train', 'valid']}
class_names = image_datasets['train'].classes

As you can see, I also defined the batch size, data loaders, and class names in the code above.
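If the normalization step feels like magic numbers, it may help to see the arithmetic on a single pixel. `transforms.Normalize` computes `(value - mean) / std` for each color channel, and undoing it is just the reverse. The sketch below does this in plain Python, with no PyTorch needed. The helper names and the mid-gray sample pixel are assumptions for illustration.

```python
MEAN = [0.485, 0.456, 0.406]
STD = [0.229, 0.224, 0.225]

def normalize_pixel(rgb):
    """Apply (value - mean) / std to one RGB pixel with values in [0, 1]."""
    return [(v - m) / s for v, m, s in zip(rgb, MEAN, STD)]

def denormalize_pixel(rgb):
    """Invert normalize_pixel and clip the result back into [0, 1]."""
    return [min(1.0, max(0.0, v * s + m)) for v, m, s in zip(rgb, MEAN, STD)]

pixel = [0.5, 0.5, 0.5]  # a mid-gray pixel, already scaled to [0, 1]
normalized = normalize_pixel(pixel)
restored = denormalize_pixel(normalized)
print(normalized)
print(restored)
```

The same reverse step (multiply by the standard deviation, add the mean, clip to [0, 1]) is what you need whenever you want to display a normalized image.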

To take a very quick look at the data and check my device, I ran:

print(dataset_sizes)
print(device)

{'train': 6552, 'valid': 818}

Next, we need to map each label number to the actual flower name. Udacity provided a JSON file that makes this mapping simple.

# Label mapping
with open('cat_to_name.json', 'r') as f:
    cat_to_name = json.load(f)
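One thing that can trip you up here: `ImageFolder` sorts the class folder names as strings, not as numbers, so the index the model predicts is not the same as the folder number. The sketch below reproduces that ordering in plain Python for the 102 numbered folders. The single `cat_to_name` entry is the one that can be read off the prediction example later in this article; the rest of the mapping is left out on purpose.

```python
# ImageFolder-style class list: folder names sorted as strings, not as numbers
class_names = sorted(str(i) for i in range(1, 103))
class_to_idx = {name: idx for idx, name in enumerate(class_names)}

# One known entry only; the real file maps all 102 folder names
cat_to_name = {'18': 'peruvian lily'}

idx = 12
folder = class_names[idx]
print(class_names[:6])
print(idx, '->', folder, '->', cat_to_name[folder])
```

So a predicted index always has to go through `class_names` (or an inverted `class_to_idx`) before you look it up in `cat_to_name`.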

It’s a good idea to test the data loader, so I ran:

images, labels = next(iter(dataloaders['train']))
rand_idx = np.random.randint(len(images))
# print(rand_idx)
print("label: {}, class: {}, name: {}".format(labels[rand_idx].item(),
                                               class_names[labels[rand_idx].item()],
                                               cat_to_name[class_names[labels[rand_idx].item()]]))

Now it starts to get even more exciting! A number of models in the last several years have been created by people far, far more qualified than most of us for reuse in computer vision problems. PyTorch makes it easy to load pre-trained models and build on them, which is exactly what we’re going to do for this project. The choice of model is entirely up to you!

Some of the most popular pre-trained models, including ResNet, AlexNet, and VGG, come from the ImageNet Challenge. These pre-trained models allow others to quickly obtain cutting-edge results in computer vision without needing such large amounts of computer power, patience, and time. I actually had great results with DenseNet, and decided to use DenseNet161, which gave me very good results relatively quickly! (Please don’t just use this because it worked for me! I’m including this as an example only.)

You can quickly set this up by running

model = models.densenet161(pretrained=True)

but it might be more interesting to give yourself a choice of model, optimizer, and scheduler. In order to set up a choice in architecture, I ran

model_name = 'densenet' #vgg
if model_name == 'densenet':
    model = models.densenet161(pretrained=True)
    num_in_features = 2208
elif model_name == 'vgg':
    model = models.vgg19(pretrained=True)
    num_in_features = 25088
else:
    print("Unknown model, please choose 'densenet' or 'vgg'")

which allows you to quickly set up an alternate model.

After that, you can start to build your classifier, using the parameters that work best for you. I went ahead and built

# Create classifier
# Freeze the pre-trained feature parameters so only the new classifier is trained
for param in model.parameters():
    param.requires_grad = False

def build_classifier(num_in_features, hidden_layers, num_out_features):

    classifier = nn.Sequential()
    if hidden_layers is None:
        # No hidden layers: a single linear layer from features to classes
        classifier.add_module('fc0', nn.Linear(num_in_features, num_out_features))
    else:
        layer_sizes = zip(hidden_layers[:-1], hidden_layers[1:])
        classifier.add_module('fc0', nn.Linear(num_in_features, hidden_layers[0]))
        classifier.add_module('relu0', nn.ReLU())
        classifier.add_module('drop0', nn.Dropout(YOUR CHOICE))
        # YOUR CHOICE: how many layers do you want?
        for i, (h1, h2) in enumerate(layer_sizes):
            classifier.add_module('fc'+str(i+1), nn.Linear(h1, h2))
            classifier.add_module('relu'+str(i+1), nn.ReLU())
            classifier.add_module('drop'+str(i+1), nn.Dropout(.5))
        classifier.add_module('output', nn.Linear(hidden_layers[-1], num_out_features))

    return classifier

which allows for an easy way to change the number of hidden layers that I’m using, as well as quickly adjusting the dropout rate. You may decide to add additional relu and dropout layers in order to more finely hone your model (hint, hint...).
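To see what a given `hidden_layers` list actually produces, it can help to list the input and output size of every linear layer before building anything. The sketch below mirrors the pairing logic of `build_classifier` in plain Python. The helper name `classifier_layer_sizes` and the hidden sizes `[1024, 512]` are hypothetical and only there for illustration.

```python
def classifier_layer_sizes(num_in_features, hidden_layers, num_out_features):
    """List the (in_features, out_features) pair of every Linear layer, in order."""
    if not hidden_layers:
        return [(num_in_features, num_out_features)]
    sizes = [(num_in_features, hidden_layers[0])]
    # zip pairs each hidden size with the next one: (h0, h1), (h1, h2), ...
    sizes += list(zip(hidden_layers[:-1], hidden_layers[1:]))
    sizes.append((hidden_layers[-1], num_out_features))
    return sizes

print(classifier_layer_sizes(2208, [1024, 512], 102))
print(classifier_layer_sizes(2208, None, 102))
```

Each layer's output size has to match the next layer's input size, which is exactly what the `zip` over the shifted lists guarantees.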

Next, you want to work on training your classifier parameters. I decided to make sure I only trained the classifier parameters here while having feature parameters frozen. You can get as creative as you want here with your optimizer, criterion, and scheduler. The criterion is the method used to evaluate the model fit, the optimizer is the optimization method used to update the weights, and the scheduler provides different methods for adjusting the learning rate and step size used during optimization. You may not even want to specify a scheduler!

Try as many options and combinations as you can to see what gives you the best result. You can see all of the official documentation here. I recommend taking a long look at it and making your own decisions about what you want to use! You don’t literally have an infinite number of options here, but it sure feels like it!
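As one concrete example of what a scheduler does, a step-decay schedule multiplies the learning rate by a factor `gamma` every `step_size` epochs. This is the rule that `torch.optim.lr_scheduler.StepLR` applies. The plain-Python sketch below just evaluates that formula. The starting rate of 0.01 and the step size of 2 are assumptions for illustration, not the values I used.

```python
def step_decay_lr(base_lr, epoch, step_size, gamma=0.1):
    """Learning rate after `epoch` completed epochs under a step-decay schedule."""
    return base_lr * gamma ** (epoch // step_size)

for epoch in range(6):
    print(epoch, step_decay_lr(base_lr=0.01, epoch=epoch, step_size=2))
```

Printing the schedule like this before training is a cheap way to check that the learning rate will not shrink to almost nothing halfway through your run.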

hidden_layers = YOUR CHOICE
classifier = build_classifier(num_in_features, hidden_layers, 102)
if model_name == 'densenet':
    model.classifier = classifier
    criterion = YOUR CHOICE
    optimizer = YOUR CHOICE
    sched = YOUR CHOICE  # learning-rate scheduler passed to train_model below
elif model_name == 'vgg':
    model.classifier = classifier
    criterion = YOUR CHOICE
    optimizer = YOUR CHOICE
    sched = YOUR CHOICE  # learning-rate scheduler passed to train_model below

Now it’s time to train your model!

# Adapted from the official PyTorch transfer learning tutorial
def train_model(model, criterion, optimizer, sched, num_epochs=YOUR CHOICE):
    since = time.time()
    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0
    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch+1, num_epochs))
        print('-' * 10)
        # Each epoch has a training and validation phase
        for phase in ['train', 'valid']:
            if phase == 'train':
                model.train()  # Set model to training mode
            else:
                model.eval()   # Set model to evaluate mode
            running_loss = 0.0
            running_corrects = 0
            # Iterate over data.
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)
                # zero the parameter gradients
                optimizer.zero_grad()
                # forward
                # track history if only in train
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criterion(outputs, labels)
                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()
                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)
            # advance the learning-rate schedule once per epoch (skipped if no scheduler)
            if phase == 'train' and sched is not None:
                sched.step()
            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]
            print('{} Loss: {:.4f} Acc: {:.4f}'.format(
                phase, epoch_loss, epoch_acc))
            # deep copy the model
            if phase == 'valid' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())
        print()
    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(
        time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:4f}'.format(best_acc))
    # load best model weights
    model.load_state_dict(best_model_wts)

    return model

epochs = YOUR CHOICE
model.to(device)  # move the model to the GPU (or CPU) before training
model = train_model(model, criterion, optimizer, sched, epochs)

I wanted to be able to monitor my epochs easily and also keep track of the time elapsed as my model was running. The code above allows both, and the results are pretty good! You can see that the model is quickly learning and the accuracy on the validation set quickly reached over 96% by epoch 6!

train Loss: 2.4975 Acc: 0.4737
valid Loss: 1.0240 Acc: 0.7714

Epoch 2/
train Loss: 0.8445 Acc: 0.8364
valid Loss: 0.4739 Acc: 0.9071

Epoch 3/
train Loss: 0.5265 Acc: 0.8915
valid Loss: 0.3285 Acc: 0.9218

Epoch 4/
train Loss: 0.4149 Acc: 0.9061
valid Loss: 0.2538 Acc: 0.9413

Epoch 5/
train Loss: 0.3424 Acc: 0.9243
valid Loss: 0.2326 Acc: 0.9462

Epoch 6/
train Loss: 0.2954 Acc: 0.9313
valid Loss: 0.1869 Acc: 0.9621


Training complete in 67m 43s
Best val Acc: 0.973105

Running through the number of epochs I chose, with the parameters I decided to use, takes just over an hour, and the accuracy is looking good!
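The “best val Acc” line comes from a simple rule inside the training loop: whenever the validation accuracy beats the best value seen so far, remember it (and, in the real code, keep a copy of the weights). Here is that rule on its own, in plain Python, fed with the validation accuracies from the six epochs logged above. The variable names are mine, and the real run continued past these six epochs, which is why its final best value is higher.

```python
# Validation accuracies for epochs 1-6, copied from the log above
valid_acc_per_epoch = [0.7714, 0.9071, 0.9218, 0.9413, 0.9462, 0.9621]

best_acc = 0.0
best_epoch = None
for epoch, acc in enumerate(valid_acc_per_epoch, start=1):
    if acc > best_acc:      # same comparison the training loop makes
        best_acc = acc
        best_epoch = epoch  # the training loop deep-copies the weights at this point

print(best_epoch, best_acc)
```

Keeping the best weights rather than the last ones protects you if the final epochs start to overfit.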

Now it’s time for evaluation, so you’ll want to run validation on your test (valid) set:

model.eval()
accuracy = 0
# No gradients are needed for evaluation
with torch.no_grad():
    for inputs, labels in dataloaders['valid']:
        inputs, labels = inputs.to(device), labels.to(device)
        outputs = model(inputs)

        # Class with the highest probability is our predicted class
        equality = (labels.data == outputs.max(1)[1])
        # Accuracy is number of correct predictions divided by all predictions
        accuracy += equality.float().mean().item()

print("Test accuracy: {:.3f}".format(accuracy/len(dataloaders['valid'])))
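One subtlety worth knowing: the loop above averages the accuracy of each batch, so a small final batch counts as much as a full one. Counting correct predictions over the whole set can give a slightly different number. The sketch below compares the two in plain Python. The `(correct, batch_size)` pairs are made up for illustration, with an exaggeratedly bad last batch so the difference is easy to see.

```python
def mean_of_batch_accuracies(batches):
    """Average the per-batch accuracies, as the evaluation loop does."""
    return sum(correct / size for correct, size in batches) / len(batches)

def overall_accuracy(batches):
    """Total correct predictions divided by total predictions."""
    return sum(c for c, _ in batches) / sum(s for _, s in batches)

# Hypothetical (correct, batch_size) pairs: two full batches and a small last one
batches = [(62, 64), (60, 64), (5, 10)]
print(round(mean_of_batch_accuracies(batches), 4))
print(round(overall_accuracy(batches), 4))
```

With equally sized batches the two numbers agree, so in practice the gap is usually small, but it is worth keeping in mind when comparing results.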

Now it’s important to save your checkpoint.

model.class_to_idx = image_datasets['train'].class_to_idx
checkpoint = {'input_size': 2208,
              'output_size': 102,
              'epochs': epochs,
              'batch_size': YOUR CHOICE,
              'model': models.densenet161(pretrained=True),
              'classifier': classifier,
              'scheduler': sched,
              'optimizer': optimizer.state_dict(),
              'state_dict': model.state_dict(),
              'class_to_idx': model.class_to_idx
             }
torch.save(checkpoint, 'checkpoint.pth')

You don’t have to save all of the parameters, but I’m including them here as an example. This checkpoint specifically saves the model with a pre-trained densenet161 architecture, but if you want to save your checkpoint with the two-choice option, you can absolutely do that. Simply adjust the input size and model.

Then you can work on loading your checkpoint. You can check your keys by running

ckpt = torch.load('checkpoint.pth')
ckpt.keys()

If you’re importing your model into the Udacity workspace, this is the point at which you’ll want to import your libraries and also make sure to define your image size if you haven’t! Enter your imports and also

image_size = 224
# Values you used for normalizing the images. Default here are for
# pretrained models from torchvision.
norm_mean = [0.485, 0.456, 0.406]
norm_std = [0.229, 0.224, 0.225]

Now load and rebuild your model!

def load_checkpoint(filepath):
    checkpoint = torch.load(filepath)
    model = checkpoint['model']
    model.classifier = checkpoint['classifier']
    # Restore the trained weights saved in the checkpoint
    model.load_state_dict(checkpoint['state_dict'])
    model.class_to_idx = checkpoint['class_to_idx']
    optimizer = checkpoint['optimizer']
    epochs = checkpoint['epochs']

    for param in model.parameters():
        param.requires_grad = False

    return model, checkpoint['class_to_idx']

model, class_to_idx = load_checkpoint('checkpoint.pth')

If you’re loading your checkpoint in the actual Udacity workspace, you’ll need to make a few changes here. You want to put everything in one cell and make a few minor changes. You’ll need something that looks like this:

image_size = 224
# Values you used for normalizing the images. Default here are for
# pretrained models from torchvision.
norm_mean = [0.485, 0.456, 0.406]
norm_std = [0.229, 0.224, 0.225]
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")

# TODO: Write a function that loads a checkpoint and rebuilds the model
def load_checkpoint(filepath, map_location='cpu'):
    # Pass map_location through so a GPU-trained checkpoint loads on a CPU-only machine
    checkpoint = torch.load(filepath, map_location=map_location)
    model = checkpoint['model']
    model.classifier = checkpoint['classifier']
    model.load_state_dict(checkpoint['state_dict'], strict=False)
    model.class_to_idx = checkpoint['class_to_idx']
    optimizer = checkpoint['optimizer']
    epochs = checkpoint['epochs']

    for param in model.parameters():
        param.requires_grad = False

    return model, checkpoint['class_to_idx']

model, class_to_idx = load_checkpoint('/home/workspace/checkpoint.pth')

This is important: do you see where it says

map_location='cpu'

and

strict=False

You definitely want to include those!

Not kidding. You need them in the Udacity workspace!

Also, make sure your path is correct. As you can see above, mine was

model, class_to_idx = load_checkpoint('/home/workspace/checkpoint.pth')

You really need to have the right path, or Udacity won’t be able to find and load your checkpoint.

Also, @Antje in our PyTorch challenge Slack group says, “for me, it didn’t work until I used torch version 0.4.0 for training instead of 1.0.0.”

Now test that code and get your project in! Our results are strictly pass/fail, so if you passed, you’re done! We have no way of knowing our accuracy (besides the unofficial leaderboards and self-reporting on Slack), which is maddening since it’s our accuracy that will determine whether or not we move on to phase 2!

Sit back, cross your fingers, and wait.

(Still having trouble? I wrote an article that just covers loading your checkpoint to Udacity.)

Want to keep going in your own notebook? It’s a good idea to do some image preprocessing and inference for classification! Go ahead and define your image path and open an image:

image_path = 'flower_data/valid/102/image_08006.jpg'
img = Image.open(image_path)

Process your image and take a look at a processed image:

def process_image(image):
    ''' Scales, crops, and normalizes a PIL image for a PyTorch model,
        returns a tensor
    '''
    # TODO: Process a PIL image for use in a PyTorch model
    # tensor.numpy().transpose(1, 2, 0)
    preprocess = transforms.Compose([
        # Same steps as the 'valid' transforms; Resize(256) is a common choice
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize(mean=[0.485, 0.456, 0.406],
                             std=[0.229, 0.224, 0.225])
    ])
    image = preprocess(image)
    return image

def imshow(image, ax=None, title=None):
    """Imshow for Tensor."""
    if ax is None:
        fig, ax = plt.subplots()

    # PyTorch tensors assume the color channel is the first dimension
    # but matplotlib assumes it is the third dimension
    image = image.numpy().transpose((1, 2, 0))

    # Undo preprocessing
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    image = std * image + mean

    # Image needs to be clipped between 0 and 1 or it looks like noise when displayed
    image = np.clip(image, 0, 1)

    ax.imshow(image)

    return ax

with Image.open('flower_data/valid/102/image_08006.jpg') as image:
    plt.imshow(image)
model.class_to_idx = image_datasets['train'].class_to_idx

Create a function for prediction:

def predict2(image_path, model, topk=5):
    ''' Predict the class (or classes) of an image using a trained deep learning model.
    '''
    # TODO: Implement the code to predict the class from an image file
    img = Image.open(image_path)
    img = process_image(img)

    # Add a batch dimension: (C, H, W) -> (1, C, H, W)
    img = np.expand_dims(img, 0)

    img = torch.from_numpy(img)

    model.eval()
    inputs = img.to(device)
    with torch.no_grad():
        logits = model(inputs)

    ps = F.softmax(logits, dim=1)
    topk = ps.cpu().topk(topk)

    # Return (probabilities, class indices) as plain Python lists
    return (e.data.numpy().squeeze().tolist() for e in topk)

Once you can get images in the correct format, you’ll write a function for making predictions with your model. One common practice is to predict the top 5 or so (usually called top-K) most probable classes. You’ll want to calculate the class probabilities, then find the K largest values.

To get the top K largest values in a tensor, use x.topk(k). This method returns both the highest k probabilities and the indices of those probabilities corresponding to the classes. You need to convert from these indices to the actual class labels using class_to_idx, which you added to the model, or from the ImageFolder you used to load the data. Make sure to invert the dictionary so you get a mapping from index to class as well.

This method should take a path to an image and a model checkpoint, then return the probabilities and classes.
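If the steps above feel abstract, here they are in miniature: turn raw scores into probabilities with softmax, pick the K largest, and translate indices back into class labels with an inverted `class_to_idx`. This is a plain-Python sketch, not the PyTorch implementation. The four logits and the tiny `class_to_idx` mapping are made up for illustration.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    largest = max(logits)
    exps = [math.exp(x - largest) for x in logits]  # subtract the max for numerical stability
    total = sum(exps)
    return [e / total for e in exps]

def top_k(probs, k):
    """Return the k largest probabilities and their indices, largest first."""
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]
    return [probs[i] for i in ranked], ranked

# Hypothetical logits for a 4-class problem and a hypothetical class_to_idx mapping
logits = [1.0, 4.0, 0.5, 2.0]
class_to_idx = {'1': 0, '10': 1, '100': 2, '11': 3}
idx_to_class = {idx: cls for cls, idx in class_to_idx.items()}  # inverted dictionary

probs, indices = top_k(softmax(logits), k=2)
classes = [idx_to_class[i] for i in indices]
print(probs, indices, classes)
```

The inverted dictionary is the piece people most often forget: the model only knows indices, so every prediction has to pass through it before you can show a class label.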

img_path = 'flower_data/valid/18/image_04252.jpg'
probs, classes = predict2(img_path, model.to(device))
flower_names = [cat_to_name[class_names[e]] for e in classes]
print(probs)
print(classes)
print(flower_names)

I was pretty pleased with how my model performed!

[0.9999195337295532, 1.4087702766119037e-05, 1.3897360986447893e-05, 1.1400215043977369e-05, 6.098791800468462e-06]
[12, 86, 7, 88, 40]
['peruvian lily', 'desert-rose', 'king protea', 'magnolia', 'sword lily']

Basically, it’s nearly 100% likely that the image I specified is a Peruvian Lily! Want to take a look? Try using matplotlib to plot the probabilities for the top five classes in a bar graph along with the input image:

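def view_classify(img_path, prob, classes, mapping):
    ''' Function for viewing an image and its predicted classes.
    '''
    image = Image.open(img_path)
    fig, (ax1, ax2) = plt.subplots(figsize=(6,10), ncols=1, nrows=2)
    # The folder name in the path is the true class of the image
    flower_name = mapping[img_path.split('/')[-2]]
    ax1.set_title(flower_name)
    ax1.imshow(image)
    ax1.axis('off')

    y_pos = np.arange(len(prob))
    ax2.barh(y_pos, prob, align='center')
    ax2.set_yticks(y_pos)
    # Label each bar with the flower name for the predicted class index
    ax2.set_yticklabels([mapping[class_names[c]] for c in classes])
    ax2.invert_yaxis()  # labels read top-to-bottom
    ax2.set_title('Class Probability')

view_classify(img_path, probs, classes, cat_to_name)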

You should see something like this:

I’ve got to say, I’m pretty happy with that! I recommend testing a few other images to see how close your predictions are on a variety of images.

Now it’s time to make a model of your own and let me know how it goes!

Photo by Pez González on Unsplash

Have you finished a deep learning or machine learning model, but you don’t know what to do with it next?

Why not deploy it to the internet?

Get your model out there so everyone can see it!

Check out this article to learn how to deploy your machine learning model with Flask!
