Deep Learning With PyTorch

Josh Bernhard
20 min read · Jun 22, 2018


Introduction to the Project

In this post, I will walk through how I used PyTorch to build a deep learning network that identifies 102 different types of flowers.

As I was new to creating deep learning models with PyTorch, I hope this post can help others who are using this deep learning library for the first time.

Before we get too far along, let’s take a look at the data we are working with, as well as how the folder structure was set up for this project. This folder structure is how you should set up your folders for essentially any image classification model you build with PyTorch.

Depending on the type of project and the amount of data, you will want either two groups:

  1. Train
  2. Test

Or you may want three groups:

  1. Train
  2. Validation
  3. Test

Within each directory, you will want a separate folder for each class. In the case of this project, that meant a separate folder for each of the 102 flower classes, with each flower class labeled by a number. Below is the folder structure, where each of the numbered directories holds a number of .jpg files.

Image 1: Folder Structure

For each of the flower types, the training dataset had 27–206 images, the validation dataset had 1–28 images, and the testing dataset had 2–28 images.

Original images were obtained from the 102 Category Flower Dataset, and they are described in the following way by the authors Maria-Elena Nilsback and Andrew Zisserman:

The images have large scale, pose and light variations. In addition, there are categories that have large variations within the category and several very similar categories. The dataset is visualized using isomap with shape and colour features.

In order to obtain the counts of how many images were in each category, the following shell command proved to be extremely useful:

for dir in ./*/
do
    ls -l "${dir}" | egrep -c '^-'
done

Let’s take a look at a few of the different types of flowers to get an idea of the variability within and between flower types.

Between classes:

Image 2: Three of the 102 different flower classes

Within classes:

Image 3: Each of these three images is a globe thistle

You can have a look at all of the flowers, as well as how many of each are in the dataset, at the link here.

Getting Started With PyTorch

Table of Contents

I. Loading Data
II. Model Training
III. Model Testing
IV. Save Model Checkpoint
V. Load Model Checkpoint
VI. Processing Images
VII. Class Prediction
VIII. Sanity Check
IX. Final Thoughts
X. Special Thanks

Loading Data

Now that we are familiar with the dataset and folder structure, we can get started with PyTorch. PyTorch reads data with generator-like loaders: image data tends to create large files, so you likely do not want to hold all of it in memory, but instead generate batches on the fly.
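As a quick illustration (using the dataloaders we define a little further down), pulling one batch from a loader only materializes those 32 images, not the full dataset:

images, labels = next(iter(dataloaders['train']))
print(images.shape)  # torch.Size([32, 3, 224, 224]) -- a single batch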

The documentation provides some really useful tips for setting up your data in PyTorch. First, let’s get our libraries all set up. Pulling from an earlier section of the Nanodegree program gives us a good place to start, and we can update this later as we find new libraries that are needed.

%matplotlib inline
%config InlineBackend.figure_format = 'retina'
import time
import json
import copy
import matplotlib.pyplot as plt
import seaborn as sns
import numpy as np
from PIL import Image
from collections import OrderedDict
import torch
from torch import nn, optim
from torch.optim import lr_scheduler
from torch.autograd import Variable
from torchvision import datasets, models, transforms

Next, it is useful to set up your data_transforms. Essentially, the data transformations in PyTorch allow us to train on many variations of the original images, cropped or rotated in different ways.

As you work through the PyTorch portion of the Nanodegree program, you learn a few different ways to do this. However, I found that the documentation was particularly useful for the method I implemented.

First, we want to set up the data transformations for each set of data. In general, we want to have the same types of transformations on the validation and test sets of data. However, with the training data, we can create a more robust model by training it on rotated, flipped, and cropped images.

The means and standard deviations used to normalize the image values before passing them to our network are provided, but they could also be found by computing the mean and standard deviation of each channel across the image tensors.
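These particular values are the channel means and standard deviations of the ImageNet training set, which is what the pre-trained torchvision models expect. If you wanted to estimate statistics for your own dataset instead, a rough sketch (my own addition, not part of the project) could look like the following:

# Sketch: estimate per-channel mean/std of the training images.
# Uses only resizing/cropping and ToTensor, so values are unnormalized.
stats_loader = torch.utils.data.DataLoader(
    datasets.ImageFolder('train', transform=transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor()])),
    batch_size=32)

channel_sum = torch.zeros(3)
channel_sq_sum = torch.zeros(3)
n_pixels = 0
for images, _ in stats_loader:
    # sum over the batch, height, and width dimensions, per channel
    channel_sum += images.sum(dim=[0, 2, 3])
    channel_sq_sum += (images ** 2).sum(dim=[0, 2, 3])
    n_pixels += images.size(0) * images.size(2) * images.size(3)

mean = channel_sum / n_pixels
std = (channel_sq_sum / n_pixels - mean ** 2).sqrt()

With the provided values, the transforms for each set of data look like this: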

data_transforms = {
    'train': transforms.Compose([
        transforms.RandomRotation(45),
        transforms.RandomResizedCrop(224),
        transforms.RandomHorizontalFlip(),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406],
                             [0.229, 0.224, 0.225])
    ]),
    'valid': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406],
                             [0.229, 0.224, 0.225])
    ]),
    'test': transforms.Compose([
        transforms.Resize(256),
        transforms.CenterCrop(224),
        transforms.ToTensor(),
        transforms.Normalize([0.485, 0.456, 0.406],
                             [0.229, 0.224, 0.225])
    ]),
}

Using ImageFolder and DataLoader makes it easy to pass the images through the necessary transformations and then through our network for training or prediction. The code below can easily be reduced or extended to new examples, as long as the folder structure described in the first part of this post is maintained.

train_dir = 'train'
valid_dir = 'valid'
test_dir = 'test'
dirs = {'train': train_dir,
        'valid': valid_dir,
        'test': test_dir}
image_datasets = {x: datasets.ImageFolder(dirs[x], transform=data_transforms[x])
                  for x in ['train', 'valid', 'test']}
dataloaders = {x: torch.utils.data.DataLoader(image_datasets[x],
                                              batch_size=32, shuffle=True)
               for x in ['train', 'valid', 'test']}
dataset_sizes = {x: len(image_datasets[x])
                 for x in ['train', 'valid', 'test']}
class_names = image_datasets['train'].classes

You will see each of the created variables used throughout the next parts when creating and training our network.

As a final step of setting up our data, we need to create a mapping from the label number (some number between 1 and 102) to the actual flower name. Udacity provided a json file for this mapping to be done seamlessly.

with open('cat_to_name.json', 'r') as f:
    label_map = json.load(f)
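The result is a plain dictionary keyed by the class number as a string. For example (the flower name here comes from the provided mapping file):

print(label_map['1'])  # 'pink primrose'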

Transfer Learning

In recent years, a number of models have been created for reuse in computer vision problems. Using these pre-trained models is known as transfer learning. PyTorch makes it easy to load pre-trained models and build upon them, which is what we will do in this project.

Some of the most popular pre-trained models include VGGNet, ResNet, and AlexNet, all of which were trained for the ImageNet Challenge. These pre-trained models allow others to quickly obtain cutting-edge results in computer vision without needing the large amounts of compute power, time, and patience required to find the right training technique to optimize the weights.

I decided to use the vgg19 architecture, which we can obtain from the torchvision library. However, other models could have been easily used with very similar setup.

model = models.vgg19(pretrained=True)

You can even see the architecture of the model by simply running it:

model

If you scroll through the model architecture, you will notice that it has a number of convolutional layers. Below you can see the final layers of the network.
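For vgg19, the classifier portion at the end of that printout looks roughly like this (the exact formatting varies a bit across torchvision versions):

(classifier): Sequential(
  (0): Linear(in_features=25088, out_features=4096, bias=True)
  (1): ReLU(inplace=True)
  (2): Dropout(p=0.5, inplace=False)
  (3): Linear(in_features=4096, out_features=4096, bias=True)
  (4): ReLU(inplace=True)
  (5): Dropout(p=0.5, inplace=False)
  (6): Linear(in_features=4096, out_features=1000, bias=True)
)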

There are two parts of the classifier that must be consistent with our problem:

  1. The number of input features on line (0), with a value of 25088, must match the number of inputs to our first layer.
  2. The number of output features on line (6), currently 1000, must equal the number of classes in our dataset. In our case, we want to predict the 102 flower classes.

In order to create final layers that are consistent with our problem, there were two methods that were shown in the program: create a class or use an ordered dictionary. I preferred the second, so the ending layers of my network were created with the following:

classifier = nn.Sequential(OrderedDict([
    ('fc1', nn.Linear(25088, 4096)),
    ('relu', nn.ReLU()),
    ('fc2', nn.Linear(4096, 102)),
    ('output', nn.LogSoftmax(dim=1))
]))

Then the below ensures we don’t update the weights from the pre-trained model.

for param in model.parameters():
    param.requires_grad = False

We can replace the classifier portion of vgg19 by using:

model.classifier = classifier
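As a quick sanity check (my own addition, not part of the project), you can confirm that only the new classifier layers will be trained:

# Count parameters that will receive gradient updates
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print('{:,} trainable parameters'.format(trainable))  # ~103 million, all in the new classifier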

Model Training

Now that we have our model all set up, we will want to train the final layers. We also want to get an idea of how well it is working! From the same documentation as earlier, we can find a function for training our models. The function shown here is taken nearly verbatim from the documentation.

def train_model(model, criteria, optimizer, scheduler,
                num_epochs=25, device='cuda'):
    since = time.time()

    model.to(device)  # make sure the model is on the same device as the data

    best_model_wts = copy.deepcopy(model.state_dict())
    best_acc = 0.0

    for epoch in range(num_epochs):
        print('Epoch {}/{}'.format(epoch, num_epochs - 1))
        print('-' * 10)

        # Each epoch has a training and validation phase
        for phase in ['train', 'valid']:
            if phase == 'train':
                scheduler.step()
                model.train()  # Set model to training mode
            else:
                model.eval()  # Set model to evaluate mode

            running_loss = 0.0
            running_corrects = 0

            # Iterate over data.
            for inputs, labels in dataloaders[phase]:
                inputs = inputs.to(device)
                labels = labels.to(device)

                # zero the parameter gradients
                optimizer.zero_grad()

                # forward
                # track history only if in train
                with torch.set_grad_enabled(phase == 'train'):
                    outputs = model(inputs)
                    _, preds = torch.max(outputs, 1)
                    loss = criteria(outputs, labels)

                    # backward + optimize only if in training phase
                    if phase == 'train':
                        loss.backward()
                        optimizer.step()

                # statistics
                running_loss += loss.item() * inputs.size(0)
                running_corrects += torch.sum(preds == labels.data)

            epoch_loss = running_loss / dataset_sizes[phase]
            epoch_acc = running_corrects.double() / dataset_sizes[phase]

            print('{} Loss: {:.4f} Acc: {:.4f}'.format(
                phase, epoch_loss, epoch_acc))

            # deep copy the model
            if phase == 'valid' and epoch_acc > best_acc:
                best_acc = epoch_acc
                best_model_wts = copy.deepcopy(model.state_dict())

        print()

    time_elapsed = time.time() - since
    print('Training complete in {:.0f}m {:.0f}s'.format(
        time_elapsed // 60, time_elapsed % 60))
    print('Best val Acc: {:.4f}'.format(best_acc))

    # load best model weights
    model.load_state_dict(best_model_wts)
    return model

Let’s set up the necessary inputs. There are six arguments to this function:

  1. The first argument is model, which is just the model we created in the previous portion.
  2. The second argument criteria is the method used to evaluate the model fit.
  3. The optimizer is the optimization technique used to update the weights.
  4. The scheduler provides different methods for adjusting the learning rate and step size used during optimization.
  5. The num_epochs argument sets the number of epochs; an epoch, as described in the lessons, is a full run of feedforward and backpropagation through the network.
  6. The device was set to 'cuda' as default, but it could also be set to 'cpu' if you wanted to train your model (for the rest of your life) on your local CPU.

To train my model, I set the above to the following values:

# Criterion: NLLLoss, which is recommended with a LogSoftmax final layer
criteria = nn.NLLLoss()
# Observe that only the classifier parameters are being optimized
optimizer = optim.Adam(model.classifier.parameters(), lr=0.001)
# Decay LR by a factor of 0.1 every 4 epochs
sched = lr_scheduler.StepLR(optimizer, step_size=4, gamma=0.1)
# Number of epochs
eps = 10

Udacity provides time on their classroom GPUs to train your models, and the function I used to train my model was then specified as shown below.

model_ft = train_model(model, criteria, optimizer, sched, eps, 'cuda')

If everything is set up correctly, you should see the loss and accuracy printed for the train and valid phases of each epoch.

This looks promising. The model appears to be learning with each epoch.

Additionally, it doesn’t appear that our model is overfitting (at least too much), since the training and validation metrics are not diverging too much.

I found that changing the number of epochs, the optimizer, and the scheduler had the greatest impact on my results.

Once I felt comfortable with the ending results, which had a validation accuracy of approximately 91% across all the flower types, I could do one final check of the accuracy on the test data.

Model Testing

Now that I had a model that I thought was pretty good, I wanted to test out how well it would work on the test data. I simplified the function from the PyTorch documentation, so that I could quickly obtain the accuracy for each test batch.

You will notice a few changes from earlier:

  1. We use no_grad, as we are not interested in training with this function.
  2. To obtain the output values, we do a single forward pass of our data using model.forward(inputs). Notice that outputs is a tensor with one dimension of 32 values (one for each image in the batch) and another dimension of 102 values (the LogSoftmax output for each class). You can see this by using print(outputs) and print(outputs[0]).
  3. Then, using the max method, we can get the class that is most likely for each image. Specifically, you obtain the max LogSoftmax value for each image (stored in _) and the predicted class label (stored in predicted).
  4. Then we can look at whether the predicted label matches the actual data label. The equals tensor holds a 1 if we predict correctly, and a 0 if we predict incorrectly.
  5. Finally, I computed the accuracy in each batch by changing the 1 and 0 values to float values and taking the mean.

def calc_accuracy(model, data, cuda=False):
    model.eval()
    if cuda:
        model.to('cuda')

    with torch.no_grad():
        for idx, (inputs, labels) in enumerate(dataloaders[data]):
            if cuda:
                inputs, labels = inputs.cuda(), labels.cuda()
            # obtain the outputs from the model
            outputs = model.forward(inputs)
            # max returns (max log-probability, index of the max) along dim 1
            _, predicted = outputs.max(dim=1)
            # inspect the first batch
            if idx == 0:
                print(predicted)     # the predicted classes
                print(torch.exp(_))  # the predicted probabilities
            equals = predicted == labels.data
            if idx == 0:
                print(equals)
            print(equals.float().mean())

In the above, I also threw in some print statements that helped me understand what was being done at each step. You can see the output from the print statements here:

With a batch size of 32 images, it took 26 batches to make a prediction for every test image. In the above image, you can see:

  1. The predicted flower type in the first tensor.
  2. The probability associated with how sure the model is that it has predicted the correct flower type.
  3. The truth of whether the model was actually correct or not.
  4. The accuracy for each batch.

The accuracy for the batches ranged from 0.75 to 1.0. The overall accuracy was approximately 0.93 across all batches.
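If you would rather have a single overall number than per-batch accuracies, a small variation (mine, not from the documentation) accumulates the counts across batches:

# Accumulate correct predictions over the whole test set
correct, total = 0, 0
model.eval()
with torch.no_grad():
    for inputs, labels in dataloaders['test']:
        inputs, labels = inputs.cuda(), labels.cuda()  # assumes the model is on the GPU
        _, predicted = model(inputs).max(dim=1)
        correct += (predicted == labels).sum().item()
        total += labels.size(0)
print('Overall test accuracy: {:.3f}'.format(correct / total))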

Save Model Checkpoint

It likely took you quite some time to find the right balance of hyper-parameters and train your model to this point. With that in mind, PyTorch gives you the ability to checkpoint your models, so you can easily pick up where you left off.

There are a few items of this code worth noting:

  1. We use torch.save to save our model for future reuse. It is common to store everything about our model in a dictionary within torch.save. The torch.save function takes two arguments: a dictionary and a path of where to save the dictionary.
  2. The first item in our dictionary is the transfer learning architecture I used, which is saved within the key arch. This isn’t really that important, because I will only be using this within an if statement when loading my model. If you had many base models, each with different weights, this would be much more important.
  3. The model.class_to_idx attribute keeps track of our mapping from flower class values to the indices that are actually predicted by our model. The label_map can map the flower class back to the flower name.
label_map: class -> flower, class_to_idx: class -> pred

I found this confusing, so I printed a visual that will hopefully help you understand the relationships. The keys of each dictionary are the same.

If we don’t create this dictionary, we do not have an easy way to map back to the actual flower labels when we make predictions with this model in the future.
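Concretely, ImageFolder sorts the class folder names as strings to build class_to_idx, so the relationship looks something like this (the flower name comes from the mapping file):

model.class_to_idx['10']  # 1 -- the index the network actually predicts for class '10'
label_map['10']           # 'globe thistle' -- the human-readable name for class '10'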

4. The model.state_dict() holds all of the weights and biases of our model for each layer in a dictionary. This is the key thing we will need back when we want to load our model for future use!

5. All of this information is saved to a file. The extension on this file doesn’t seem to be very important to the community. I have seen .pth suggested by the creator of PyTorch, so I used it below. However, I have also seen .dat.

model.class_to_idx = image_datasets['train'].class_to_idx
model.cpu()
torch.save({'arch': 'vgg19',
            'state_dict': model.state_dict(),
            'class_to_idx': model.class_to_idx},
           'classifier.pth')

Load Model Checkpoint

Now that you have saved your model, we need to be able to load the model. With this, there are again a few things to keep in mind (can you tell I like organized lists yet?):

  1. First, anytime you used transfer learning with a pre-trained model, you will need to load the same pre-trained model in the same way you did originally.
  2. You will need to create a model with the same structure you had originally; then you essentially use the loaded model state to provide the weights for each of the layers of that model.
  3. In PyTorch, the torch module has a load function that makes it easy to load the state_dict you saved earlier.

def load_model(checkpoint_path):
    chpt = torch.load(checkpoint_path)

    if chpt['arch'] == 'vgg19':
        model = models.vgg19(pretrained=True)
        for param in model.parameters():
            param.requires_grad = False
    else:
        print("Sorry, base architecture not recognized")
        return None

    model.class_to_idx = chpt['class_to_idx']

    # Create the classifier
    classifier = nn.Sequential(OrderedDict([
        ('fc1', nn.Linear(25088, 4096)),
        ('relu', nn.ReLU()),
        ('fc2', nn.Linear(4096, 102)),
        ('output', nn.LogSoftmax(dim=1))
    ]))
    # Put the classifier on the pretrained network
    model.classifier = classifier

    model.load_state_dict(chpt['state_dict'])

    return model

If the load_model function worked, you should be able to run the below code to obtain similar results to those you obtained prior to saving your model.

model = load_model('classifier.pth')
calc_accuracy(model, 'test', True)

Notice that the last parts of the project are aimed at letting us use the model we just loaded on individual images, without having to fire up a GPU. Therefore, some of the below might seem like… “wait, didn’t we already do this?”

Processing Images

I am someone who needs to understand why we are doing something, so this part was a bit frustrating at first for me. The idea for this part of the project is that you want to be able to pass an individual image to your deep learning network, and for your network to predict the label for the image.

PyTorch made this easy to do for the many images we had within our folder structure. However, for a single image, it would be ideal to pass a single path without the whole folder structure set up.

The directions say to use PIL to load the image. You can test this out with any single image to make sure you understand how to view an image:

from PIL import Image

image_path = 'test/28/image_05230.jpg'
img = Image.open(image_path)

We then need to take the image and scale it the same way we scaled the test and validation images earlier in the project.

First, we needed to resize the images so that the shortest side is 256 pixels. In the project, two methods are suggested. This post was really helpful in understanding why I used thumbnail for my implementation. Using thumbnail meant that I could look at which of the (width, height) was smaller and fix that value to 256 pixels; the other side then scales automatically to preserve the aspect ratio. The same constraint is not true when using resize, which stretches to whatever dimensions you give it.

if img.size[0] > img.size[1]:      # if the width > height
    img.thumbnail((1000000, 256))  # constrain the height to be 256
else:
    img.thumbnail((256, 200000))   # otherwise constrain the width

I used the arbitrary 1000000 and 200000 above, but really any large value in these positions will work.

Then you’ll need to crop out the center 224x224 portion of the image.

We only want the middle 224x224 portion

We can obtain this middle portion by using the crop method of the image, which takes a tuple of (left, upper, right, lower) pixel coordinates. Because the crop here is centered, the margin variables below line up with those four values.

To get the left and bottom margins, we just take the total width and height, subtract 224, and divide by two, since we want the remaining amount split equally between the two sides.

left_margin = (img.width-224)/2
bottom_margin = (img.height-224)/2
right_margin = left_margin + 224
top_margin = bottom_margin + 224

We can then simply crop our image:

img = img.crop((left_margin, bottom_margin, right_margin,    
top_margin))

Color channels of images are typically encoded as integers 0–255, but the model expected floats 0–1. We can make this change by scaling by 255.

img = np.array(img)/255

Then we want to normalize in the way expected by our model, which means subtracting the provided mean values and dividing by the standard deviation for each color channel.

mean = np.array([0.485, 0.456, 0.406]) #provided mean
std = np.array([0.229, 0.224, 0.225]) #provided std
img = (img - mean)/std

Finally, PyTorch expects the color channel to be the first dimension, but it is currently the third dimension of our images. Therefore, we will want to move the third index of our images to the first, and shift the other two indices.

img = img.transpose((2, 0, 1))
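A quick check (my addition) confirms the channel dimension has moved to the front:

print(img.shape)  # (3, 224, 224) -- channels first, as PyTorch expects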

If you made it this far, then congrats!!! We made it through what I found to be the most grueling part of the project. You should be able to piece together the above to create your own function, which likely looks similar to mine below.

def process_image(image_path):
    '''
    Scales, crops, and normalizes a PIL image for a PyTorch
    model; returns a Numpy array
    '''
    # Open the image
    img = Image.open(image_path)
    # Resize, constraining the shortest side to 256 pixels
    if img.size[0] > img.size[1]:
        img.thumbnail((10000, 256))
    else:
        img.thumbnail((256, 10000))
    # Crop out the center 224x224 portion
    left_margin = (img.width - 224) / 2
    bottom_margin = (img.height - 224) / 2
    right_margin = left_margin + 224
    top_margin = bottom_margin + 224
    img = img.crop((left_margin, bottom_margin, right_margin,
                    top_margin))
    # Normalize
    img = np.array(img) / 255
    mean = np.array([0.485, 0.456, 0.406])  # provided mean
    std = np.array([0.229, 0.224, 0.225])   # provided std
    img = (img - mean) / std

    # Move color channels to first dimension as expected by PyTorch
    img = img.transpose((2, 0, 1))

    return img

If you were successful, then you should be able to use the Udacity function imshow, which is defined below with a slight modification to use the title argument.

def imshow(image, ax=None, title=None):
    if ax is None:
        fig, ax = plt.subplots()
    if title:
        plt.title(title)
    # PyTorch tensors assume the color channel is first,
    # but matplotlib assumes it is the third dimension
    image = image.transpose((1, 2, 0))

    # Undo preprocessing
    mean = np.array([0.485, 0.456, 0.406])
    std = np.array([0.229, 0.224, 0.225])
    image = std * image + mean

    # Image needs to be clipped between 0 and 1
    image = np.clip(image, 0, 1)

    ax.imshow(image)

    return ax
Result of running the code below

Running your function on the image_path, and the corresponding Udacity function on the result (img as defined below) should return the image shown on a 224x224 grid.

Here is an example of a quick check you can run of the two functions.

image_path = 'test/28/image_05230.jpg'
img = process_image(image_path)
imshow(img)

If you are correct, you should get the image shown above.

Class Prediction

The next part of the project is to predict the probabilities and classes, similar to what we did earlier. This time, however, we want a function we can use without firing up a GPU, on the single image returned by the process_image function above.

I mostly hoped that I would be able to pass the image (given I had just done all that work to put it in the right form) directly to my network. My first idea looked something like:

img = process_image(image_path) # make image pytorch compatible
log_results = model.forward(img) # get the log softmax values
probs = torch.exp(log_results) # exponentiate
top_probs, top_labs = probs.topk(5) # get the top 5 results

Unfortunately, this provides an error that tells us we need to change our numpy array to a PyTorch tensor. In order to perform this task, we have the following line:

img = torch.from_numpy(img).type(torch.FloatTensor)

I tried to run the new img through my model, but received an error.

RuntimeError: expected stride to be a single integer value or a list of 1 values to match the convolution dimensions, but got stride=[1, 1]

A quick Google search leads to this post. It tells us that the first dimension of the tensor passed to the model needs to be the batch size. In our case, we are passing only a single image, so it is suggested that we use img.unsqueeze_(0) to add a batch dimension of size 1 at the front of our tensor.

Now, the shape of our img is (1, 3, 224, 224), which corresponds to (batch_size, channels, height, width). The above code then changes to:

# Process image
img = process_image(image_path)
img = torch.from_numpy(img).type(torch.FloatTensor)
img.unsqueeze_(0)
# Predict top 5
probs = torch.exp(model.forward(img))
top_probs, top_labs = probs.topk(5)

As a final step, we need to use the indices we get back from topk to pull the flower classes. I attempted to loop directly through the tensor objects given back from probs.topk. However, I was thrown an error. Therefore, I wanted to change the tensor to a numpy array. I was thrown another error when attempting this task:

RuntimeError: Can't call numpy() on Variable that requires grad. Use var.detach().numpy() instead.

So, I followed the instructions, and I changed the tensors to numpy arrays using detach. Those later broke because they were not hashable, so I added tolist().

Honestly, this turned into a lot of trial and error until I finally got something that worked, and I am sure there is probably something that is more efficient.

— Me, Basically All The Time

top_probs = top_probs.detach().numpy().tolist()[0]
top_labs = top_labs.detach().numpy().tolist()[0]

From here, the mapping from earlier came in handy again.

label_map: class -> flower, class_to_idx: class -> pred

Since what we get back from the model is the index (the value in model.class_to_idx), I wanted to flip the keys and values so I could look up what the model returns and pull back the class number.

We can flip the key, val for the dictionary with the following code:

idx_to_class = {val: key for key, val in model.class_to_idx.items()}

Then, to pull the top classes, we index idx_to_class with the top_labs values:

top_labels = [idx_to_class[lab] for lab in top_labs]

This provides all the necessary items to be returned, but I think it would be more useful to return the flower name. To get the flower name, we can run the following.

top_flowers = [label_map[idx_to_class[lab]] for lab in top_labs]

Putting all of the above together, I created the following predict function.

def predict(image_path, model, top_num=5):
    # Process image
    img = process_image(image_path)

    # Numpy -> Tensor
    image_tensor = torch.from_numpy(img).type(torch.FloatTensor)
    # Add batch of size 1 to image
    model_input = image_tensor.unsqueeze(0)

    # Probs
    probs = torch.exp(model.forward(model_input))

    # Top probs
    top_probs, top_labs = probs.topk(top_num)
    top_probs = top_probs.detach().numpy().tolist()[0]
    top_labs = top_labs.detach().numpy().tolist()[0]

    # Convert indices to classes
    idx_to_class = {val: key for key, val in
                    model.class_to_idx.items()}
    top_labels = [idx_to_class[lab] for lab in top_labs]
    top_flowers = [label_map[idx_to_class[lab]] for lab in top_labs]
    return top_probs, top_labels, top_flowers
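As a quick check of the function (using the same test image as before):

probs, classes, flowers = predict('test/28/image_05230.jpg', model)
print(probs)    # the five highest probabilities
print(flowers)  # the corresponding flower names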

Sanity Check

For this last part, I basically just put all of the earlier pieces together into a plot_solution function.

def plot_solution(image_path, model):
    # Set up plot
    plt.figure(figsize=(6, 10))
    ax = plt.subplot(2, 1, 1)
    # Set up title using the class number from the path
    flower_num = image_path.split('/')[1]
    title_ = label_map[flower_num]
    # Plot flower
    img = process_image(image_path)
    imshow(img, ax, title=title_)
    # Make prediction
    probs, labs, flowers = predict(image_path, model)
    # Plot bar chart
    plt.subplot(2, 1, 2)
    sns.barplot(x=probs, y=flowers, color=sns.color_palette()[0])
    plt.show()

With that, a quick look at two examples shows one example where my model predicted incorrectly. For this example, it was a little all over the place with its predictions. There was clearly some uncertainty.

image_path = 'test/28/image_05230.jpg'
plot_solution(image_path, model)
Incorrect prediction & quite a lot of uncertainty

Alternatively, for other predictions the model has no doubt…

image_path = 'test/1/image_06743.jpg'
plot_solution(image_path, model)
Correct prediction & absolute certainty

Final Thoughts

There were a few goals to this post:

  1. Show an applied example of working with PyTorch.
  2. Help people who are stuck with the Udacity project, which is used across a couple of nanodegrees at this point.
  3. Provide a view of how to troubleshoot when you are stuck in PyTorch (or in software engineering in general).
  4. Show that the final parts allow you to load a pre-trained network and use it on a new image using only a CPU.
  5. Hopefully get you excited about building your own deep learning applications.

Additionally, I would like to leave a few notes about troubleshooting and random items I learned about PyTorch while completing this project.

  1. The most difficult parts of performing deep learning in practice are around understanding data types and matrix dimensions. If you are getting an error, it is likely one of these two items that is truly the reason why (regardless of what your error message in PyTorch says).
  2. The documentation for PyTorch is not very mature, and PyTorch does not have docstrings for much of anything. I wrote this post hoping to help others who, like me, get through software engineering by typing things like df.reshape? and reading docstrings. The lack of help when trying to reference the documentation of an item was very frustrating for me.
  3. If you are not training your model, set it with model.eval(). PyTorch is pretty cool in that you can set up when and which layers you want to train. When you are making predictions, make sure nothing is training; a minimal pattern is sketched below.
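For example, a minimal prediction pattern along those lines (my own sketch, using a dummy image-shaped batch) looks like:

model.eval()                             # put dropout layers in inference mode
with torch.no_grad():                    # stop tracking gradients as well
    batch = torch.randn(1, 3, 224, 224)  # a dummy batch standing in for a real image
    preds = model(batch).max(dim=1)[1]   # predicted class indices
model.train()                            # switch back only if you resume training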

Special Thanks

I would likely still be working on this project if it weren’t for a number of people who assisted me in writing this post. Thank you for responding quickly and patiently to my questions. Special thanks to Mat Leonard, Juno Lee, Mike Yi, Chris Gearhart, and Alexis Cook.

Stay audacious and humble.

