Deep Learning with FAST AI engine makes it Speed Learning

Parth
5 min read · May 29, 2019


I have known about Jeremy Howard’s courses for a year or so, but this year I finally decided to take a deep dive into FAST.AI’s Deep Learning course. Just from the first lecture, I must say the hands-on, “down and dirty” coding approach has me hooked already. It’s the opposite of how a computer science course is usually taught at university, where you learn the theory first and only then the code. By getting into the code directly, I found the theory much easier to understand.

The first lecture focuses on image classification, in the spirit of the famous hot dog / not hot dog app from HBO’s popular show Silicon Valley.

In terms of notes for the lecture, there are a couple of great resources that already cover it in detail, so I don’t want to reinvent the wheel:

  1. https://github.com/hiromis/notes/blob/master/Lesson1.md
  2. https://forums.fast.ai/t/deep-learning-lesson-1-notes/27748

The objectives of this lecture are to (a) learn how to gather data, in this case images, and (b) build your own image classifier.

The Python notebook provided by the course shows, step by step, how to create a folder for each class you want and how to download images for those classes from Google Images.
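
As a rough sketch of that step (assuming the fastai v1 library the course used in 2019; the folder and file names below are placeholders I made up), it looks something like this:

from fastai.vision import *

# one folder per class, plus a text/CSV file of image URLs saved from Google Images
path = Path('data/sports')                      # hypothetical project folder
folder, urls_file = 'rugby', 'urls_rugby.csv'   # repeat for each class

dest = path/folder
dest.mkdir(parents=True, exist_ok=True)

# download up to 200 images listed in the URL file, then delete any that won't open
download_images(path/urls_file, dest, max_pics=200)
verify_images(dest, delete=True, max_size=500)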

Being a sports enthusiast, I decided to build an image classifier that can distinguish between wrestling, rugby and American football. To make it more challenging, I picked pictures showing a tackling motion across the 3 sports to see how well the algorithm could classify them. Below is a batch of a few of the example images that were collected.
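
For context, here is a minimal sketch (again assuming fastai v1, with the same placeholder path as above) of how the data object used below could be built from those class folders:

# one subfolder per class (wrestling/rugby/football); hold out 20% of images for validation
data = ImageDataBunch.from_folder(path, train=".", valid_pct=0.2,
                                  ds_tfms=get_transforms(), size=224,
                                  num_workers=4).normalize(imagenet_stats)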

data.show_batch(rows=3, figsize=(7,8))

Once the images were collected, they were fed into a convolutional neural network with ResNet34 as the architecture. The data is split into training, validation and test sets.

learn = create_cnn(data, models.resnet34, metrics=error_rate)
learn.fit_one_cycle(4)

The ResNet34 architecture is basically a function, like y = mx + b. By itself that equation doesn’t mean much, but we know m is the slope and b is the y-intercept, and given data we can fit a line using it. Think of the same scenario, except now you have images and you’re trying to fit a model that recognizes those images and tells you what’s in each one (i.e. wrestling, rugby or football). The ResNet34 model we use is pre-trained: it comes with weights it has already learned from the ImageNet dataset. In short, instead of starting from scratch, we start with a model that already knows how to recognize images, which lets us create and test our own model significantly faster.

The second line in the code above shows the entire dataset to our model 4 times so it learns to distinguish between football, rugby and wrestling. Looking at the error rate below, we can see our model is already predicting with ~90% accuracy across the three sports. Not a bad place to start.
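
To make that analogy concrete, here is a tiny self-contained sketch in plain PyTorch (not taken from the lesson code) that fits m and b to noisy linear data by gradient descent; training ResNet34 does essentially the same thing, just with millions of weights instead of two:

import torch

x = torch.linspace(0, 1, 100)
y = 3 * x + 2 + 0.1 * torch.randn(100)                 # "true" data: m=3, b=2, plus noise

params = torch.tensor([0.0, 0.0], requires_grad=True)  # [m, b], starting from scratch
lr = 0.1
for epoch in range(1000):
    y_hat = params[0] * x + params[1]                  # the model's prediction
    loss = ((y_hat - y) ** 2).mean()                   # mean squared error
    loss.backward()                                    # gradients of the loss w.r.t. m and b
    with torch.no_grad():
        params -= lr * params.grad                     # take a small step downhill
        params.grad.zero_()

print(params)                                          # should end up close to (3, 2)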

9.99% error rate on the validation set

The learning rate is an important parameter when it comes to improving a neural net: it controls how big a step the optimizer takes each time it updates the model’s weights. In our example, let us plot the loss against candidate learning rates to see how our model can be further improved.

learn.lr_find()
learn.recorder.plot()

When looking at the plot, we want to find the region where the loss is decreasing most steeply, i.e. where the learning rate is driving the loss down the fastest. In the case above, that is roughly from 1e-5 to 1e-4. By running the model again with a learning-rate range based on that region, we see a slight improvement in the model, down to an 8.1% error rate.

learn.fit_one_cycle(2, max_lr=slice(3e-5,3e-4))
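
For completeness, in the lesson notebook this fine-tuning step is, if I recall correctly, preceded by unfreezing the pre-trained layers, so the sliced learning rates are applied across the whole network rather than only the newly added head:

learn.unfreeze()                                   # allow all ResNet34 layers to be updated
learn.fit_one_cycle(2, max_lr=slice(3e-5, 3e-4))   # smaller rates for early layers, larger for later ones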

To understand where our model might be making mistakes, let’s bring up the images where our model was confident about its prediction but got it wrong.

interp = ClassificationInterpretation.from_learner(learn)
losses,idxs = interp.top_losses()
len(data.valid_ds)==len(losses)==len(idxs)
interp.plot_top_losses(9, figsize=(15,11))

All we are saying above is: create a ClassificationInterpretation object that knows two things, the data and the model, and use the loss to show how good (or bad) each prediction was.

The screenshot above shows us what our model predicted, what the actual label was, and how confident the model was about that prediction, as depicted by the loss. Looking at the results, we see a couple of issues:

  1. Data quality: in the first picture of the first row, the model predicted football and the image is indeed of football, but because the image sat in the rugby folder its actual label is rugby.
  2. Non-relevant data: a couple of the images show none of the 3 sports we are trying to predict.
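
As an aside, if I remember the fastai v1 API correctly, the same interpretation object can also summarize the mistakes as a confusion matrix, which makes class-level mix-ups like the ones above easier to spot:

interp.plot_confusion_matrix(figsize=(6, 6), dpi=60)   # counts of actual vs. predicted classes
interp.most_confused(min_val=2)                        # class pairs that were confused most often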

In conclusion, even with some bad data, our model was able to predict with ~92% accuracy which of the 3 sports an image shows. Considering how similar the tackling motion is in all 3 sports, I found it very interesting that the model could tell them apart.

It was great to learn, within a single lecture, how to collect images for analysis, build a convolutional neural network and reach 90+% accuracy.

To continue with this application of image recognition, I plan to pursue a different use case with more than 200 images per class.

To those of you looking to start or get into Deep Learning, don’t wait, just start. As Drake says, “Better late than never, but never late is better.”

