How I Did Deep Learning With Little Data

By satyabrata pal
Published in ML and Automation
11 min read · Jan 14, 2020

Transfer Learning Using Fastai

Courtesy: Pexels.com

Three years ago I started my journey to learn about Machine Learning, and Deep Learning in particular. Since then I have scoured hundreds of articles and tutorials.

I have noticed that our thinking about deep learning has become limited to the following pipeline.

Take a run-of-the-mill data → Drop in a neural network → check the accuracy number → drop more data → train some more and so on.

This obsession with using big data to train neural networks made me wonder. What if we only have a small amount of data? What if we don’t have access to more?

This thought made me explore ways to train neural networks in such situations. This is my story about the approaches that I explored.

Let’s Get Started Then!

As is my usual drill, I import the necessary modules first.

from pathlib import Path
from fastai.vision import *
from fastai.metrics import error_rate

Ok! Where’s Your Data

Ok! I know what you want to know. You want to know what data I used for this project.

Well! I used the ancient knowledge of web scraping to scrape images from Google image search.

I learned this technique in fastai lesson 2 and I used this knowledge to build my data as mentioned here in my previous article.

The dataset that I built for this project is available here at kaggle.

I have this teeny weeny function which does the following →

  • Takes in the root directory of the dataset.
  • Wraps it in a Python Path object.
  • Returns the sub-directories contained inside the dataset folder using the .ls() method, which fastai adds to the Path class from the pathlib module.
from pathlib import Path

def dataPath(path, pathToDataset):
    # Wrap the root in a Path and build the dataset path beneath it
    path = Path(path)
    datasetPath = path/pathToDataset
    # .ls() is not part of plain pathlib; fastai patches it onto Path
    return path, datasetPath, datasetPath.ls()

path = '../input'
path, datasetPath, subdirectories = dataPath(path, 'waffles-or-icecream')
(path, datasetPath, subdirectories)
---------
Output
---------
(PosixPath('../input'), PosixPath('../input/waffles-or-icecream'), [PosixPath('../input/waffles-or-icecream/waffles'), PosixPath('../input/waffles-or-icecream/ice_cream')])

When I see the output of the above code, I know that the sub-directories are “waffles” and “ice_cream”.
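If you want to list the sub-directories without fastai, a minimal sketch using only the standard library could look like the one below. The helper name list_subdirectories and the throwaway folder layout are my own assumptions for illustration; sorting is added so that the order does not depend on the file system.

```python
from pathlib import Path
import tempfile

def list_subdirectories(root):
    """Return the sub-directories of `root`, sorted by name for a stable order."""
    # Path.iterdir() is the standard-library way to list a directory's entries
    return sorted(p for p in Path(root).iterdir() if p.is_dir())

# Demo on a throwaway directory tree that mimics the dataset layout
with tempfile.TemporaryDirectory() as tmp:
    for name in ("waffles", "ice_cream"):
        (Path(tmp) / name).mkdir()
    demo_names = [p.name for p in list_subdirectories(tmp)]
    print(demo_names)  # ['ice_cream', 'waffles']
```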

Hey! You Still Didn’t Show What’s Inside?

Okay! Okay! Don’t worry I will show you what I got inside the directories.

Like every self respecting programmer I have created some functions to save me time and make my tasks easier while working with data.

The first function is listifyFileNames(subdirectoryPath). Catchy name eh! This function does the following →

  • Takes in a list of sub-directory paths as its argument.
  • Creates a list of filenames from the subdirectories ‘ice_cream’ and ‘waffles’.
  • Keeps only those filenames which have a valid image file extension. No funny extensions are allowed.
  • Finally returns the cleaned-up list.
def listifyFileNames(subdirectoryPath):
    # Collect every file from each sub-directory
    fileNames = [fileName for subdirectory in subdirectoryPath
                 for fileName in subdirectory.ls()]
    validExtensions = ['.jpg', '.jpeg', '.png', '.JPG', '.JPEG', '.PNG']
    # Keep only the files with a valid image extension
    validFileNames = list(filter(lambda fileName: fileName.suffix in validExtensions, fileNames))
    return validFileNames
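Listing every upper-case and lower-case spelling of an extension is easy to get wrong. Here is a small sketch of an alternative that lower-cases the suffix before comparing. The names VALID_EXTENSIONS and is_valid_image are hypothetical and not part of the original notebook.

```python
from pathlib import Path

VALID_EXTENSIONS = {".jpg", ".jpeg", ".png"}

def is_valid_image(file_name):
    """True when the file's extension is an accepted image type, ignoring case."""
    return Path(file_name).suffix.lower() in VALID_EXTENSIONS

candidates = ["a.JPG", "b.jpeg", "c.Png", "notes.txt", "d.gif"]
valid = [name for name in candidates if is_valid_image(name)]
print(valid)  # ['a.JPG', 'b.jpeg', 'c.Png']
```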

The next function is listifyLabels(fileNamesList). This function checks whether the string ‘/ice_cream/’ appears in the file path and, if so, returns the label ‘ice-cream’ for that image file. Otherwise it returns the label ‘waffles’.

def listifyLabels(fileNamesList):
    # One label per file, decided by the folder the file lives in
    return ['ice-cream' if '/ice_cream/' in str(fileName) else 'waffles'
            for fileName in fileNamesList]
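The substring check above depends on the path separator being “/”. A slightly sturdier sketch reads the name of the parent folder instead. The function name label_from_parent is my own, and it keeps the same two labels as the function above.

```python
from pathlib import Path

def label_from_parent(file_path):
    """Use the name of the containing folder to decide the class label."""
    folder = Path(file_path).parent.name
    # Map the folder name 'ice_cream' to the label 'ice-cream'; everything else is 'waffles'
    return "ice-cream" if folder == "ice_cream" else "waffles"

print(label_from_parent("../input/waffles-or-icecream/ice_cream/1.jpg"))  # ice-cream
print(label_from_parent("../input/waffles-or-icecream/waffles/2.png"))    # waffles
```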

Finally I created an ImageDataBunch by using fastai’s built-in method from_lists(). This method does the following →

  • Takes in the list of file paths and the list of labels produced by listifyLabels(fileNamesList), and pairs each file in the list returned by listifyFileNames(subdirectoryPath) with its label.
  • Returns the required databunch.
def createDataBunch(path, filePathList, labelList, percentOfDataToSplit, imageSize):
    # labelList holds one label per entry in filePathList
    return ImageDataBunch.from_lists(path,
                                     filePathList,
                                     labels=labelList,
                                     ds_tfms=get_transforms(),
                                     valid_pct=percentOfDataToSplit,
                                     size=imageSize).normalize(imagenet_stats)

I have explained ImageDataBunch in my previous article here.
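The valid_pct argument holds back a fraction of the files for validation. The sketch below shows the idea with plain Python. It is not fastai’s own implementation, and the function name split_train_valid and the fixed seed are assumptions made for the example.

```python
import random

def split_train_valid(items, valid_pct=0.2, seed=42):
    """Shuffle a copy of `items` and hold out `valid_pct` of them for validation."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)
    n_valid = int(len(shuffled) * valid_pct)
    # The first n_valid shuffled items become the validation set
    return shuffled[n_valid:], shuffled[:n_valid]

# 698 stands in for the total number of images in this dataset
train, valid = split_train_valid(range(698), valid_pct=0.2)
print(len(train), len(valid))  # 559 139
```

With only a few hundred images, a 20% hold-out leaves a fairly small validation set, so the error rate can move noticeably from one split to another.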

Next, I collected the databunch in a variable and then used the show_batch() method to display the data batch created by the createDataBunch() function.

fileNamesList = listifyFileNames(subdirectories)
data = createDataBunch(datasetPath, fileNamesList, listifyLabels(fileNamesList),0.2, 224)
data.show_batch(rows=3, figsize=(7,6))

You Don’t Need Truck Load Of Data To Train A Neural Network

Boromir is right! You don’t need truck load of data for every problem related to deep learning.

Our obsession with big data has made us believe that whenever we face a machine learning challenge, we have to throw in some big data plus GPU magic and the problem will solve itself.

Well! That’s not true. Most of the time we can take a network that was trained on a similar problem and reuse what it has learned in the current problem statement.

This enables us to train a neural network with far less data.

In my problem statement, the number of training samples that I had was as under →

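# Assumes subdirectories[0] is the ice_cream folder and subdirectories[1] is waffles;
# check the order returned by .ls() before trusting the labels
print('ice-cream- {}'.format(len(subdirectories[0].ls())))
print('waffles- {}'.format(len(subdirectories[1].ls())))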
--------
Output
--------
ice-cream- 343
waffles- 355

So, you see. I had only 343 ice-cream images and 355 waffles images.
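Hard-coding the index of each sub-directory ties the printed label to the listing order, and the listing shown earlier had the waffles folder first. Here is a sketch that takes the label from the folder name instead. The helper count_files_per_class is hypothetical and the demo counts are made up.

```python
from pathlib import Path
import tempfile

def count_files_per_class(dataset_root):
    """Map each sub-directory name to the number of files it contains."""
    return {
        sub.name: sum(1 for f in sub.iterdir() if f.is_file())
        for sub in sorted(Path(dataset_root).iterdir())
        if sub.is_dir()
    }

# Demo on a tiny throwaway dataset (the file counts here are made up)
with tempfile.TemporaryDirectory() as tmp:
    for name, n in (("ice_cream", 3), ("waffles", 2)):
        folder = Path(tmp) / name
        folder.mkdir()
        for i in range(n):
            (folder / f"{i}.jpg").touch()
    demo_counts = count_files_per_class(tmp)
    print(demo_counts)  # {'ice_cream': 3, 'waffles': 2}
```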

I tackled the shortage of data by using “transfer learning”. We can understand transfer learning like this →

Let’s say I have a model which was trained to recognize images of Popeye and Bluto.

Popeye Or Not Popeye

Then I use the weights learned by this model to differentiate between waffles and ice-cream.

Waffle or Not Waffle

Confused! Look at it this way.

The previous model was doing binary classification as it was trying to classify between two classes (Popeye and Bluto).

Our current problem statement i.e. ‘Waffles ’ or ‘Ice-cream’ is also a binary classification problem.

My hypothesis is that the previous model had learned how to differentiate between two classes of images. So, the same weights can be used for binary classification on a sample having images other than Popeye and Bluto.

The only thing that I need to do is to train this model to recognize the new objects, i.e. waffles and ice-cream. This way I can get away with fewer training samples and quicker training.

The only way to test my hypothesis was to build a model and train it.

Training Time

To test my hypothesis I used a model which I created in my previous project. The pre-trained model is also available as a kaggle dataset here.

I could have done a better job of reducing the size of the model, but let’s keep that for a later time. For now I am focusing on the current problem statement.

I first have to create the architecture of the base model, into which I will load my pre-trained model. Fastai makes it simple to create a typical CNN learner.

learn = cnn_learner(data, models.resnet34, metrics=error_rate)

Next, I did the following →

  • I used the .path attribute of the learn object to set the root directory under which the model directory lives.
  • I used the .model_dir attribute of the learn object to set the directory where the pre-trained model was located.
preTrainedModelPath = 'pretrained-model-for-classifying-types-of-trash'
learn.path = path
learn.model_dir = preTrainedModelPath

The preTrainedModelPath should be the name of the directory where your pre-trained model is stored.

Next I load the pre-trained model into the base CNN architecture which I created in the previous step.

learn.load('final')

‘final’ is the name of the “.pth” file i.e. the pre-trained model which was inside the “preTrainedModelPath”.

Now, it’s time to start the training. This can be done by calling the .fit_one_cycle() method.

learn.fit_one_cycle(4)
Training Output Display

It’s cool to see how quickly a network can be trained when transfer learning is used and some GPU magic is sprinkled.

At this point I can see that the error rate is low by my ‘standards’. The thing is that I am not going to judge the performance of this model based on the numbers alone.

Don’t Trust The Numbers

The metrics and numbers give you a sense of measurement, but they can’t tell you whether your model is useful or not.

So, I am going to test this model on an image from the validation dataset.

valid_ds gives access to the validation dataset created by the ImageDataBunch class.

image = learn.data.valid_ds[3][0]
image

The predict() method is used to predict the label of this image.

learn.predict(image)
------
Output
------
(Category ice-cream, tensor(0), tensor([0.6759, 0.3241]))

Hurrah! The network has learned to recognize the image of an ice-cream or so it seems.
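The tuple returned by predict() holds the predicted category, the index of that category and the probability for each class. The sketch below shows how the label and the confidence follow from the probabilities, using plain Python lists. The function name describe_prediction and the class order are assumptions.

```python
def describe_prediction(class_names, probabilities):
    """Return the most likely class and its probability."""
    best_index = max(range(len(probabilities)), key=probabilities.__getitem__)
    return class_names[best_index], probabilities[best_index]

# Probabilities copied from the output above; the class order is an assumption
label, confidence = describe_prediction(["ice-cream", "waffles"], [0.6759, 0.3241])
print(label, confidence)  # ice-cream 0.6759
```

A probability of about 0.68 means the network picked the right label here without being especially sure about it.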

Well! I am going to run some more tests to be sure. Before proceeding any further it’s better to save the model.

The function that I created below saves the model to a directory as a “.pth” file.

def saveModel(learnerObject, model_dir, modelName=None, export: bool = True, return_path: bool = True):
    learnerObject.model_dir = Path(model_dir)
    if export:
        # Export the whole learner as a pickle file under model_dir
        learnerObject.path = Path(model_dir)
        learnerObject.export()
    else:
        # Save the model as modelName.pth under model_dir
        learnerObject.save(modelName, return_path=return_path)

saveModel(learn, "/kaggle/working", 'stage-1', False, False)

Do remember to provide the path where you want to save your model and the name that you would like to give to your model.

Now, there might be some images where the network got confused or didn’t have much confidence in recognizing which ones were waffles and which were ice-cream.

It might also be that the network likes ice-cream more than waffles and was more biased towards ice-cream.

To check this I have this neat function which uses the plot_top_losses() method and the plot_confusion_matrix() method provided by fastai.

I have explained about these methods in my previous article here.

def plotTopLossesAndConfusionMatrix(learnerObject, data, numOfRows, figureSize: tuple, confMatrixSize: tuple, dpi, plotConfusionMatrix: bool = True):
    interp = ClassificationInterpretation.from_learner(learnerObject)
    losses, idxs = interp.top_losses()
    # Sanity check: there should be one loss per validation image
    assert len(data.valid_ds) == len(losses) == len(idxs)
    interp.plot_top_losses(numOfRows, figsize=figureSize)

    if plotConfusionMatrix:
        interp.plot_confusion_matrix(figsize=confMatrixSize, dpi=dpi)

    # Return the most confused class pairs so that the notebook displays them
    return interp.most_confused(min_val=2)

plotTopLossesAndConfusionMatrix(learn, data, 9, (15,11), (11,11), 60)

My observations from the top losses were as follows →

  • Very few of the images were wrongly classified.
  • Looking at the probability numbers in the “top losses plot”, it’s evident that the network was not very confident about the actual labels of the images.

Now, I had this question in my mind. Can I train the network for a few more cycles to improve its performance? After all, a good couple of rounds of training solves all the issues. Right?

More Training

This time my hypothesis was that if I could find a proper learning rate for the network then maybe I could improve its performance.

I did the following →

  • Unfreeze the network with unfreeze() and train it. Unfreezing makes every layer of the network trainable, not just the head, so the inner layers get trained too.
  • Plot the loss against the learning rate.
  • Find the approximate point where the loss curve starts to drop and the approximate point where it starts to rise again.

Remember that the learning rate can’t be too high or else the network would bounce around and never settle at a low loss. If the learning rate is too small then you would grow old before your network finished training.

So, the sweet spot lies between the two points described in the third step above.
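The effect of a learning rate that is too high or too low can be seen on a toy problem. The sketch below runs plain gradient descent on f(x) = x**2, which is only an illustration and has nothing to do with the actual network. The three learning rates are arbitrary picks for the demo.

```python
def gradient_descent(lr, steps=50, start=1.0):
    """Minimise f(x) = x**2 with plain gradient descent and return the final x."""
    x = start
    for _ in range(steps):
        x -= lr * 2 * x  # the gradient of x**2 is 2*x
    return x

too_high = gradient_descent(lr=1.1)    # overshoots further on every step
too_low = gradient_descent(lr=0.001)   # barely moves away from the start
reasonable = gradient_descent(lr=0.1)  # ends close to the minimum at 0
print(abs(too_high), abs(too_low), abs(reasonable))
```

After 50 steps the high rate has blown up, the low rate is still near the starting point and the middle rate is sitting almost at the minimum.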

learn.unfreeze()
learn.fit_one_cycle(1)
learn.load('stage-1')
learn.lr_find()
learn.recorder.plot()

Next, I unfreeze() the model to train some more layers of the model using the learning rate range.

learn.unfreeze()
learn.fit_one_cycle(2, max_lr=slice(1e-5,1e-3))
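Passing a slice as max_lr asks fastai to use smaller learning rates for the early layers and larger ones for the later layers. As far as I understand, the values in between are spaced geometrically across the layer groups. The sketch below reproduces that idea in plain Python. The function name spread_learning_rates and the choice of three layer groups are assumptions for illustration.

```python
def spread_learning_rates(start, stop, n_groups):
    """Return `n_groups` learning rates spaced geometrically from start to stop."""
    if n_groups == 1:
        return [stop]
    # Each group's rate is the previous one multiplied by a constant ratio
    ratio = (stop / start) ** (1 / (n_groups - 1))
    return [start * ratio ** i for i in range(n_groups)]

rates = spread_learning_rates(1e-5, 1e-3, n_groups=3)
print(rates)  # roughly [1e-05, 1e-04, 1e-03]
```

The early layers already hold general features, so they get the smallest updates, while the head gets the largest.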

Now, it’s time to face the truth by plotting the top losses.

plotTopLossesAndConfusionMatrix(learn, data, 9, (15,11), (11,11), 60)

Here’s the truth →

  • My hypothesis that more training would greatly improve the network’s performance was wrong.
  • There was a performance improvement, but only a small one. That is okay as long as the model solves my problem.
  • More variations of the same data would increase the confidence level of the network. This can be achieved by adding more transformations on top of the default transformations which fastai applies.

Now that I am satisfied with the current performance of the model, I will save it in a secure location. I will use the saveModel() function that I created before, and this time I will use the default value of the “export” flag in this function.

saveModel(learn, "/kaggle/working")

This would save the model as a pickle file.

Production Test

I don’t have a production ready server yet for this model. So, what I did was this →

  • Created a different kernel at kaggle.
  • Imported my model over to this new kernel.
  • Downloaded some unlabeled images of ice-creams and waffles from stock photography sites, making sure that these are not the same images on which my model was trained earlier.
  • Created a new image list with these images. This step is important as it applies the same transformations to these images which were applied during training. Without the same transformations the model would throw an error.
  • Used the predict() method to do the prediction.
from pathlib import Path
from fastai.vision import *

def getDatasetPath():
    path = Path('../input')
    datasetPath = path/'production-test'
    return datasetPath

# `model` is the learner exported earlier, loaded back into this kernel
# (for example with fastai's load_learner(); its location is not shown here)
data = ImageList.from_folder(getDatasetPath())
image = open_image(data.items[0])
image
model.predict(image)
------
Output
------
(Category ice-cream, tensor(0), tensor([9.9981e-01, 1.8518e-04]))

I tried one more image in my sample.

image = open_image(data.items[3])
image
model.predict(image)
------
Output
------
(Category ice-cream, tensor(0), tensor([9.9981e-01, 1.8518e-04]))
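One way to make sure that the production test images are not copies of the training images is to compare file hashes. This is a minimal sketch under that assumption. The helper names and the throwaway folders are hypothetical, and it only catches files that are byte-for-byte identical, not resized or re-encoded copies.

```python
import hashlib
import tempfile
from pathlib import Path

def file_digest(path):
    """SHA-256 of a file's bytes; identical files give identical digests."""
    return hashlib.sha256(Path(path).read_bytes()).hexdigest()

def find_leaked_files(train_dir, test_dir):
    """Return names of test files whose contents also appear under train_dir."""
    train_digests = {file_digest(p) for p in Path(train_dir).rglob("*") if p.is_file()}
    return sorted(p.name for p in Path(test_dir).rglob("*")
                  if p.is_file() and file_digest(p) in train_digests)

# Demo with made-up file contents standing in for real images
with tempfile.TemporaryDirectory() as tmp:
    train_dir, test_dir = Path(tmp) / "train", Path(tmp) / "test"
    train_dir.mkdir()
    test_dir.mkdir()
    (train_dir / "a.jpg").write_bytes(b"waffle-bytes")
    (test_dir / "copy_of_a.jpg").write_bytes(b"waffle-bytes")
    (test_dir / "new.jpg").write_bytes(b"ice-cream-bytes")
    leaked = find_leaked_files(train_dir, test_dir)
    print(leaked)  # ['copy_of_a.jpg']
```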

End Notes

Like I said in the previous sections, the accuracy and confidence level of this model could be improved a bit more.

You could take the code for this project and try it out yourself, or maybe build a better version.

This notebook is available as a kaggle kernel here and also at github.

Announcement

My deep learning course is available at 95% till 31st May midnight. Hurry and use this link to avail the discount.

satyabrata pal
ML and Automation

A QA engineer by profession, ML enthusiast by interest, Photography enthusiast by passion and Fitness freak by nature