Developing a Taste for Deep Learning

Jul 26, 2021

As I dive deeper into the waters of Deep Learning, I am growing accustomed to the new lingo, my eye is starting to catch small details, and my fingers are developing the muscle memory for working in Colab.

Technical work is difficult to absorb when it is explained in equally technical lingo. So I will describe what I learned in simple language, in the hope that it helps those who come after me.

This blog illustrates the lessons I gathered from the fast.ai Fastbook Chapter 5 (Pet Breeds) notes, forums and documentation. The code was run in Colab (and I had to switch to Colab Pro partway through because of the long run time).

My objective was to classify a food based on its image. Here are the steps:

1. Import the necessary libraries.

2. Download and extract the data.

“Untar” was a word that threw me off when I first encountered it. Now I understand it as extracting the files from a compressed archive (a .tar or .tgz file).

The ls() method lets you see the contents of the path you specify: it lists the files and folders inside it and also shows how many there are.
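In fastai, untar_data (for example untar_data(URLs.FOOD)) downloads a dataset and extracts it for you, and path.ls() lists what ended up inside. To show what those two steps amount to, here is a minimal standard-library sketch that builds a tiny stand-in archive, extracts it with tarfile, and lists the result. The folder names, the empty placeholder files and the ls helper are all hypothetical; this is not fastai's implementation.

```python
import tarfile
import tempfile
from pathlib import Path

# Build a tiny stand-in archive (hypothetical contents) so the sketch runs offline.
work = Path(tempfile.mkdtemp())
src = work / "food_mini" / "images"
for label in ["pizza", "sushi"]:
    (src / label).mkdir(parents=True)
    (src / label / "001.jpg").write_bytes(b"")  # empty placeholder, not a real image

archive = work / "food_mini.tgz"
with tarfile.open(archive, "w:gz") as tar:
    tar.add(work / "food_mini", arcname="food_mini")

# "Untar": extract the files from the compressed archive.
dest = work / "extracted"
with tarfile.open(archive, "r:gz") as tar:
    tar.extractall(dest)

def ls(path):
    """List the entries directly inside `path` (sorted), together with their count."""
    items = sorted(p.name for p in Path(path).iterdir())
    return len(items), items

print(ls(dest / "food_mini" / "images"))  # (2, ['pizza', 'sushi'])
```

One folder per food is what later lets the folder name double as the label.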

Note: Here are possible sources you can look into if you want to use a dataset that is more interesting for you than ‘food’:

Visual Geometry Group, University of Oxford (www.robots.ox.ac.uk)

External data: the complete list of datasets available by default inside the fastai library (docs.fast.ai)

3. Define the DataBlock and loaders.

DataBlock is simply the set of instructions to follow when assembling the data. As an analogy, it is the grocery list and the recipe you need before you start cooking your model.

“blocks” indicates the types of data involved: for this task, images as the inputs and categories (the food names) as the targets.
“get_items = get_image_files” collects all the image files found under the path, including those inside subfolders.
“splitter” divides the data into a training set and a validation set, by default in an 80/20 ratio. Setting a seed fixes the random shuffle, so the split (and therefore the results) stays consistent between runs.
“get_y = parent_label” tells fastai where to find the label (the dependent variable) of each image. Unfortunately, the documentation defines the function using the same terms. In practice, it takes the name of the folder that contains each file: the images are stored in one folder per food, so the folder name is the label.
“item_tfms” lists the transforms that each image undergoes individually. Resize(460) crops every image to the same, fairly large size while keeping as much of the central region as possible. This leaves room for the further transformations that follow.
“batch_tfms” applies further transformations to whole batches at once, after the individual item_tfms are done; working on batches lets these run on the GPU.
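To make splitter and get_y less abstract, here is a plain-Python sketch of what a seeded random 80/20 split and a parent_label-style labeller do. The helper names echo fastai's RandomSplitter and parent_label for readability, but they are simplified stand-ins written for this post, and the file paths are hypothetical.

```python
import random
from pathlib import Path

def parent_label(path):
    """Label an item by the name of the folder that contains it."""
    return Path(path).parent.name

def random_split(items, valid_pct=0.2, seed=None):
    """Shuffle the indices and hold out `valid_pct` of them as the validation set.

    A fixed `seed` makes the shuffle, and therefore the split, reproducible.
    """
    idxs = list(range(len(items)))
    random.Random(seed).shuffle(idxs)
    cut = int(valid_pct * len(items))
    valid = [items[i] for i in idxs[:cut]]
    train = [items[i] for i in idxs[cut:]]
    return train, valid

# Hypothetical file paths laid out as images/<food_name>/<file>.jpg
files = [f"images/{food}/{i:03d}.jpg" for food in ("pizza", "sushi") for i in range(5)]
train, valid = random_split(files, seed=42)
print(len(train), len(valid))   # 8 2
print(parent_label(files[0]))   # pizza
```

Calling random_split again with seed=42 returns exactly the same two lists, which is the consistency the seed buys us.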

At this point, let us illustrate some of the ways that data augmentation can be done through batch_tfms:

a. This will be our baseline sample image:

b. Applying scaling as a transformation:

c. Applying mult=2 as a transformation:

d. Applying rotation as a transformation:

e. Applying magnification (zoom) as a transformation:
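Rotation and magnification are both geometric (affine) transforms. The sketch below shows the core idea on a tiny grid of pixel values: each output pixel is mapped back through the inverse rotation and zoom to find which source pixel it should copy. It is a simplified illustration, not fastai's code: fastai picks random amounts for each image and handles interpolation and the image edges differently, whereas this sketch uses fixed values, nearest-neighbour sampling and a zero fill.

```python
import math

def rotate_zoom(img, angle_deg=0.0, zoom=1.0, fill=0):
    """Rotate (clockwise on screen) and zoom a square 2D grid about its centre,
    using nearest-neighbour sampling. Source positions outside the grid get `fill`."""
    n = len(img)
    c = (n - 1) / 2  # centre of the grid
    t = math.radians(angle_deg)
    cos_t, sin_t = math.cos(t), math.sin(t)
    out = []
    for r in range(n):
        row = []
        for col in range(n):
            x, y = col - c, r - c
            # Inverse rotation, then undo the zoom, to find the source position
            sx = (cos_t * x + sin_t * y) / zoom + c
            sy = (-sin_t * x + cos_t * y) / zoom + c
            i, j = round(sy), round(sx)
            row.append(img[i][j] if 0 <= i < n and 0 <= j < n else fill)
        out.append(row)
    return out

img = [[1, 2, 3],
       [4, 5, 6],
       [7, 8, 9]]
print(rotate_zoom(img, angle_deg=90))  # [[7, 4, 1], [8, 5, 2], [9, 6, 3]]

big = [[4 * r + c for c in range(4)] for r in range(4)]
print(rotate_zoom(big, zoom=2))  # the central 2x2 block, each pixel doubled in size
```

Zooming in by 2 keeps only the central part of the image, which is why the preceding Resize to a larger size leaves useful margin.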

4. Modifying the DataBlock instructions after determining good options for data augmentation.

Note that the random augmentations in batch_tfms are applied to the training set only; the validation images are not augmented.
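fastai does the training-only bookkeeping for us. As a hedged illustration of the idea, here is a toy batch transform (a random horizontal flip, chosen only for simplicity) that does nothing when it is not in training mode. The function names are hypothetical.

```python
import random

def hflip(img):
    """Mirror a 2D grid from left to right."""
    return [row[::-1] for row in img]

def augment_batch(batch, training, p=0.5, rng=None):
    """Randomly flip each image with probability p, but only in training mode.

    Validation batches pass through unchanged, so the metric is measured
    on the images as they really are.
    """
    if not training:
        return batch
    rng = rng or random.Random()
    return [hflip(img) if rng.random() < p else img for img in batch]

batch = [[[1, 2], [3, 4]], [[5, 6], [7, 8]]]
print(augment_batch(batch, training=False) == batch)  # True
print(augment_batch(batch, training=True, p=1.0))     # [[[2, 1], [4, 3]], [[6, 5], [8, 7]]]
```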

5. Visualizing the images loaded:

6. Checking the summary of the loading process.

If there are errors, the summary can help with debugging: it shows each step applied to a sample item, so you can see where things go wrong.

7. Training using a pre-trained model.

An image is essentially a grid of numbers arranged along the x- and y-axes, with each number representing the intensity of one pixel (a color image has three such grids, one each for red, green and blue).

A Convolutional Neural Network (CNN) can model 2D data like images. ResNet is a family of CNN architectures, and the versions used here come pretrained on ImageNet, a collection of more than a million labeled images. A pretrained ResNet already contains useful knowledge about images, such as how to recognize edges, lines, or empty space. Reusing these established layers therefore makes model-building easier: they take care of the processing that is common to all image tasks and let the researcher focus on the layers that are most pertinent to their particular study.
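To connect "an image is a grid of numbers" with "recognizing edges", here is a minimal pure-Python convolution. The kernel is hand-made for illustration; in a real CNN such as ResNet the kernel values are learned during training. Like deep learning libraries, the sketch slides the kernel without flipping it.

```python
def conv2d(img, kernel):
    """Slide `kernel` over `img` (no padding, stride 1) and sum the element-wise
    products at each position, as a convolutional layer does (without a bias)."""
    kh, kw = len(kernel), len(kernel[0])
    out_h, out_w = len(img) - kh + 1, len(img[0]) - kw + 1
    return [
        [
            sum(img[r + i][c + j] * kernel[i][j] for i in range(kh) for j in range(kw))
            for c in range(out_w)
        ]
        for r in range(out_h)
    ]

# A 4x4 grayscale "image": dark (0) on the left, bright (9) on the right.
img = [[0, 0, 9, 9]] * 4
# Hand-made vertical-edge detector: responds where brightness changes left to right.
kernel = [[-1, 1],
          [-1, 1]]
print(conv2d(img, kernel))  # [[0, 18, 0], [0, 18, 0], [0, 18, 0]]
```

The large values appear only in the middle column, exactly where the dark and bright halves meet.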

Judging by the error rate in the output, there is still a lot of room for improvement for this model.

8. Exploring the possibility of improving the model by increasing the number of epochs used.

Increasing the number of epochs gradually improved the error_rate.

9. Exploring the possibility of improving the model by tuning the learning rate.

a. Determine a good lr value

This shows us that it would be good to choose a learning rate smaller than 0.01. At the same time, we do not want very tiny steps, otherwise the model will take very long to train. A common rule of thumb is to use a learning rate one order of magnitude smaller than where the minimum loss was achieved, or a point where the loss is still clearly decreasing. Thus, we will use 0.001 (i.e., 1e-3).
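The two rules of thumb can be written down directly. The sketch below reads a learning-rate finder curve given as two lists; the curve itself is made up for illustration and is not the output of this run, and suggest_lr is a hypothetical helper, not a fastai function.

```python
import math

def suggest_lr(lrs, losses):
    """Apply two rules of thumb to a learning-rate finder curve.

    Returns (lr at the minimum loss divided by 10,
             lr at the start of the steepest downward segment).
    """
    i_min = min(range(len(losses)), key=losses.__getitem__)
    # Slope of the loss against log10(lr) between consecutive points
    slopes = [
        (losses[i + 1] - losses[i]) / (math.log10(lrs[i + 1]) - math.log10(lrs[i]))
        for i in range(len(lrs) - 1)
    ]
    i_steep = min(range(len(slopes)), key=slopes.__getitem__)
    return lrs[i_min] / 10, lrs[i_steep]

# A made-up loss curve, for illustration only
lrs = [1e-5, 1e-4, 1e-3, 1e-2, 1e-1, 1.0]
losses = [4.0, 3.9, 3.0, 1.5, 2.5, 9.0]
print(suggest_lr(lrs, losses))  # both rules land at about 1e-3 for this curve
```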

b. Retrain the model using the learning rate determined.

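The fit_one_cycle function does not keep the learning rate fixed: it warms the learning rate up from a low value to lr_max over the first part of training, then anneals it back down to a very small value. Its lr_max defaults to None, in which case fastai falls back to the learner's default learning rate of 1e-3. This likely explains why, in our case, the error rate remained the same when lr_max was set to 0.001: it is the same value as the default.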
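Here is a sketch of the shape of a one-cycle schedule, modelled on fastai's default settings as I understand them (start at lr_max/25, peak after 25% of training, finish at lr_max/1e5, cosine curves in between). Treat those numbers as assumptions; the functions are written for this post and are not fastai's code.

```python
import math

def cos_anneal(start, end, pos):
    """Cosine interpolation from `start` (pos=0) to `end` (pos=1)."""
    return start + (1 + math.cos(math.pi * (1 - pos))) * (end - start) / 2

def one_cycle_lr(pct, lr_max, div=25.0, div_final=1e5, pct_start=0.25):
    """Learning rate at fraction `pct` (0..1) of training under a one-cycle schedule:
    warm up from lr_max/div to lr_max, then anneal down to lr_max/div_final."""
    if pct < pct_start:
        return cos_anneal(lr_max / div, lr_max, pct / pct_start)
    return cos_anneal(lr_max, lr_max / div_final, (pct - pct_start) / (1 - pct_start))

for pct in (0.0, 0.25, 0.5, 1.0):
    print(pct, one_cycle_lr(pct, lr_max=1e-3))
```

The low starting rate keeps the early updates gentle, and the long decay at the end lets the weights settle.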

c. Determine a new lr with the layers “unfrozen”.

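The pretrained ResNet layers are usually kept unchanged (i.e., “frozen”) during the initial training, because their weights have already been trained; only the new final layers for our food categories are updated. After unfreezing, all the layers can be trained, so we look for a new learning rate suited to that setting.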
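The idea behind freezing can be shown with one plain gradient-descent step over named parameter groups. The "body" and "head" groups and all the numbers below are hypothetical, and the real mechanism in PyTorch/fastai works on tensors, but the effect is the same: frozen parameters are simply not updated.

```python
def sgd_step(params, grads, lr, frozen=()):
    """One plain SGD update over named parameter groups.

    Groups listed in `frozen` are skipped, so their pretrained values stay as they are.
    """
    return {
        name: values if name in frozen
        else [w - lr * g for w, g in zip(values, grads[name])]
        for name, values in params.items()
    }

params = {"body": [0.5, -0.2], "head": [0.0, 0.0]}  # hypothetical weights
grads = {"body": [1.0, 1.0], "head": [1.0, -1.0]}   # hypothetical gradients

frozen_step = sgd_step(params, grads, lr=0.1, frozen={"body"})
print(frozen_step)  # {'body': [0.5, -0.2], 'head': [-0.1, 0.1]}
unfrozen_step = sgd_step(params, grads, lr=0.1)  # now the body moves too
```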

d. Applying discriminative learning rates.

The first value in the lr_max slice was applied to the first layer group, and the second value to the last layer group (the newly added head). The groups in between received values spread between the two on a multiplicative (geometric) scale.
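Here is how learning rates can be spread across layer groups when lr_max is a slice(lo, hi). The sketch assumes three groups for illustration, and spread_lrs is a hypothetical helper written for this post. Each group gets the previous group's rate multiplied by the same constant, so the rates are evenly spaced on a log scale.

```python
def spread_lrs(lo, hi, n_groups):
    """Spread learning rates from `lo` (first layer group) to `hi` (last group, the head)
    evenly on a multiplicative scale. With a single group, just use `hi`."""
    if n_groups == 1:
        return [hi]
    step = (hi / lo) ** (1 / (n_groups - 1))
    return [lo * step ** i for i in range(n_groups)]

print(spread_lrs(1e-6, 1e-4, 3))  # approximately [1e-06, 1e-05, 0.0001]
```

Geometric spacing makes sense here because learning rates are compared in orders of magnitude, just as on the lr_find plot.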

In Section 8, we saw that increasing the number of epochs to 4 had an end error_rate of 0.35. The discriminative approach to the learning rate shown above, also using a total of 4 epochs, reached a better error rate of 0.25.

10. Using fine_tune for comparison with the fit_one_cycle, unfreeze, fit_one_cycle sequence.

fine_tune is an alternative technique that bundles training and unfreezing into a single call. It first trains only the newest layers (the ones added for your particular data) for 1 epoch, with the pretrained layers frozen. It then unfreezes all the layers and trains for the number of epochs specified.
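To compare the two approaches, here is the schedule that fine_tune runs, written out as data. The defaults (base_lr=2e-3, one frozen epoch, the rate halved before unfreezing, and a 100x spread across the layer groups) reflect my reading of fastai's Learner.fine_tune and should be treated as assumptions; fine_tune_plan is a hypothetical helper that only describes the phases and does not train anything.

```python
def fine_tune_plan(epochs, base_lr=2e-3, freeze_epochs=1, lr_mult=100):
    """Describe the two phases of fine_tune as (stage, epochs, lr) tuples.

    In the frozen phase `lr` is the head's rate; in the unfrozen phase it is
    the (first group, last group) pair used for discriminative learning rates.
    """
    frozen = ("frozen: train the new head only", freeze_epochs, base_lr)
    base_lr /= 2  # the learning rate is halved before unfreezing
    unfrozen = ("unfrozen: train all layers", epochs, (base_lr / lr_mult, base_lr))
    return [frozen, unfrozen]

for stage in fine_tune_plan(4):
    print(stage)
```

Written this way, fine_tune(4) is essentially one frozen fit_one_cycle followed by unfreezing and a second fit_one_cycle with a slice, which is why the two approaches end up with comparable results.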

The two approaches had comparable error_rate results.

11. Utilizing a deeper model.

A deeper model has more layers, so it can learn more complex features, but at the cost of more computation, memory and training time, and possibly a higher risk of overfitting. One way to offset the demands of such a resource-hungry model is mixed-precision training, in which most of the computations use less precise 16-bit floating-point numbers (fp16) instead of the usual 32-bit ones.
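In fastai, mixed precision is switched on with learn.to_fp16(). To see what "less precise" means in practice, this standard-library sketch rounds numbers to half precision using struct's 16-bit float format. It only illustrates the number format; it is not how PyTorch or fastai implement mixed precision.

```python
import struct

def to_fp16(x):
    """Round a Python float to the nearest IEEE half-precision (16-bit) value."""
    return struct.unpack("<e", struct.pack("<e", x))[0]

print(len(struct.pack("<e", 0.1)), len(struct.pack("<f", 0.1)))  # 2 4  (bytes per number)
print(to_fp16(0.1))         # 0.0999755859375
print(to_fp16(1.0 + 1e-4))  # 1.0  (a tiny update to a weight of 1.0 is lost)
```

Halving the bytes per number saves memory and speeds up the GPU, but the last line shows why mixed-precision training keeps the weights themselves in 32-bit: small updates would otherwise vanish.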

The deeper model, resnet34, provided a slightly better error_rate than the shallower resnet18.

12. Checking some of the prediction results.

The images labeled in green show matching prediction and label names. Misclassified images are shown with red labels.
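The green and red labels connect directly to the error_rate metric used throughout this post: error_rate is the share of red labels, i.e. 1 minus accuracy. A small sketch with hypothetical predictions:

```python
def label_colours(preds, targets):
    """Green when the prediction matches the label, red when it does not."""
    return ["green" if p == t else "red" for p, t in zip(preds, targets)]

def error_rate(preds, targets):
    """Fraction of predictions that do not match their labels (1 - accuracy)."""
    return sum(p != t for p, t in zip(preds, targets)) / len(targets)

# Hypothetical predictions for four validation images
preds = ["pizza", "sushi", "ramen", "pizza"]
targets = ["pizza", "sushi", "pho", "pizza"]
print(label_colours(preds, targets))  # ['green', 'green', 'red', 'green']
print(error_rate(preds, targets))     # 0.25
```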

13. Applying the model to an image from the net.

Good one! My objective of classifying a food was fulfilled.
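For a single image, learn.predict returns the predicted label, its index, and the probability of every class. The probabilities come from a softmax over the model's raw outputs. The sketch below reproduces that last step in plain Python; the class names and raw outputs are hypothetical, and predict here is a stand-in, not fastai's method.

```python
import math

def softmax(logits):
    """Turn raw model outputs into probabilities that sum to 1."""
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def predict(logits, vocab):
    """Return (label, index, probabilities), the same shape of answer as learn.predict."""
    probs = softmax(logits)
    idx = max(range(len(probs)), key=probs.__getitem__)
    return vocab[idx], idx, probs

vocab = ["pizza", "ramen", "sushi"]                  # hypothetical class names
label, idx, probs = predict([0.2, 3.1, 1.0], vocab)  # hypothetical raw outputs
print(label, idx)  # ramen 1
```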

Summary:

We can use pretrained models to train image classification models. By using a dataset with a good number of labeled observations, transforming the images, and using a deep pretrained model, we were able to get more than 80% accuracy in predicting food labels in just 4 epochs.

Future Work:

The data has not been cleaned for this run. Preprocessing the images, including verifying that every file is a valid image, should increase the model accuracy.

Increasing the number of epochs and doing hyperparameter tuning would also contribute to improving the model.

Written by Maria L Rodriguez