Applying Computer Vision Techniques on a Kaggle dataset

Maria L Rodriguez
Published in Analytics Vidhya
9 min read · Aug 5, 2021
* Image courtesy of Unsplash/ Sean Lim

We now have bountiful data resources to feed our hunger for learning. These data come in different forms, packages and delivery methods. We have utilized a variety in previous blogs:

  • A moderate-sized toy dataset from the Fast.ai library here,
  • A large-sized toy dataset from the Fast.ai library here,
  • Exporting an external dataset here,
  • Creating our own dataset through web scraping here.

Getting data from Kaggle was not the breezy download-and-upload process I initially thought it would be. Once I found the right formula, I decided to apply the dataset to deep learning.

For this mini-project, we will showcase:

A. How to use a Kaggle dataset in Colab, and

B. How to use the Normalization, Resizing and Test Time Augmentation computer vision techniques described in Lesson 7 of the Fast.ai FastBook.

So, open Kaggle, and explore with me!

A. Use a Kaggle Dataset in Colab

  1. Choose a dataset in Kaggle.

a. If you don’t have an account yet, create one. It is free and is processed quickly.

b. Explore the datasets/ competitions that you are interested in. This blog involves computer vision/ image classification for birds.

c. Choose a dataset that has at least 20 images for every class. Otherwise, you might run into errors.

d. I have been able to use .jpg and .png formats for images. If you encounter the format .tfrec, look for a link to the original dataset.

e. Do this before setting up. If you set up and then get distracted, your notebook runtime might get disconnected, and you will need to do a re-run.

Note: You can see a list of the Kaggle datasets in Colab; however, an initial content check is best done on the Kaggle website.

2. Setting up.

a. Notebook

You can run a notebook in Kaggle. However, for this run, we will do it in Colab. If you are new to Colab, please see Step 1 a-b here.

If you chose a big dataset, I suggest using the GPU and High-RAM runtime settings.
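To confirm that the GPU is actually available before a long run, a quick check like this minimal sketch (using PyTorch, which the Fast.ai setup installs) can help:

import torch
print(torch.cuda.is_available())          # True if a GPU runtime is attached
if torch.cuda.is_available():
    print(torch.cuda.get_device_name(0))  # the GPU model Colab assigned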

b. Installations and Imports

i. General set-up for the Fast.ai library.

In a Colab notebook, run the following:

!pip install -Uqq fastbook
import fastbook
fastbook.setup_book()
from fastbook import *
#!pip install fastai -U # uncomment if this is your first use
import fastai
from fastai.vision.all import *

ii. Specific set-up for using a Kaggle dataset.

  • In the Kaggle page, open your ‘Account’.
  • Create a New API Token. This will give you a .json file (kaggle.json).
  • In your Colab notebook, install kaggle and upload the API Token/ kaggle.json file.
!pip install -q kaggle
from google.colab import files
files.upload()
  • Create the kaggle directory and enable access.
!mkdir ~/.kaggle                    # the kaggle CLI looks for credentials here
!cp kaggle.json ~/.kaggle/          # move the uploaded token into place
!chmod 600 ~/.kaggle/kaggle.json    # restrict permissions, as the CLI requires

3. Gathering your data.

a. Code format for downloading.

  • !kaggle
  • datasets or competitions (i.e., in Kaggle, is your set from the datasets or the competitions section?)
  • download
  • path

For the path, use the section of the URL after kaggle.com. For example:

The Kaggle dataset used here can be found in the datasets section, and the URL is: https://www.kaggle.com/gpiosenka/100-bird-species.

!kaggle datasets download 'gpiosenka/100-bird-species/train'

I specified the train, but it downloaded both the train and validation sets anyway.
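If your data comes from the competitions section instead, the command form is slightly different. A sketch, using the classic Titanic competition name as an illustrative example (you must also have accepted the competition rules on the Kaggle website):

!kaggle competitions download -c titanic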

b. Getting the individual files from the downloaded zip and removing the zip once done.

!unzip \*.zip && rm *.zip

c. Specify the path.

train_path = 'birds_rev2/train'

That’s it! There is no need to create a dataframe or do any other preprocessing.
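As a quick sanity check, a minimal sketch like the following (get_image_files comes from the fastai imports above) confirms that the path resolves and shows how many images we have to work with:

files = get_image_files(train_path)
print(len(files))   # total number of training images found
print(files[0])     # inspect one example file path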

B. Examples of Advanced Imaging Techniques.

If you would like an introduction or refresher on imaging transformations, please refer to this resource. You can use the same dataset that you have downloaded here, with minor revisions: Start at Step 3 and instead of (path/’images’), use (train_path).

We will look at how normalization, resizing and test time augmentation affect the accuracy of a Learner built on a non-pre-trained model. For a more detailed description, refer here.

  1. Baseline Model
dblock = DataBlock(
    blocks = (ImageBlock, CategoryBlock),
    get_items = get_image_files,
    get_y = parent_label,
    splitter = RandomSplitter(seed=42),
    item_tfms = Resize(460),
    batch_tfms = aug_transforms(size=224))
dls = dblock.dataloaders(train_path)
dls.train.show_batch()
dls.train.show_batch(unique=True)

We can appreciate some of the baseline transformations here, such as cropping, left-right flipping and changes in light intensity.

model = xresnet50(n_out = dls.c)
learn_base = Learner(dls, model, loss_func = CrossEntropyLossFlat(),
                     metrics = accuracy)
learn_base.fit_one_cycle(5, 3e-3)
  • Learner is the Fast.ai class that assembles the data, model and training.
  • xresnet50 is a sequential, non-pre-trained neural network.
  • n_out = dls.c indicates the number of classes or labels.
  • Cross entropy loss is the computation through which the model learns. It is derived from the predicted probability of the classes. Probability values range from 0 to 1. When these values are transformed to cross entropy, probabilities that approach 0 (very poor predictions) become more obvious and are thus penalized more (see the short numeric sketch after this list).
  • CrossEntropyLossFlat is a restructuring of the cross entropy loss that flattens the inputs and targets to facilitate processing.
  • While computers need the gradient information provided by the loss, human interpretation is better served by the accuracy metric.
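To make the penalty concrete, here is a tiny numeric sketch with hypothetical predicted probabilities for the true class of two images. Cross entropy for the true class is -log(p), so a probability near 0 is penalized far more heavily than one near 1:

import torch
p_good = torch.tensor(0.9)    # hypothetical confident, correct prediction
p_poor = torch.tensor(0.01)   # hypothetical confident, wrong prediction
print(-torch.log(p_good))     # tensor(0.1054): small penalty
print(-torch.log(p_poor))     # tensor(4.6052): large penalty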
learn_base.lr_find()

Our baseline learning rate was 0.003, which is reasonable to keep based on the lr_find.

learn_base.show_results()

Visualizing some baseline results:

The top label is the actual, and the bottom label is the predicted. If you want to see how the labelling can be differentiated (actual vs predicted), refer to Step 6.a here.

A fast web search can give us some comparative images to understand the Learner’s incorrect predictions.

Interpretation: The baseline model which was trained from scratch gave an 82% accuracy after 5 epochs of learning using cross entropy loss and a learning rate of 0.003.

2. Applying Normalization transformation.

dblock = DataBlock(
    blocks = (ImageBlock, CategoryBlock),
    get_items = get_image_files,
    get_y = parent_label,
    item_tfms = Resize(460),
    batch_tfms = [*aug_transforms(size=224),
                  Normalize.from_stats(*imagenet_stats)]) # normalization added here
dls_norm = dblock.dataloaders(train_path, bs=64)

Let us take a glimpse at a sample batch to see the transformation at the numerical level:

x, y = dls.one_batch()          # baseline
xn, yn = dls_norm.one_batch() # normalized

Both the baseline and normalized batches have the same shape: x holds 64 items per batch, with 3 channels for the RGB and the 224 x 224 pixel size that we specified, while y holds the 64 corresponding labels.
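A quick check of this claim (a sketch, assuming the batch size of 64 used above):

print(x.shape, y.shape)     # torch.Size([64, 3, 224, 224]) torch.Size([64])
print(xn.shape, yn.shape)   # the normalized batch has the same shapes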

print('Non-normalized tensors:', x[0][0][0][:10])
print('Mean:',x.mean(dim = [0,2,3]))
print('Std:', x.std(dim = [0,2,3]))
print('Normalized tensors:', xn[0][0][0][:10])
print('Mean:',xn.mean(dim = [0,2,3]))
print('Std:', xn.std(dim = [0,2,3]))

Normalization places different sets of values on the same scale so that they can be compared. Normalized with its own statistics, a group of values would have a mean of 0 and a standard deviation of 1; here we normalize with the ImageNet statistics, which brings our values close to that range.
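As a minimal sketch of the arithmetic, Normalize.from_stats(*imagenet_stats) subtracts the ImageNet per-channel means and divides by the per-channel standard deviations (the pixel value here is a hypothetical example):

import torch
mean = torch.tensor([0.485, 0.456, 0.406])   # imagenet_stats channel means (RGB)
std  = torch.tensor([0.229, 0.224, 0.225])   # imagenet_stats channel stds
pixel = torch.tensor([0.5, 0.5, 0.5])        # a hypothetical mid-grey pixel
print((pixel - mean) / std)                  # its normalized values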

With the transformation confirmed at the numerical level, let us train and visualize the effects:

model = xresnet50(n_out = dls_norm.c)
learn_norm = Learner(dls_norm, model, loss_func = CrossEntropyLossFlat(), metrics = accuracy)
learn_norm.fit_one_cycle(5, 3e-3)

Only a minimal improvement was effected by normalization, probably because the images are already relatively similar.

Visualizing some results:

Comparing the misclassified birds with some resources from the web shows some possible reasons for the confusion, especially the dominant colour and overall shape.

Interpretation: The normalized model which was trained from scratch gave an 83% accuracy after 5 epochs of learning using cross entropy loss and a learning rate of 0.003.

3. Applying Progressive Resizing

Start training with a small image size.

dblock = DataBlock(
    blocks = (ImageBlock, CategoryBlock),
    get_items = get_image_files,
    get_y = parent_label,
    item_tfms = Resize(460),
    batch_tfms = aug_transforms(size=128)) # start small
dls_128 = dblock.dataloaders(train_path, bs=64)
model = xresnet50(n_out = dls_128.c)
learn_128 = Learner(dls_128, model,
                    loss_func = CrossEntropyLossFlat(),
                    metrics = accuracy)
learn_128.fit_one_cycle(2, 3e-3)

Progress to a bigger image size.

dblock = DataBlock(
    blocks = (ImageBlock, CategoryBlock),
    get_items = get_image_files,
    get_y = parent_label,
    item_tfms = Resize(460),
    batch_tfms = aug_transforms(size=224)) # progress to a bigger size
dls_224 = dblock.dataloaders(train_path, bs=64)

And apply the new dls to the previously trained Learner.

learn_128.dls = dls_224
learn_128.fit_one_cycle(3, 3e-3)

We can see that initial training with a small image size and progressing to a bigger one leads to slightly better accuracy than baseline (84.1 vs 82.3%). The runs on the small-sized images were also twice as fast (2 vs 4 minutes).

Interpretation: The resizing model which was trained from scratch gave an 84% accuracy after 5 epochs of learning using cross entropy loss and a learning rate of 0.003. It also enabled slightly faster training.

4. Using Test Time Augmentation (TTA).

The validation set images usually undergo centre-cropping, and some information is inevitably lost with this default technique. TTA addresses this by cropping from multiple areas of the original image and averaging the resulting predictions.
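The core idea is simply averaging predictions over several augmented views of the same image, which is what learn.tta() below does for us. A tiny numeric sketch with hypothetical class probabilities:

import torch
views = torch.tensor([[0.6, 0.4],    # hypothetical probabilities for one image
                      [0.7, 0.3],    # over three augmented views
                      [0.5, 0.5]])
print(views.mean(dim=0))             # averaged prediction: tensor([0.6000, 0.4000])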

# using the baseline dblock and dls
model = xresnet50(n_out = dls.c)
learn = Learner(dls, model,
                loss_func = CrossEntropyLossFlat(),
                metrics = accuracy)
learn.fit_one_cycle(5, 3e-3)
preds, targs = learn.tta()       # predictions averaged over augmented views
accuracy(preds, targs).item()

The resulting accuracy is 0.8277, from the baseline of 0.8251.

Interpretation: The TTA step provided a 0.2–0.3% accuracy improvement from the baseline model.

5. Combining these advanced transformation techniques.

For our final modelling, we will utilize progressive resizing, from 128 to 224. We will use fine_tune, which first trains with most of the network frozen and then unfreezes it for the remaining epochs (a sketch of this schedule follows). And we will apply TTA to the Learner.
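For reference, fine_tune(3, 3e-3) roughly corresponds to the following schedule. This is a simplified sketch; the exact learning-rate adjustments inside fine_tune differ slightly, and the rates below are approximations:

learn_.freeze()                                  # phase 1: only the final layer group trains
learn_.fit_one_cycle(1, 3e-3)                    # freeze_epochs defaults to 1
learn_.unfreeze()                                # phase 2: the whole network trains
learn_.fit_one_cycle(3, slice(1.5e-5, 1.5e-3))   # approximate discriminative rates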

dblock = DataBlock(
    blocks = (ImageBlock, CategoryBlock),
    get_items = get_image_files,
    get_y = parent_label,
    item_tfms = Resize(460),
    batch_tfms = aug_transforms(size=128)) # start small

dls_128 = dblock.dataloaders(train_path, bs=64)
model = xresnet50(n_out = dls_128.c)
learn_ = Learner(dls_128, model,
                 loss_func = CrossEntropyLossFlat(),
                 metrics = accuracy)
learn_.fit_one_cycle(2, 3e-3)

dblock = DataBlock(
    blocks = (ImageBlock, CategoryBlock),
    get_items = get_image_files,
    get_y = parent_label,
    item_tfms = Resize(460),
    batch_tfms = aug_transforms(size=224)) # progress to the bigger size
dls_224 = dblock.dataloaders(train_path, bs=64)
learn_.dls = dls_224
learn_.fine_tune(3, 3e-3) # freeze briefly, then unfreeze and keep training
preds, targs = learn_.tta()   # apply TTA, as planned above
accuracy(preds, targs).item()
interp = ClassificationInterpretation.from_learner(learn_)
interp.most_confused(min_val = 5)

And looking at the web for comparison, we can claim that most humans would find these images difficult to differentiate as well.

Interpretation: The final model which was trained from scratch and utilized progressive resizing, fine-tuning and TTA gave an 85% accuracy after 5 epochs of learning using cross entropy loss and a learning rate of 0.003. Misclassifications are deemed reasonable for a general bird classification scheme. Academic work that needs to differentiate between species may benefit from more epochs and discriminative learning rates.
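As a hedged sketch of that last suggestion, discriminative learning rates are passed to fastai as a slice, giving the early layers a smaller rate than the later ones (the values here are illustrative):

learn_.unfreeze()
learn_.fit_one_cycle(5, slice(1e-5, 1e-3))   # small rate for early layers, larger for the head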

6. Let’s play!

from ipywidgets import *   # also available through the fastbook import

btn_upload = widgets.FileUpload()             # button to upload a test image
out_pl = widgets.Output()                     # output area for displaying the image
lbl_pred = widgets.Label()                    # label to hold the prediction
btn_run = widgets.Button(description = 'Classify')

def on_click_classify(change):
    img = PILImage.create(btn_upload.data[-1])       # the most recent upload
    out_pl.clear_output()
    with out_pl: display(img.to_thumb(200))
    pred, pred_idx, probs = learn_.predict(img)      # predict with the final Learner
    lbl_pred.value = f'Prediction: {pred}; Probability: {probs[pred_idx]:0.4f}'

btn_run.on_click(on_click_classify)

VBox([widgets.Label('Select the bird you want to identify!'),
      btn_upload, btn_run, out_pl, lbl_pred])

Let’s see what Gandalf would be if he was a bird —

Close enough :0)

Summary:

We were able to utilize a Kaggle dataset in a Colab Notebook. We applied advanced Computer Vision transformations using a non-pretrained model and achieved an 85% accuracy rate on the validation set.

I hope you had as much fun as I did!

Maria

Connect with me on LinkedIn: https://www.linkedin.com/in/rodriguez-maria/

or Follow me on Twitter: https://twitter.com/Maria_Rod_Data
