AISaturdayLagos: Recap on Week 3
Here’s a recap of what we learnt in week 3.
Agenda
Fast.ai Lesson 2
We virtually attended Fast.ai Lesson 2. Jeremy Howard did a recap of last week’s lecture, after which he talked about how to choose learning rates: he explained what learning rates are and why they’re crucial to how accurate your model will be. Some key takeaway concepts are cosine annealing, differential learning rate annealing, data augmentation, test time augmentation and stochastic gradient descent with restarts.
According to Jeremy Howard, these are the steps to train a world-class image classifier using the fast.ai framework (a rough code sketch follows the list):
- Enable data augmentation, and set precompute=True
- Use the learning rate finder lr_find() to find the highest learning rate where the loss is still clearly improving
- Train the last layer from pre-computed activations for 1–2 epochs
- Train the last layer with data augmentation (i.e. precompute=False) for 2–3 epochs with cycle_len=1
- Unfreeze all layers
- Set earlier layers to a 3x–10x lower learning rate than the next higher layer
- Use lr_find() again
- Train the full network with cycle_mult=2 until over-fitting
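Putting the recipe together, here is a minimal sketch, assuming the fastai v0.7 course library used in the lecture and a dogs-vs-cats style dataset in train/valid folders; the path, architecture and learning-rate values below are placeholders, not from the lesson itself.

```python
# Rough sketch of the steps above with the fastai v0.7 course library.
from fastai.conv_learner import *
import numpy as np

PATH = 'data/dogscats/'          # hypothetical dataset path (train/ and valid/ folders)
arch = resnet34
sz = 224

# 1. Enable data augmentation, start with precompute=True
tfms = tfms_from_model(arch, sz, aug_tfms=transforms_side_on, max_zoom=1.1)
data = ImageClassifierData.from_paths(PATH, tfms=tfms)
learn = ConvLearner.pretrained(arch, data, precompute=True)

# 2. Find the highest learning rate where the loss is still clearly improving
learn.lr_find()
learn.sched.plot()
lr = 1e-2                        # read off the plot; this value is a placeholder

# 3. Train the last layer from pre-computed activations for 1-2 epochs
learn.fit(lr, 2)

# 4. Train the last layer with augmentation (precompute=False), cycle_len=1
learn.precompute = False
learn.fit(lr, 3, cycle_len=1)

# 5-6. Unfreeze all layers and use differential (3x-10x lower) learning rates
learn.unfreeze()
lrs = np.array([lr / 9, lr / 3, lr])

# 7-8. Optionally re-run lr_find(), then train the full network with cycle_mult=2
learn.fit(lrs, 3, cycle_len=1, cycle_mult=2)
```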
Lecture 2 — Loss Functions and Optimization
Justin Johnson did a brief overview of last week’s lecture on linear classifiers, where he explained how, for every pixel in the input image, there is some weight in the matrix W telling us how much that pixel influences the labelled class. He talked about how to determine a matrix W that we can apply to our input to give us a loss of approximately zero, and explained two different loss functions: multi-class SVM loss and cross-entropy (softmax) loss. Going down this line of thought, he further explained why minimising these losses on the training data alone isn’t enough to tell us how accurate our model is, because the model can fit just the training dataset and thereby perform poorly on the test set. This is where a regularization term comes to the rescue.
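To make the two losses concrete, here is a small NumPy sketch for a single example; the scores, weight matrix and regularization strength are made-up values for illustration, not from the lecture.

```python
import numpy as np

# Class scores for one image (e.g. from scores = W @ x); y is the true class index.
scores = np.array([3.2, 5.1, -1.7])   # illustrative scores
y = 0                                  # index of the correct class

# Multi-class SVM (hinge) loss: every wrong class should score
# at least a margin of 1 below the correct class.
margins = np.maximum(0, scores - scores[y] + 1.0)
margins[y] = 0
svm_loss = margins.sum()

# Softmax / cross-entropy loss: -log of the normalised probability
# assigned to the correct class.
shifted = scores - scores.max()        # subtract max for numerical stability
probs = np.exp(shifted) / np.exp(shifted).sum()
softmax_loss = -np.log(probs[y])

# L2 regularization penalises the weights themselves, so the full
# objective becomes: data loss + reg_strength * sum(W ** 2).
W = np.random.randn(3, 4) * 0.01       # illustrative weight matrix
reg_strength = 1e-3
l2_penalty = reg_strength * np.sum(W ** 2)

print(svm_loss, softmax_loss, l2_penalty)
```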
The key takeaway concepts are SVM loss, softmax loss, L2 regularization and stochastic gradient descent.
He wrapped up the class with a discussion on what feature extraction is, why it should be considered, and popular methods for performing it.
STAT385 Lecture 2 Readings discussion — Harmonic Analysis of CNNs
The class started with us watching a video of an interview session with Prof. Helmut Bölcskei, where he talked about why it’s important to understand the theoretical aspects of deep learning and its underlying mathematics.
Azeez Oluwafemi led the discussion part of this session, where he briefly went through the papers on harmonic analysis of CNNs. He explained the findings from the papers, which have to do with translation invariance. He talked about how a good CNN has to be translation invariant (meaning it can still recognise an object even when its position in the image shifts). He explained that this translation invariance can be achieved because of the pooling layer, which reduces the dimensionality of our feature maps and, in turn, the compute required. He gave a side note that the translation invariance discussed is vertical, i.e. it builds up across layers: the early layers of a CNN are still very covariant, and invariance increases with the depth of the network.
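As a rough illustration of how pooling gives (local) translation invariance while reducing dimensionality, here is a small PyTorch sketch; it is not from the session, and the tensor sizes and shift are arbitrary.

```python
import torch
import torch.nn.functional as F

# A single activation at (2, 2), and a copy shifted right by one pixel.
x = torch.zeros(1, 1, 8, 8)
x[0, 0, 2, 2] = 1.0

x_shifted = torch.zeros(1, 1, 8, 8)
x_shifted[0, 0, 2, 3] = 1.0

# 2x2 max pooling halves each spatial dimension (8x8 -> 4x4) and
# summarises each local window, so the one-pixel shift is absorbed.
pooled = F.max_pool2d(x, kernel_size=2)
pooled_shifted = F.max_pool2d(x_shifted, kernel_size=2)

print(pooled.shape)                          # torch.Size([1, 1, 4, 4])
print(torch.equal(pooled, pooled_shifted))   # True: same output despite the shift
```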
We concluded this session by splitting the class into project groups of 10 people each.
AISaturdayLagos wouldn’t have happened without my fellow ambassador Azeez Oluwafemi, and our partners FB Dev Circle Lagos, Vesper.ng and Intel.
A big thanks to Nurture.AI for this amazing opportunity.
Also read how AI Saturdays is Bringing the World Together with AI
See you next week 😎.