How to make the most out of the top-down fast.ai course, Practical Deep Learning For Coders, Part 1
So far, Practical Deep Learning For Coders, Part 1, taught by Jeremy Howard, has been a wonderful class, with every one of its lectures packed with cutting-edge knowledge and deep learning best practices. Instead of taking the familiar bottom-up approach, where we first learn all the underlying theory, especially the math, and only build up to real-world applications at the very end, the course takes a top-down approach, which exposes us to cutting-edge applications and performance starting from the first lesson. In my past schooling, research papers were what we looked up to in awe and reverence, something beyond our reach. Here in this class, however, we break the records set by famous research papers on almost every dataset we get our hands on, often by a wide margin.
I have always been in a state of wonder when running the record-breaking Jupyter notebooks provided by Jeremy and walked through in the lectures: how is it possible that I so easily beat, in just a few lines of code, the results obtained by teams of brilliant professors and PhD students from prestigious universities, all with years more experience than me? Of course, it is not as easy as it looks. Most of the heavy lifting is done by the incredible fast.ai library, an opinionated one that incorporates state-of-the-art research findings and makes a default choice for the user whenever possible. It frees me from all kinds of traps and pitfalls: choosing from an endless collection of frameworks, architectures, and optimizers, then fine-tuning hyper-parameters, every step of which is instrumental to successfully training a deep learning model.
The Illusion of Competence
Yet, as almost always, with great abstraction comes great ignorance; it creates an illusion of competence that quickly falters in the face of real-world problems. The lesson 1 notebook builds a 37-category pet classification model using a CNN. Since legend says that any article with cute pet pictures will attract readers, I will put one here.
Here is an example code snippet from the notebook:
data = ImageDataBunch.from_name_re(path_img, fnames, pat, ds_tfms=get_transforms(), size=224, bs=bs)
For now, please focus on ds_tfms=get_transforms() and ignore the remaining details. Here, ds_tfms stands for dataset transforms, more commonly known as data augmentation, which generates copies of each image, each changed in such a way that it looks slightly different visually but still distinctively belongs to the category it came from. If that sounds sketchy to you, you are spot on. As of lesson 1, however, this was the level of understanding I had reached: a rough idea of what it does, and the assumption that it works. It did, beautifully, on the Oxford pet classification dataset, but it steered me straight into a shipwreck when I later worked on a dataset of my own: a collection of car images labeled according to their angles, or views. Here are some examples, each with a different angle label:
Just a note here: I labeled these 1000-plus images from scratch in a matter of hours using Platform.ai, the amazing data preparation platform built, again, by our awesome Jeremy Howard. I strongly recommend checking it out. It is hella cool and free to use for now.
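To make the idea of data augmentation concrete, here is a toy sketch in pure Python. The helper names and the tiny nested-list "image" are my own illustrative stand-ins, not fastai code; the real get_transforms() also rotates, zooms, warps, and adjusts lighting, while this sketch only mimics the random horizontal flip in spirit.

```python
import random

def horizontal_flip(image):
    """Mirror a tiny 'image' (a list of pixel rows) left-to-right."""
    return [list(reversed(row)) for row in image]

def augment(image, p_flip=0.5):
    """Toy stand-in for a dataset transform: flip with probability p_flip."""
    if random.random() < p_flip:
        return horizontal_flip(image)
    return image

# A 2x3 'image': each number is a pixel intensity.
cat = [[1, 2, 3],
       [4, 5, 6]]

# A flipped cat is still a cat, so the label survives this augmentation.
print(horizontal_flip(cat))  # [[3, 2, 1], [6, 5, 4]]
```

The point is that each augmented copy looks a little different to the model, effectively enlarging the training set for free, as long as the change does not alter what the label should be.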
Since it is also an image classification problem, I basically copied and pasted the code from the lesson 1 notebook with little modification. However, the training failed miserably and could not even reach 80% accuracy. This is highly unusual, as the problem itself is extremely simple: it only requires differentiating photos by very basic geometric properties. I started walking through the code, focusing especially on the black boxes I had previously assumed to work. Eventually I zeroed in on get_transforms() and looked into its documentation.
def get_transforms(do_flip:bool=True, flip_vert:bool=False, max_rotate:float=10.0, max_zoom:float=1.1, max_lighting:float=0.2, max_warp:float=0.2, p_affine:float=0.75, p_lighting:float=0.75, xtra_tfms:Optional[Collection[Transform]]=None) → Collection[Transform]
The default argument do_flip:bool=True looked very suspicious to me. Here is some more documentation:
do_flip: if True, a random flip is applied with probability 0.5
flip_vert: requires do_flip=True. If True, the image can be flipped vertically or rotated by 90 degrees; otherwise only a horizontal flip is applied
There is also a demonstration of the transformation, with cute pet photos of course:
So apparently the default transforms flip my images horizontally with a 50% chance. That is fine for kitties, as a kitty flipped horizontally is still a kitty; but in my dataset, the label depends on the horizontal view angle of the car itself, so the transformation completely messes up the labels. No wonder the training did not work. After adding a single argument to the function call, get_transforms(do_flip=False), the training went as smoothly as usual and I got a merry 96% validation accuracy.
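To see concretely why the default flip corrupted my labels, consider this sketch. The view-angle labels below are hypothetical stand-ins for the ones in my dataset; the point is that a mirror image of a car genuinely shows the opposite side, while the augmentation pipeline keeps the original label attached.

```python
def horizontal_flip(image):
    """Mirror an image (a list of pixel rows) left-to-right."""
    return [list(reversed(row)) for row in image]

# Hypothetical view-angle labels: mirroring swaps left and right views.
TRUE_LABEL_AFTER_FLIP = {
    "front": "front",
    "rear": "rear",
    "left-side": "right-side",
    "right-side": "left-side",
}

image, label = [[1, 2, 3]], "left-side"  # a (pixels, label) training pair
flipped = horizontal_flip(image)

# Augmentation keeps the old label, but the flipped image now truly
# shows the other side of the car, so the training pair is wrong.
print(label, "vs", TRUE_LABEL_AFTER_FLIP[label])  # left-side vs right-side
```

Roughly half the flipped side-view images end up paired with the wrong label, which is more than enough noise to cap accuracy well below where it should be.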
What is the lesson here? As stated at the beginning of this article, fast.ai is a highly opinionated library with many built-in assumptions, which is exactly what lets it perform at a very high level on many deep learning datasets and problems right out of the box. For example, the default data augmentation works very well on most image classification datasets. However, on datasets that violate those assumptions, such as the assumption that flipping an image horizontally does not change its label, it performs significantly worse. To use the library on more datasets, we inevitably need to look into the black boxes and understand what the library is doing.
The question, however, is when to look into the black boxes. There is much merit in the top-down approach to learning; if we simply look into every black box we encounter, the learning experience is no different from a bottom-up one. The strategy I take is to do so only when the black box does not work as expected. In the spirit of the wise old saying "if it ain't broke, don't fix it," I would suggest: "if an abstraction ain't broken, don't look into it." To use the jargon of machine learning, breaking down an abstraction and fine-tuning our understanding of it is just like training a model with more data. Early in training, when the model has seen only a small set of data, it does not generalize well; likewise, when we have just started learning and using an abstraction, we overfit our understanding of the black box to the few examples we have seen.
However, just as it makes no sense to train a model to recognize every dog and cat breed merely to tell a dog from a cat, we do not need to know every tiny detail of an abstraction to start using it on fairly general problems. Nevertheless, we do need to give ourselves opportunities to encounter occasions where our rough conception breaks down; only then can we learn to apply it to more general problems. That is why, in top-down learning, it is instrumental to apply what you have learned to a slightly different set of problems. Neglect this, and your understanding of how the knowledge works in the messy real world will be as poor as a model trained on a tiny dataset.
Next time you are learning material top-down and stumble when applying it to a real-world problem, don't panic, doubt the approach, or outright retreat down the rabbit hole of bottom-up learning. Remind yourself that this is an integral, and indeed the most rewarding, part of the top-down learning experience, just as solving challenging problems is for learning math. Do your research, read the documentation, understand the source code, or even dig into the mathematical theory behind it, until you have refined your understanding of the abstraction enough to solve the problem. You will come out with a much better understanding of the concept, and that is how you grow. At the same time, unlike in a bottom-up approach, where you have no clue why you are learning the material until the very end, you will always have motivation and a sense of purpose throughout the whole learning experience, because every bit of your work is directed at solving a problem at hand. For me, learning top-down has been a dramatically more satisfying and rewarding experience. I hope you can enjoy it as well. Happy top-down learning!