Intro — Starting AI w/ fast.ai

Wayne Polatkan
Jul 25, 2017 · 4 min read

This is a rough recap of my first 3 months learning AI, and of taking far too long to get a blog running at WNoxchi.github.io.


I found www.fast.ai in April 2017 and was a bit blown away. An AI course focused on actually getting things done? I was just finishing Yaser Abu-Mostafa’s CS1156x ‘Learning from Data’ on edX, and while it was a great theoretical course, it cut down a lot of my enthusiasm for Machine Learning. I guess learning to code in Python while writing Linear Regression models by hand has that effect.

What really got me about Jeremy Howard’s ‘Practical Deep Learning I’ (which I’ll call FAI01/FADL1) was that, over and over again, he’d explain a thing, you’d go do it, and all of a sudden you’re catapulted to the forefront of applied ML. As an example, the course started off with CNNs, Convolutional Neural Networks. They basically work by sliding a filter across the previous layer, looking for a particular pattern. They can also be difficult to train: they take a lot of computation time, and it’s hard to compete w/ a CNN trained on 1.5 million ImageNet images. So why reinvent the wheel? Find a strong-performing model that can be applied with good results to our problem (in our case, the VGG16 model, which is essentially several blocks of convolutional layers with some dense/fully-connected layers at the end), chop off its last/output layer, stick a fresh one on, lock the convolutional layers as they are, and retrain the modified network on the problem at hand. This is called finetuning, a form of transfer learning, and the VGG model works particularly well for it. In the initial assignment, doing this for Kaggle’s Dogs vs Cats Redux will easily get you into the top 50% of the rankings, top 25% if done well.
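The recipe above can be sketched framework-agnostically. In the course this is done with Keras and the real VGG16; below is a toy numpy version where a fixed random projection stands in for the frozen pretrained convolutional layers, and only a fresh softmax output layer is trained (the data, sizes, and learning rate are all made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the frozen pretrained conv layers: a fixed random projection.
# In the real recipe these are VGG16's convolutional blocks, left untouched.
W_frozen = rng.normal(size=(64, 16))

def extract_features(x):
    return np.maximum(x @ W_frozen, 0)  # frozen layers + ReLU, never updated

# Fresh output layer ("stick a fresh one on"): the only trainable weights.
W_head = np.zeros((16, 2))

def softmax(z):
    z = z - z.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

# Toy 2-class data (dogs vs cats, in spirit): label depends on one input axis.
x = rng.normal(size=(200, 64))
y = (x[:, 0] > 0).astype(int)
t = np.eye(2)[y]  # one-hot targets

feats = extract_features(x)  # computed once; the frozen layers never change
for _ in range(200):  # train only the head with plain gradient descent
    p = softmax(feats @ W_head)
    W_head -= 0.01 * feats.T @ (p - t) / len(x)

acc = (softmax(feats @ W_head).argmax(axis=1) == y).mean()
```

The point of the sketch is the division of labor: the expensive, data-hungry part (the conv features) is reused as-is, and only the small new head has to be fit to the new problem.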

Now this is ignoring details such as ensuring proper color-channel order (the Oxford VGG authors used BGR instead of RGB) and normalizing the data by subtracting the per-channel mean of the ImageNet dataset VGG16 was originally trained on… but these are details you generally hear Howard mention, nod your head going “ah..okay..”, and then really learn by seeing them in the code, rewriting it, and rationalizing backwards.
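Those two details fit in a few lines of numpy. The mean values below are the published per-channel ImageNet means the VGG authors used (in BGR order); exact handling varies a bit between implementations:

```python
import numpy as np

# Per-channel ImageNet means from the original VGG training, in BGR order.
VGG_MEAN_BGR = np.array([103.939, 116.779, 123.68])

def vgg_preprocess(img_rgb):
    """img_rgb: H x W x 3 array in RGB order, pixel values in 0-255."""
    img = img_rgb[..., ::-1].astype(np.float64)  # flip channels: RGB -> BGR
    return img - VGG_MEAN_BGR                    # subtract per-channel mean

# A flat gray image: every channel ends up shifted by its own mean.
out = vgg_preprocess(np.full((2, 2, 3), 128.0))
```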

The crazy part about this is, once you do it, you begin to look around and notice who else is basically doing the same thing as you. For example, later in the course you learn about t-SNE and embeddings. t-SNE is basically a way to project high-dimensional data down to 2D so it can be roughly visualized (to see how parts of the data are related), and embeddings are vectors encoding how one thing (like a single customer) relates to everything else (all the products in a store). Lo and behold, while I was taking FADL1, the VP of Data Science at Instacart, Jeremy Stanley, wrote a Medium post, Deep Learning with Emojis (not Math), where he used the same tools and techniques being taught in the course to generate shopping recommendations.
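To make the embeddings idea concrete, here’s a toy numpy sketch: each row is one customer’s embedding vector, and cosine similarity between rows says how related two customers are. (The values here are made up; in practice a network learns them from purchase data, and t-SNE would then squash the vectors down to 2D for plotting.)

```python
import numpy as np

# Toy embedding matrix: one vector per customer (values invented here;
# a real model learns these from interaction data).
embeddings = np.array([
    [0.9, 0.1, 0.0],   # customer A: mostly buys produce
    [0.8, 0.2, 0.1],   # customer B: similar habits to A
    [0.0, 0.1, 0.9],   # customer C: very different habits
])

def cosine_sim(u, v):
    return u @ v / (np.linalg.norm(u) * np.linalg.norm(v))

sim_ab = cosine_sim(embeddings[0], embeddings[1])  # should be high
sim_ac = cosine_sim(embeddings[0], embeddings[2])  # should be low
```

A recommender can then suggest to customer A whatever similar customers like B have bought, which is the essence of the embedding-based approach the Instacart post describes.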

I’ve also noticed people using VGG16 in the same way as the course, in everything from academic papers to tutorials. One more example is Matthew Zeiler’s visualizations of convolutional features. You see them in the first few lectures of FADL1, but I’ve also seen them in lecture 5 of MIT’s 6.S191 (Intro to Deep Learning), and in a lot of other places.


A take-away from all this is that the frontier is quite close, and there is a lot of low-hanging fruit in AI. I think also that a lot of people exploring this field are limited by 19th-century research methodologies, and others are intimidated into self-selecting out by a general lack of confidence (“but everyone else is a PhD” — “who are you to think you can just start learning this stuff” — etc). So my little hypothesis is that, due to the above, a person willing to put in the work, and approach it with a bit of a cowboy-esque engineering attitude, is going to see a lot of rapid success.

Maybe. Maybe not. Gur du way. We’ll see.


Right, so I may/not expand this, but I wanted some kind of intro before I started writing about the technical stuff I’m learning in J.Howard’s Practical Deep Learning II and R.Thomas’ Computational (Numerical?) Linear Algebra. Also pictures. So on a totally-not-ominous note, a mental image/association that always has me excited about AI:

Corrections:

< edited up above in the 2nd paragraph >

VGG isn’t the state of the art for object detection. But it’s great for transfer learning

— @jeremyphoward
