Personal Memo: fast.ai Lesson 1

The following notes are from Lesson 1 of fast.ai’s Deep Learning Course v3, Part 1.

https://course.fast.ai/index.html

I started by looking for a way to use a GPU, because that’s a prerequisite for this course. There are lots of cloud vendors for this. I decided to use FloydHub, which for me is the easiest platform for studying deep learning. What I like most about FloydHub is that it’s described as the “Heroku for AI”. If you’re a web engineer, it may be a good option for building an MVP!


Setting up on FloydHub

  1. Go to the instructions: https://course.fast.ai/start_floydhub.html
  2. Click the button there, and that’s it.

After clicking, the whole environment is already set up, including Python, Jupyter Notebook, and fastai itself.

The cool thing is that the setup always uses the latest library version and course repository, so you don’t need to update them manually.

Anything you save in FloydHub is persisted in /floyd/home, and the datasets are in /floyd/home/fastai/data.


Intro

There are 3 important things we need in order to get the computer to learn well:

  1. Dataset
  2. Architecture (model)
  3. Loss

The architecture is also called the model. It is a recipe for dealing with a specific kind of problem. I’m gonna start with a Convolutional Neural Network (CNN), since it’s good at vision tasks.

The loss shows us how well the computer is learning. To make it learn more, I just adjust things step by step so that the loss gets smaller.

So I first need to get a dataset, apply an architecture to train the model, and then adjust some variables to reduce the loss. That’s it.
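The loop of "adjust a variable a little, check the loss, repeat" can be sketched with a toy example. This is not fastai code: the one-parameter model, the data, and the names loss, grad and train are made up for illustration.

```python
# Toy example: fit y = w * x by nudging w in the direction that lowers the loss.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]  # made-up data, generated with w = 2


def loss(w):
    """Mean squared error between the predictions w * x and the targets y."""
    return sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


def grad(w):
    """Derivative of the loss with respect to w."""
    return sum(2 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)


def train(w=0.0, lr=0.01, steps=200):
    """Repeat small updates; lr controls how big each adjustment is."""
    for _ in range(steps):
        w -= lr * grad(w)
    return w


w = train()
print(w, loss(w))
```

Starting from w = 0, the loss shrinks at every step and w ends up very close to 2, the value used to generate the data.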


Train the computer

Here are the steps the lesson goes through. Some of them may change between library versions, so don’t copy them into your app as is.

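from fastai.vision import *

# Load the images; the labels are extracted from the file names with the regex pat
data = ImageDataBunch.from_name_re(path_img, fnames, pat, ds_tfms=get_transforms(), size=224)
# Normalize the data with the ImageNet statistics
data.normalize(imagenet_stats)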

The ImageDataBunch class from the fastai library holds all the data I need, split into training and validation sets.
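from_name_re takes each label from the image’s file name using a regular expression. Here is a minimal sketch of that idea in plain Python. The pattern, the file names, and the helper label_from_name are hypothetical examples, not the ones from the lesson.

```python
import re

# Hypothetical pattern: the label is everything before the trailing "_<number>.jpg".
pat = re.compile(r"([^/]+)_\d+\.jpg$")


def label_from_name(fname):
    """Extract the class label from a file name such as 'images/Bengal_7.jpg'."""
    match = pat.search(fname)
    if match is None:
        raise ValueError(f"no label found in {fname!r}")
    return match.group(1)


# Made-up file names for illustration.
fnames = ["images/great_pyrenees_12.jpg", "images/Bengal_7.jpg"]
labels = [label_from_name(f) for f in fnames]
print(labels)
```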

ds_tfms sets the transforms applied to the images, and size resizes every image to 224x224 so that they can be compared easily.

And here is the thing: for now, we need to fit every image in the dataset to one size. If we don’t, we can’t train properly.

The second call in the code above, data.normalize, is really important because images vary. Some are bright and others are dull.

Normalizing smooths those differences out, using the mean and standard deviation of another image dataset (ImageNet).
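What normalizing does to a single pixel can be sketched like this. The numbers are the commonly used ImageNet per-channel mean and standard deviation for pixel values scaled to 0..1; the helper normalize_pixel and the two sample pixels are made up for illustration, and the real library works on whole image tensors rather than one pixel at a time.

```python
# ImageNet per-channel (R, G, B) statistics for pixel values in the range 0..1.
IMAGENET_MEAN = (0.485, 0.456, 0.406)
IMAGENET_STD = (0.229, 0.224, 0.225)


def normalize_pixel(rgb, mean=IMAGENET_MEAN, std=IMAGENET_STD):
    """Subtract the channel mean and divide by the channel standard deviation."""
    return tuple((v - m) / s for v, m, s in zip(rgb, mean, std))


bright = normalize_pixel((0.9, 0.9, 0.9))  # a bright pixel
dull = normalize_pixel((0.2, 0.2, 0.2))    # a dull pixel
print(bright, dull)
```

After normalizing, a pixel equal to the mean becomes 0, brighter pixels become positive and duller ones negative, so all channels end up on a comparable scale.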

After loading the dataset, we can look at the images with data.show_batch. Looking at the data itself is always good practice.

# Create a CNN learner based on resnet34
learn = create_cnn(data, models.resnet34, metrics=error_rate)
# Train for 4 epochs
learn.fit_one_cycle(4)

In the first line, the create_cnn function takes a few params: the ImageDataBunch (data), the architecture, and the metrics. Remember that the architecture is the model that deals with the data. In this case, resnet34 is a good fit for images.
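The error_rate metric is simply the fraction of validation images that were classified wrongly. A plain-Python stand-in looks like this. The name simple_error_rate and the sample labels are made up; fastai’s own metric works on tensors of predictions.

```python
def simple_error_rate(predicted, actual):
    """Fraction of predictions that do not match the actual label."""
    if len(predicted) != len(actual) or not actual:
        raise ValueError("need two non-empty lists of the same length")
    wrong = sum(p != a for p, a in zip(predicted, actual))
    return wrong / len(actual)


# Made-up labels: 1 of the 4 predictions is wrong.
predicted = ["beagle", "pug", "pug", "birman"]
actual = ["beagle", "pug", "boxer", "birman"]
print(simple_error_rate(predicted, actual))
```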

I honestly don’t know much about this model, but it’s ok as long as it works for me.

The number 34 is the number of layers in the network that is used to recognize the images.

I save the trained model with learn.save(SOME_NAME).


Reduce loss to make it learn better

Let’s see the results from the model.

interp = ClassificationInterpretation.from_learner(learn)

The ClassificationInterpretation class can create a confusion matrix and plot the misclassified images.

# Plot the 9 images with the highest loss
interp.plot_top_losses(9, figsize=(15, 11))
# List the class pairs that were confused at least twice
interp.most_confused(min_val=2)

The code above shows the most confused combinations, meaning the pairs of classes whose images look similar to each other to the computer.

The interesting thing is that I can’t tell the difference either in many of those combinations.
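The idea behind listing the most confused combinations can be sketched by counting wrong (actual, predicted) pairs. This is a rough plain-Python imitation, not the library’s implementation, and the labels below are made up.

```python
from collections import Counter


def most_confused(actual, predicted, min_val=1):
    """Return (actual, predicted, count) for wrong predictions, most frequent first."""
    counts = Counter((a, p) for a, p in zip(actual, predicted) if a != p)
    return [(a, p, n) for (a, p), n in counts.most_common() if n >= min_val]


# Made-up labels: ragdoll is mistaken for birman twice, birman for ragdoll once.
actual = ["ragdoll", "ragdoll", "birman", "ragdoll", "beagle", "birman"]
predicted = ["birman", "birman", "ragdoll", "ragdoll", "beagle", "birman"]

print(most_confused(actual, predicted, min_val=2))
```

With min_val=2 only the pairs that were confused at least twice are kept, which hides one-off mistakes.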


Fine tuning

The next step is reducing the loss.

I trained the model right after learn = create_cnn(...). That’s good enough in some cases, but I can make it better.

create_cnn added a few extra layers on top of resnet34, and the previous fit_one_cycle trained only those additional layers.

This time, I train all the layers to make the model better.

WTF is a layer?

Don’t ask me. The concept is really easy, though. All we need to know to do this with Convolutional Networks is to change the lower (earlier) layers less: each group of layers should change at a rate about 10x lower than the group after it.

That rate of change is called the learning rate.

The learning rate is a really important concept, so I’ll go into more detail in the next article.

I’m gonna put code here.

# Unfreeze the model and train all of its layers
learn.unfreeze()
learn.fit_one_cycle(1)

I can see the result from it, and it might be worse than before.

To make it better, I need to look at the learning rates.


Find learning rates

I think it’s easy to do just by following the code.

# Find a good learning rate
learn.lr_find()
learn.recorder.plot()

I got a plot where the x-axis is the learning rate and the y-axis is the loss.

In my plot, the loss starts increasing around 1e-3. We shouldn’t pass learning rates higher than 1e-3 to the model.
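One rough way to read such a plot in code is to look for the learning rate where the recorded loss was lowest and treat it as the upper limit. The numbers and the helper lr_at_lowest_loss below are made up for illustration; they are not the values from my plot or a fastai function.

```python
def lr_at_lowest_loss(lrs, losses):
    """Return the learning rate at which the recorded loss was lowest."""
    if len(lrs) != len(losses) or not lrs:
        raise ValueError("need two non-empty lists of the same length")
    best = min(range(len(losses)), key=lambda i: losses[i])
    return lrs[best]


# Made-up recording: the loss falls slowly, then blows up after 1e-3.
lrs = [1e-6, 1e-5, 1e-4, 1e-3, 1e-2, 1e-1]
losses = [0.30, 0.29, 0.27, 0.26, 0.45, 1.20]

upper = lr_at_lowest_loss(lrs, losses)
print(upper)
```

This is only a sanity check. Looking at the plot itself is still the better habit, because a noisy curve can have its lowest point in a misleading place.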

So I pass a range between 1e-5 and 1e-3 to train the model. The learning rates of the layer groups should be roughly 10x apart.

Now I know which learning rates are good to pass to the model.

Then I pass them like this: learn.fit_one_cycle(2, max_lr=slice(1e-5, 1e-3)).
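My understanding is that the slice gives the earliest layers the lowest learning rate (1e-5), the last layers the highest (1e-3), and spreads the groups in between. Here is a minimal sketch of that spreading, assuming three layer groups and even multiplicative spacing. The helper spread_lrs is hypothetical and not part of fastai.

```python
import math


def spread_lrs(start, stop, n_groups):
    """Learning rates from start (earliest layers) to stop (last layers),
    spaced by a constant multiplicative step."""
    if n_groups < 1:
        raise ValueError("n_groups must be at least 1")
    if n_groups == 1:
        return [stop]
    ratio = (stop / start) ** (1 / (n_groups - 1))
    return [start * ratio ** i for i in range(n_groups)]


# Assumption: the model is split into 3 layer groups.
group_lrs = spread_lrs(1e-5, 1e-3, 3)
print(group_lrs)
```

With three groups the rates come out at about 1e-5, 1e-4 and 1e-3, so each group learns about 10x faster than the one before it.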


That’s what I’ve done so far in lesson 1.

Lesson 2 digs into more detail, and I’ll do a bit of experimenting.