Fastai | How to start?

Why this question?

The question “How do I start with Fastai?” may seem incongruous.

Just watch the first video, right? No.

I have a twofold experience with Fastai. First, from October 2017, I was a student in its International Fellowship program (parts 1 and 2). Then, in 2018, with other colleagues, I launched the first Deep Learning study group in Brasilia based on the Fastai course (part 1 at first, and today part 2 as well as the ML course). So I also became an instructor using the Fastai content.

It is on the basis of this twofold experience that I am publishing this getting-started guide today, for the new participants of our course in Brasilia as well as for everyone who wishes to begin their journey into Artificial Intelligence (AI) using the Fastai library.

Machine Learning in a few words

The day a baby is born with Artificial Intelligence is not coming tomorrow. For now, every AI has to be created by humans… and that takes code!

Outside of robotics and genetic engineering, AI takes the form of an algorithm (often an artificial neural network model) that must be trained in order to learn.

In practice, the parameters of the algorithm (also called weights) start with random values that are updated using observations (also called examples) provided to the algorithm. This method is called “learning from a dataset”, or Machine Learning (Deep Learning methods, very popular and widely used today, are Machine Learning methods that chain many layers of computation).

Indeed, for each observation it receives, the algorithm computes a predictive result (often a probability) about the nature of that observation, through mathematical operations involving its parameters (for example, if the observation is an image of a cat, the predictive result should indicate the class corresponding to a cat). The error with respect to the true value of the observation is then used to update the values of the parameters (often by backpropagation of the error gradient).

Training then continues with the next observation, and so on.
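The learning loop just described can be sketched in plain Python. This is a minimal, hypothetical example, not Fastai code: a single artificial neuron (logistic regression) trained one observation at a time with stochastic gradient descent on a made-up one-feature dataset. For such a one-layer model, backpropagation reduces to the single gradient formula used in the update step. The names `train`, `predict` and `toy_data`, the learning rate and the number of epochs are illustrative assumptions.

```python
import math
import random


def sigmoid(z: float) -> float:
    """Squash a raw score into a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def predict(weights, bias, x):
    """Predictive result: probability that observation x belongs to class 1."""
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


def train(observations, epochs=100, lr=0.5, seed=0):
    """Hypothetical helper: learn weights one observation at a time (SGD)."""
    rng = random.Random(seed)
    n_features = len(observations[0][0])
    # The parameters (weights) start with small random values.
    weights = [rng.uniform(-0.1, 0.1) for _ in range(n_features)]
    bias = 0.0
    for _ in range(epochs):
        for x, y in observations:
            p = predict(weights, bias, x)
            # Error with respect to the true label. For the log loss of a
            # sigmoid output, the gradient w.r.t. the raw score is (p - y).
            error = p - y
            # Move each parameter against its gradient, (p - y) * x_i.
            weights = [w - lr * error * xi for w, xi in zip(weights, x)]
            bias -= lr * error
    return weights, bias


# Made-up data: positive feature -> class 1, negative feature -> class 0.
toy_data = [([2.0], 1), ([1.0], 1), ([-1.0], 0), ([-2.0], 0)]
weights, bias = train(toy_data)
print(predict(weights, bias, [3.0]))  # probability of class 1 for a new observation
```

A real network has many layers, and libraries such as PyTorch compute the gradients automatically, but the cycle is the same: predict, measure the error, update the weights, move on to the next observation.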

Libraries to implement Machine Learning

Machine Learning (ML) therefore requires data, computing power… and algorithms. We need a programming language to implement these algorithms, train them, test them and then use them in production.

Since the early 2010s, which mark the beginning of the large-scale use of AI, Python has become the language of choice for developing ML and Deep Learning (DL) algorithms, often in Jupyter notebooks.

As the architectures of these algorithms have become increasingly standardized, libraries have been developed to make them easier to use, such as TensorFlow (Google) with Keras, and then PyTorch (Facebook) with Fastai.

Fastai, more than a library

Fastai is both a library for implementing ML and DL algorithms and the title of a course that started at the Data Institute of the University of San Francisco and is now available online (one course on ML and two courses on DL).

But its creators, Jeremy Howard and Rachel Thomas, went further. Fastai is also a new top-down teaching method based on learning by doing, as well as a community of more than 10,000 people today (read “Launching fast.ai” by Jeremy Howard, October 2016).

Each lesson has a freely downloadable video, a forum thread and Jupyter notebooks that run on the Fastai library.

How to start with Fastai? A 4-step guide

I have seen too many participants in our Brasilia course either give up or fail to really take advantage of the course, for 4 main reasons: Python, Jupyter notebooks, GPUs and homework.

1) Python

Python is the programming language used in the Fastai course and its notebooks. While you do not need to be a Python expert to follow the Fastai course, you do need some basic practice.
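As a rough self-check, here is the kind of Python you should be able to read comfortably before starting: functions with default arguments, list comprehensions, `zip`, dictionaries, f-strings and a small class. The names `accuracy` and `Averager` are illustrative and are not part of Fastai.

```python
def accuracy(preds, labels, threshold=0.5):
    """Fraction of predicted probabilities that match the 0/1 labels."""
    # A list comprehension over (prediction, label) pairs built by zip()
    hits = [int(p > threshold) == y for p, y in zip(preds, labels)]
    return sum(hits) / len(hits)


class Averager:
    """Keep a running average, e.g. of the loss during training."""

    def __init__(self):
        self.total = 0.0
        self.count = 0

    def update(self, value):
        self.total += value
        self.count += 1

    @property
    def value(self):
        # Avoid dividing by zero before the first update
        return self.total / self.count if self.count else 0.0


# Dictionaries and f-strings
stats = {"epoch": 1, "acc": accuracy([0.9, 0.2, 0.7, 0.4], [1, 0, 0, 0])}
print(f"epoch {stats['epoch']}: accuracy {stats['acc']:.2f}")  # epoch 1: accuracy 0.75

loss = Averager()
for batch_loss in [0.9, 0.7, 0.5]:
    loss.update(batch_loss)
```

If every line above is clear to you, your Python level is enough to get started.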

Online courses:

The following 2 Python libraries are heavily used in the Fastai notebooks, but you can learn them during the course (they are not prerequisites).

NumPy

NumPy is the fundamental package for scientific computing with Python. It lets you perform mathematical operations on vectors, matrices and higher-dimensional arrays (tensors) as if they were ordinary Python variables.

Online courses: Numpy Tutorial and Python Numpy Tutorial.
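A short sketch of NumPy operations that appear constantly in ML code: element-wise arithmetic with broadcasting, the matrix-vector product (the core computation of a neural-network layer) and a reduction over a 3-dimensional array. The arrays are made-up values chosen so the results are easy to check by hand.

```python
import numpy as np

# A 2x3 matrix and a length-3 vector
m = np.array([[1, 2, 3],
              [4, 5, 6]])
v = np.array([10, 20, 30])

# Broadcasting: v is added to each row of m -> [[11, 22, 33], [14, 25, 36]]
added = m + v

# Matrix-vector product -> [1*10 + 2*20 + 3*30, 4*10 + 5*20 + 6*30] = [140, 320]
product = m @ v

# A 3-dimensional tensor: 2 "images" of 3x4 values, averaged per image -> [5.5, 17.5]
t = np.arange(24).reshape(2, 3, 4)
means = t.mean(axis=(1, 2))
```

No explicit Python loop is needed: NumPy applies each operation to whole arrays at once, which is both shorter to write and much faster.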

Pandas

Pandas is an open-source, BSD-licensed library providing high-performance, easy-to-use data structures (such as the DataFrame, which can be loaded from a CSV file) and data analysis tools for the Python programming language. Pandas works very well with NumPy.

Online courses: list of tutorials about pandas.
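A minimal pandas sketch, assuming a small made-up CSV table about pets: it loads the data into a DataFrame, filters rows, computes a mean per group and converts a column to a NumPy array. The CSV text is held in memory with `io.StringIO` so the example is self-contained; with a real file you would pass its path to `pd.read_csv`.

```python
import io

import pandas as pd

# Made-up CSV data held in memory
csv_text = """name,species,weight_kg
Tom,cat,4.2
Rex,dog,30.5
Felix,cat,3.8
Fido,dog,25.0
"""

df = pd.read_csv(io.StringIO(csv_text))

# Keep only the rows where species is "cat"
cats = df[df["species"] == "cat"]

# Mean weight per species (a Series indexed by species)
mean_weight = df.groupby("species")["weight_kg"].mean()

# Convert a column to a NumPy array, e.g. to feed it to a model
weights = df["weight_kg"].to_numpy()
```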

2) Jupyter notebook

Project Jupyter exists to develop open-source software, open standards and services for interactive computing across dozens of programming languages, including Python. You will use Jupyter notebooks to implement all your ML and DL algorithms.

Online courses:

3) GPU

Once you know at least basic Python and how to use a Jupyter notebook, you need to install the Fastai library and its notebooks on a machine with a GPU.

Note: if you do not have a local NVIDIA GPU and do not want to use one online, you can install Fastai on your computer and use only your CPU, but training your ML/DL model may then take a long time…

Why a GPU? A GPU greatly reduces the time needed to train an ML or DL algorithm, because it runs the many matrix operations involved in parallel. Without a GPU, you will not be able to “easily” train an ML or DL algorithm on millions of examples.

Apart from a local CPU installation, you have 2 options: either configure your computer's local GPU if it has one (such as an NVIDIA GPU), or use one online through Google Cloud, Google Colab, Paperspace, AWS or others.

Online guides: read the “GPU (Graphic Processing Units)” paragraph of the article “Deep Learning Brasília — Revisão” or the following links.

Local CPU

Read the README.md file, but follow the steps below (written for Windows):

  1. Install Anaconda for Windows.
  2. Open the “Anaconda Prompt” terminal (installed with Anaconda) and type the following commands in it.
  3. mkdir fastai (create a fastai folder)
  4. cd fastai (enter the fastai folder)
  5. git clone https://github.com/fastai/fastai.git (download the Fastai files, including the notebooks and the files used to install the fastai-cpu virtual environment: PyTorch, NumPy, pandas, bcolz, etc.)
  6. cd fastai (enter the cloned repository, which git created as a new fastai subfolder)
  7. conda env update -f environment-cpu.yml (IMPORTANT: use the environment-cpu.yml file because you want to use your CPU, not a GPU)
  8. conda activate fastai-cpu (activate the fastai-cpu virtual environment)
  9. cd courses\ml1 (enter the ml1 folder, for example)
  10. del fastai (delete the fastai symlink, which was created for a bash environment)
  11. mklink /d fastai ..\..\fastai (create a Windows symlink named fastai pointing to the folder that contains the Fastai library files; this may require running the Anaconda Prompt as administrator)
  12. cd ..\.. (leave the ml1 folder and return to the root of the cloned repository)
  13. jupyter notebook (launch Jupyter Notebook, which opens in your web browser)

“Et voilà”: the Fastai library (and its notebooks) is installed on your CPU-only computer, and you can run all the notebooks in the ml1 folder.
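To confirm from a notebook cell that PyTorch (installed with the Fastai environment) is available, and whether it can see a GPU, a check along these lines can help. It relies only on the standard PyTorch calls `torch.cuda.is_available()` and `torch.cuda.get_device_name()`; the helper name `gpu_status` is hypothetical. On the CPU-only installation above, it should report that no GPU was detected.

```python
def gpu_status() -> str:
    """Hypothetical helper: describe which device PyTorch can use."""
    try:
        import torch  # installed as part of the Fastai environment
    except ImportError:
        return "PyTorch is not installed in this environment"
    if torch.cuda.is_available():
        # Name of the first CUDA device, e.g. an NVIDIA card
        return f"GPU available: {torch.cuda.get_device_name(0)}"
    return "No GPU detected: training will run on the CPU"


print(gpu_status())
```

The same check is worth running on a rented online machine, to make sure your notebooks really use the GPU you are paying for.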

Local GPU

Online GPU

Don’t forget to shut down your virtual machine when you are done (you are usually billed as long as it runs)!

4) Homework

Jeremy Howard recommends about 10 hours of personal work for each video of his course… and he is right!

Indeed, if you want to learn how to do ML and DL and not just understand the principles, you must PRACTICE.

The elements presented above (in short: knowing Python, knowing how to use a Jupyter notebook, and having installed the Fastai library on a GPU) are NECESSARY prerequisites, but they are not sufficient.

To really learn, you have to watch the videos several times, run the Fastai notebooks, study the code line by line, ask questions on the Fastai forum when you do not understand something, answer questions asked by others, and publish articles to deepen your understanding. That is real learning!

One last word: ENJOY! :-)