fast.ai literate programming

Hoa Truong
4 min read · Sep 25, 2019


fast.ai version 2 is currently under development (since I first heard about the library in May 2018, this is the second time it has been rebuilt from scratch, and it integrates many super handy advanced techniques). This time, almost everything is written in Jupyter Notebooks.

Jupyter Notebook is very familiar to data scientists because it is handy for experimenting and for presentation. When we want to verify the shape of a tensor, see a histogram of our dataset, plot an image, or understand each line of code clearly, Jupyter Notebook is just straightforward. But for myself, after experimenting in a notebook, I usually have to move the code back into a script to use it as a library or an application. Then, when I want to re-experiment, I have to come back to the notebook, then adjust things in the script, and so on. Going back and forth like this is really frustrating.

Then, in this new version of fast.ai, Jeremy Howard introduced a new way to develop almost everything inside Jupyter Notebooks. It is not just about converting notebooks to scripts; there is a real programming paradigm behind it called literate programming. For more information about literate programming, see: http://www.literateprogramming.com/

In short, this kind of programming is not just writing code that the machine can run successfully, but code that humans can understand. Many times I have confronted a very long and complicated piece of code with few explanations, and taking that code apart to experiment with it was not trivial. With literate programming, writing detailed explanations without polluting the source code is much easier for the author, and also for readers who want to dig deeper into the code.

An example from fast.ai, taken from 09_vision_augment.ipynb:

What better way to demonstrate an image transformation than with an image itself?

Then the necessary code is exported automatically to the script vision/augment.py (because of the #export in its cell).

Now there is no need to switch between different coding environments anymore: stay in Jupyter Notebook and everything is created automatically (it even creates HTML documentation).

In the next section, I will try to explain how it works and also use it on a GitHub repo that I am watching: https://github.com/danieltan07/dagmm (an unsupervised technique using an autoencoder and a Gaussian mixture model).

Explanation and Example:

The code is taken from the fastai-dev repo: https://github.com/fastai/fastai_dev/tree/master/dev. The converting part is in the 91_notebook_export.ipynb notebook.

If you have ever opened an .ipynb notebook file, you will see that it is just a JSON file, and you might guess that the code is saved somewhere inside. It is. If you load the file as a dictionary, you will find that it has 4 keys: dict_keys(['cells', 'metadata', 'nbformat', 'nbformat_minor']). All your code cells are in 'cells' (specifically in nb['cells'][index]['source']).
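A minimal sketch of what that structure looks like (the notebook dict below is built in memory for illustration; on a real file you would simply `json.load` the .ipynb):

```python
import json

# A tiny notebook built by hand; a real one comes from
# json.load(open("some_notebook.ipynb")) since .ipynb is plain JSON.
nb = {
    "cells": [
        {"cell_type": "code",
         "source": ["#export\n", "def add(a, b):\n", "    return a + b\n"]},
        {"cell_type": "markdown",
         "source": ["Some explanation of the code above.\n"]},
    ],
    "metadata": {},
    "nbformat": 4,
    "nbformat_minor": 2,
}

# Round-trip through JSON to show nothing else is hiding in the format
nb = json.loads(json.dumps(nb))

print(list(nb.keys()))
# ['cells', 'metadata', 'nbformat', 'nbformat_minor']

# The raw source of the first code cell:
print("".join(nb["cells"][0]["source"]))
```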

So in short, with the help of regular expressions, fast.ai searches every cell in the notebook: if the 'cell_type' is 'code' and the cell contains the line '#export', it is exported to a script whose name is defined by '# default_exp notebook.export'. (There are several keywords following '#' that indicate what should be done with a cell, for example '#export' or '# default_exp notebook.export'.) Then, at the end of the notebook (or in a terminal), run notebook2script(all_fs=True) and all the conversion is done automatically.
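A much-simplified sketch of that logic (the real implementation in 91_notebook_export.ipynb handles many more cases; the function and variable names here are my own, not fastai's):

```python
import re

def simple_notebook2script(nb):
    """Collect the source of every code cell flagged with #export and
    return (module_name, script_text), mimicking the fastai exporter."""
    module_name, exported = None, []
    for cell in nb["cells"]:
        if cell["cell_type"] != "code":
            continue
        source = "".join(cell["source"])
        # '# default_exp foo.bar' names the target module for the script
        m = re.search(r"^#\s*default_exp\s+(\S+)", source, re.MULTILINE)
        if m:
            module_name = m.group(1)
        # a first line of '#export' (or '# export') flags the cell
        lines = source.splitlines()
        if lines and re.match(r"^#\s*export\s*$", lines[0]):
            exported.append(re.sub(r"^#\s*export\s*\n", "", source))
    return module_name, "\n".join(exported)

nb = {
    "cells": [
        {"cell_type": "code",
         "source": ["# default_exp vision.augment\n"]},
        {"cell_type": "markdown",
         "source": ["This prose stays in the notebook only.\n"]},
        {"cell_type": "code",
         "source": ["#export\n", "def flip(img):\n", "    return img[::-1]\n"]},
    ]
}

module, script = simple_notebook2script(nb)
print(module)  # vision.augment
print(script)  # the flip() definition, without the #export flag
```

In the real exporter, `module` would be turned into a file path (e.g. vision/augment.py) and `script` written there.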

So now feel free to experiment with your code, and when you want to extract a piece of it into your application or library, just put '#export' in the cell and run notebook2script().
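For instance, a notebook cell you want to keep in the library might look like the following (the `normalize` function is just a hypothetical example, not from fastai):

```python
#export
def normalize(x, mean, std):
    "Normalize a value, the way we experimented with it in the notebook."
    return (x - mean) / std

# Cells WITHOUT the #export flag (like a quick sanity check below the
# definition) stay in the notebook only and never reach the script:
print(normalize(10, 5, 2))  # 2.5
```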

If at some point you don't want to open a notebook (you just want to fix a small bug in your code), you can make the change directly in the script and then run script2notebook; the change will be exported back into your notebook.

Here is the repo that I tried to convert to literate programming: https://github.com/dienhoa/dagmm/tree/master

  • First, I copied notebook_core and notebook_export from fastai (modified them a little and removed what was unnecessary for me).
  • Then, based on the original dagmm repo, I copied the scripts into Jupyter notebooks, changed the import names (the exported code lives in the folder local), and put '#export' in the cells I wanted to export.
  • Then all that is left is experimenting :D

Conclusion:

Honestly, everything in fastai is now self-explanatory thanks to this kind of literate programming. The notebooks themselves are better than my article here :D. I just wanted to introduce it to other communities based on my own understanding, which is still quite shallow; there are many, many interesting things in there. I'm sure that you can now dig deeper into the code and learn in a super fast way.
