Buon Appetito! A fast.ai spin on Italian food

Francesco Gianferrari Pini
Published in Quantyca
Oct 26, 2018

As a wannabe Deep Learning practitioner at Quantyca, I am attending the third edition of the fast.ai MOOC with some colleagues.

As usual, the first lesson focused on image classification (the domain was dog and cat breeds), and the assigned task was to apply the procedures of lesson 1 to a new domain.

Therefore we decided to tackle the infamous HotDog/NotHotDog problem in the Italian food domain.

Moreover, we found it more interesting to provide a categorization that went beyond the Carbonara/NotCarbonara problem, identifying several different foods.

Chef’s (executive) summary

The performance was incredible out of the box, just retraining the last layer.

I think the confusion matrix is the best way to show that:

Confusion Matrix for a resnet50

Appetizer: the data preparation

In order to generate our dataset we relied on the simple yet effective ai_utilities library, which mainly provides two Python scripts:

  • image_download.py: invokes either Google Images or Bing Images, downloads the images matching a particular query (like “lasagne” or “spaghetti alla carbonara”) and saves them in a folder named after the query
  • make_train_valid.py: organizes the data into dedicated training, validation and test set folders, using a standard structure like the one below (the script can be invoked with parameters that define the size of each set, but we left the default settings):
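Roughly, the resulting layout looks like this (a sketch, with class folders named after our example queries, not the script's literal output):

    dataset/
        train/
            lasagne/
            spaghetti_alla_carbonara/
            ...
        valid/
            lasagne/
            spaghetti_alla_carbonara/
            ...
        test/
            lasagne/
            spaghetti_alla_carbonara/
            ...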

Therefore the fastest (and laziest) way to make your own dataset is to git clone the library and then create a shell script like this one:
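A minimal sketch, with the repository path and script arguments assumed from memory (the queries and image counts are just examples; check the ai_utilities README for the exact CLI):

    # clone the library (repository path assumed, check before use)
    git clone https://github.com/prairie-guy/ai_utilities.git
    cd ai_utilities

    # one download per class: each query ends up in a folder named after it
    python image_download.py 'lasagne' 500
    python image_download.py 'spaghetti alla carbonara' 500
    python image_download.py 'risotto ai funghi' 500

    # split the class folders (here assumed under 'dataset') into
    # train/valid/test with the default ratios
    python make_train_valid.py dataset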

Nothing fancy here!

One suggestion: review the content of the folders, as sometimes junk images get downloaded.

First course: loading data in fast.ai

Once the dataset is ready, the process is straightforward and follows the lesson 1 notebook exactly.

First of all, the usual configuration and imports:
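For us this meant the standard fastai v1 setup from the lesson 1 notebook (notebook magics shown here as comments):

    # notebook magics from the lesson 1 template, shown as comments
    # %reload_ext autoreload
    # %autoreload 2
    # %matplotlib inline

    # fastai v1 imports, as used by the third edition of the course
    from fastai import *
    from fastai.vision import *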

Then let’s set up the path to the dataset and create a DataBunch:
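A minimal sketch (the dataset path and the batch size are placeholders, not necessarily our exact values):

    # point this at the folder produced by make_train_valid.py (placeholder path)
    path = Path('data/italian_food')

    # from_folder picks up the train/valid/test subfolders created earlier;
    # size=299 as discussed below, the batch size is an assumption
    data = ImageDataBunch.from_folder(path, ds_tfms=get_transforms(), size=299, bs=32)
    data.normalize(imagenet_stats)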

Even here, everything is super standard: ai_utilities creates a folder structure that is fully compatible with ImageDataBunch.from_folder.

Please note that the image size is 299 (from the trainings it appears that this is the right size for a resnet50, but we need to dig deeper into that).

Let’s see if data has been loaded correctly:
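Roughly as in the lesson 1 notebook:

    # show a grid of training images with their labels, plus the class list
    data.show_batch(rows=3, figsize=(8, 8))
    print(data.classes)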

Yum!

Things look good and yummy!

Second Course: The training

Training is also super standard:
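A sketch using the fastai v1 API of the time (create_cnn was later renamed cnn_learner; the epoch count here is indicative):

    # build a learner from a pretrained resnet50; the body stays frozen,
    # so fit_one_cycle retrains only the head (the last layers)
    learn = create_cnn(data, models.resnet50, metrics=error_rate)
    learn.fit_one_cycle(5)  # epoch count is indicative
    learn.save('stage-1')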

Let’s look at the results over the training epochs:

We can see that between epochs 4 and 5 the training loss drops to 0.20, while the validation loss sits at 0.25. In our opinion the model starts overfitting at that point, so we kept it in that state and moved on to the dessert.

Dessert: Validation

Let’s see the validation steps and results:
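In fastai v1 this mirrors the lesson 1 notebook (a sketch):

    # build an interpretation object on the validation set
    interp = ClassificationInterpretation.from_learner(learn)

    # plot the confusion matrix: the bluer the diagonal, the better
    interp.plot_confusion_matrix(figsize=(10, 10))

    # list the class pairs the model confuses most often
    interp.most_confused(min_val=2)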

The confusion matrix gives a glance at the actual accuracy of the model (the bluer the diagonal, the better the model, as more images of the validation set are predicted correctly); however, the real insights come from the most confused pairs:

  • Risottos can get confused with one another, as the distinction between them comes down to the palette and to some elements in the rice (like mushrooms or seafood).
  • Tagliatelle al ragù and Bucatini all’amatriciana can share the same appearance (spaghetti-like and red-brownish).

In any case, the result is impressive for an out-of-the-box implementation in so few lines of code.

One last tip (kudos to our forum pals Devforfu and EvanXiong, and of course to Sgugger for the final suggestion): to get a prediction for a custom image, just run:
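Something along these lines (the image path is a placeholder, and the exact single-image call changed across early fastai v1 releases, so treat this as a sketch):

    # open a custom image (placeholder path) and ask the learner for a prediction
    img = open_image('my_carbonara.jpg')
    pred_class, pred_idx, probs = learn.predict(img)
    print(pred_class, probs[pred_idx])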

This is going to be extremely useful later on, when we will need to serve the model via an API.

Caffè and Ammazzacaffè

Some final thoughts:

  • We are really standing on the shoulders of giants: what seems so easy to implement sits on layers of great code and effort (fast.ai, PyTorch, CUDA, etc.) and, of course, on tons of research on architectures.
  • We are nonetheless amazed by the power of transfer learning. More on that soon.
  • We had an afterthought about using Bing or Google Images to create datasets. Microsoft and Google may use deep learning to select the images shown by their services, so our model may work so well precisely because it is trained and validated on “deep learning friendly” images. We will need to take our own pictures of food for testing, or use the pictures on the menus of crappy restaurants.
  • Last, but not least, kudos to Paperspace: their platform totally lifted us from any infrastructural effort, letting us concentrate on data and modelling.
