Neither fish nor fowl? Classify it with the Smart Ingredient Classifier

When Food meets AI: the Smart Recipe Project

Conde Nast Italy
Jul 27, 2020
[Image: Mussel soup]

Since its release in late 2018, Bidirectional Encoder Representations from Transformers, aka BERT, has entered the NLP model hall of fame, earning state-of-the-art results on a wide range of NLP tasks.

Metaphors aside, BERT has genuinely changed the way we approach NLP tasks, solving many challenging problems in the field.

Given its fame, this post does not dive into BERT's hidden magic (others have done that brilliantly); rather, it shows how BERT was exploited in the Smart Recipe Project. Here, we developed a system able to identify the taxonomic class of an ingredient. This essentially means classifying Emmental as a cheese, orange as a fruit, peas as a vegetable, and so on for each ingredient in a recipe.

The development of this service required, besides ML competencies, expertise in designing a robust and consistent classification model, i.e. a taxonomy.

In the article, you will find:

  • An introduction to BERT
  • An overview of the Smart Recipe Project Taxonomy
  • BERT for ingredient taxonomic classification
  • Food classification applications

That’s so BERT…

The problem

One of the main problems in NLP is the lack of training data. That sounds a bit paradoxical, given the amount of textual data available on the Internet. However, the various NLP tasks require task-specific and, in many cases, domain-specific manually labeled datasets. These requirements dramatically reduce the number of training examples, which often turns out to be insufficient for training DL-based algorithms.

The solution

The idea is to exploit a large amount of unannotated data to train general-purpose language representation models, a process known as pre-training, and then to fine-tune these models on a smaller task-specific dataset.

Though this technique is not new (see word2vec and GloVe embeddings), we can say BERT exploits it better. Why? Let's find out in five points:

1 — It is built on the Transformer, a powerful state-of-the-art architecture that applies an attention mechanism to capture the relationships between the tokens in a sentence. To better understand how the attention mechanism works, look at this example:

The cat is on the mat, it is sleeping

Does the token "it" refer to the cat or to the mat? While the answer is obvious for humans, it is not for an algorithm. The Transformer's attention mechanism therefore looks at the input and decides, step by step, which other parts of the sequence are important and deserve attention.
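
To make this concrete, here is a minimal, self-contained sketch of scaled dot-product attention in PyTorch (the Transformer's core operation). The embeddings are random stand-ins rather than BERT's learned ones, so the printed weights are only illustrative; in a trained model, the row for "it" would concentrate most of its weight on "cat".

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(query, key, value):
    """Each token builds its output as a weighted mix of all tokens' values."""
    d_k = query.size(-1)
    # Pairwise similarity between tokens, scaled to stabilize the softmax
    scores = query @ key.transpose(-2, -1) / d_k ** 0.5
    # Attention weights: how much each token "looks at" every other token
    weights = F.softmax(scores, dim=-1)
    return weights @ value, weights

# Toy embeddings for the tokens of the example sentence (random, NOT learned)
torch.manual_seed(0)
tokens = "The cat is on the mat , it is sleeping".split()
x = torch.randn(len(tokens), 16)                    # (seq_len, hidden_dim)
_, weights = scaled_dot_product_attention(x, x, x)  # self-attention: Q = K = V

# Attention row for "it": in a trained model most weight would land on "cat"
for token, weight in zip(tokens, weights[tokens.index("it")].tolist()):
    print(f"{token:>10}  {weight:.3f}")
```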

To be specific, the BERT transformer consists only of encoders (in general, Transformers have encoders to read the input and decoders to produce a prediction), since its goal is only to generate a language representation model.

2 — It is deeply bidirectional, since it takes into account both the left and the right context. What's new, you may say? The novelty is that BERT reads in both directions at the same time, outperforming Bi-LSTM-based models, which process the two directions separately rather than simultaneously.

3 — BERT is pre-trained on a large corpus of unlabeled text, including the entire Wikipedia (2,500 million words) and Book Corpus (800 million words). This allows it to pick up a deep and intimate understanding of how language works.

4 — BERT can be fine-tuned for different tasks by adding a few additional output layers. For example, we adapted BERT to our task with just a single trained neural network layer on top (a minimal sketch follows).
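
To give an intuition of how small that addition can be, here is a minimal sketch assuming PyTorch and the Hugging Face transformers library; the class name is ours and the code is illustrative rather than the project's actual implementation.

```python
import torch.nn as nn
from transformers import BertModel

class IngredientClassifier(nn.Module):
    """Sketch of point 4: a pre-trained BERT encoder plus one new linear layer."""
    def __init__(self, num_classes: int = 48):  # 48 classes, as in our taxonomy
        super().__init__()
        self.bert = BertModel.from_pretrained("bert-base-multilingual-cased")
        # The only newly initialized (untrained) part: a single output layer
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_classes)

    def forward(self, input_ids, attention_mask):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        # pooler_output is BERT's sentence-level ([CLS]) representation
        return self.classifier(outputs.pooler_output)
```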

5 — In contrast with previous models, BERT is not trained to predict the next token in a sequence, but to perform two tasks:

  • Masked Language Modelling: BERT has to predict randomly masked words. To fill in a mask, the model has to look in both directions and use the full context of the sentence. It is here that the bidirectional property plays its role (a quick sketch follows this list)!
  • Next Sentence Prediction: given a pair of sentences, BERT predicts whether the second one actually follows the first.
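
To see masked language modelling in action, here is a quick sketch using the fill-mask pipeline from the Hugging Face transformers library with the same multilingual checkpoint we use later; the example sentence is ours and only illustrative.

```python
from transformers import pipeline

# Fill-mask pipeline built on the multilingual BERT checkpoint
unmasker = pipeline("fill-mask", model="bert-base-multilingual-cased")

# BERT must use both the left context ("The cat is on the") and the right
# context ("it is sleeping") to guess the hidden word.
for prediction in unmasker("The cat is on the [MASK], it is sleeping."):
    print(prediction["token_str"], round(prediction["score"], 3))
```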

Before moving to fine-tuning, let’s have a look at the taxonomy we designed to classify the ingredients.

The Smart Recipe Project Taxonomy

Taxonomy is the science and practice of naming, describing, and classifying things or concepts on the basis of shared characteristics. In our case, the things to classify are the ingredients.

Depending on the classification criteria, there are different types of food taxonomy.

We designed a hybrid taxonomy able to identify the class of an ingredient across many common diets. As shown in the figure below, our taxonomy shares nodes with the chemical and derivational taxonomies, though its classification is more specific: it includes a total of 48 classes (the light blue nodes).

BERT for ingredient taxonomic classification

For our task (ingredient taxonomic classification), pre-trained BERT models offer optimal performance. We chose the bert-base-multilingual-cased model and divided the classifier into two modules:

  • A training module. We used BertForSequenceClassification, a basic BERT model with a single linear layer on top for classification. Both the pre-trained model and the untrained classification layer were trained on our data.
  • An application module. The applier takes the trained model and uses it to determine the taxonomic class of each ingredient in a recipe. The input of the module is a JSON file containing the ingredients (extracted through NER, see the previous article).

The final output is another JSON file with the ingredients, their taxonomic class, and the classification F1 score.
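
A minimal sketch of what such an application module might look like, again assuming the Hugging Face transformers library; the file names, the input JSON schema, and the model path are illustrative placeholders rather than the project's actual ones.

```python
import json
import torch
from transformers import BertForSequenceClassification, BertTokenizer

# Load the tokenizer and the fine-tuned classifier (the path is a placeholder)
tokenizer = BertTokenizer.from_pretrained("bert-base-multilingual-cased")
model = BertForSequenceClassification.from_pretrained("path/to/fine-tuned-model")
model.eval()
# id2label maps each of the 48 class indices to its taxonomy label
# (e.g. cheese, fruit, vegetable)
id2label = model.config.id2label

# Hypothetical input schema: {"ingredients": ["Emmental", "orange", "peas"]}
with open("ingredients.json") as f:
    ingredients = json.load(f)["ingredients"]

results = []
with torch.no_grad():
    for ingredient in ingredients:
        inputs = tokenizer(ingredient, return_tensors="pt")
        probs = model(**inputs).logits.softmax(dim=-1)
        score, idx = probs.max(dim=-1)
        results.append({
            "ingredient": ingredient,
            "class": id2label[idx.item()],
            "score": round(score.item(), 3),
        })

# Write the output JSON: ingredients, their taxonomic class, and a score
with open("classified_ingredients.json", "w") as f:
    json.dump(results, f, indent=2)
```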

The Smart Ingredient Classifier: its future applications

Food classification provides a very useful piece of information to exploit in future system implementations. For example, you can assess the compliance of recipes with dietary requirements, food risk, and food consumption, and then decide to integrate such information into automatic cooking tools, recommendation engines, health applications…
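
As a toy illustration of the first use case, the snippet below checks a recipe's compliance with a vegetarian diet by consuming the classifier output sketched above; the forbidden class names are hypothetical stand-ins for the actual taxonomy labels.

```python
import json

# Hypothetical diet definition: taxonomy classes a vegetarian recipe must avoid
FORBIDDEN_FOR_VEGETARIANS = {"meat", "fish", "seafood"}

# Reuse the classifier output produced by the application module
with open("classified_ingredients.json") as f:
    classified = json.load(f)

offending = [item["ingredient"] for item in classified
             if item["class"] in FORBIDDEN_FOR_VEGETARIANS]

if offending:
    print("Not vegetarian because of:", ", ".join(offending))
else:
    print("This recipe is vegetarian-friendly.")
```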

