FoodBERT: A Meta-Data Enrichment Pipeline for Restaurant Menus

Dogu Araci · Published in iFood Engineering · Jul 18, 2022

Zulkuf Genc, Dogu Tan Araci (Prosus); Vitor Oliveira, Gabriel Campos (iFood)

A veggie burger with tofu and without coriander…

Looking for such a specific dish in online restaurant menus is usually not a fun activity, especially when you are hungry! When we tried it on leading food delivery platforms in the Netherlands, what we usually got was a long list of burger restaurants.

Example search results for “veggie burger with tofu and without coriander”.

What happens in the background is probably a keyword search and ranking of the matching results based on multiple criteria. When our specific search terms are not mentioned in the item descriptions, the search algorithm misses them.

The same problem can also occur with recommendations. When the information we have about the items is limited to their description and users’ interaction with them, we can make bad and sometimes even annoying recommendations. For example, if the platform doesn’t know which dishes are vegetarian, then there is not an easy way of knowing which users are vegetarian either.

Then why are the menu items not described in more detail?

We would indeed expect a very thorough specification of items for better discoverability and higher customer satisfaction. However, detailing every food item in constantly changing restaurant menus is a very tedious and time-consuming task… unless we automate it with machine learning!

Enriching Item Descriptions with Meta-data

Menu items have titles and descriptions of varying quality. This text, together with other relevant data points, may contain sufficient signals to generate enriching meta-data. For example, if an item has “burger” in the title and “tofu” in the description, we can predict it is a veggie burger.

Prediction accuracy is critical in this use case: wrong descriptions can upset customers far more than missing ones. Luckily, in the era of language models, we don’t need huge datasets to train highly accurate NLP models. Based on our previous experience with BERT, which you can read more about below, we were confident that a few thousand labelled samples would be enough to train a model with the predictive power we needed.

In the first iteration, we aimed to generate the following meta-data points for each appropriate catalog item:

  • Dish type: What an item includes, such as a main course, side dish, drink, or dessert.
  • Ingredients: Ingredients present in dish items.
  • Size/amount: Serving sizes and quantity information for each entity in an item.
  • Dish group (taxonomy): High-level category of an item, such as pizza, pasta, or burgers.
  • Food tags: Tags about the speciality of an item, such as vegan, vegetarian, red meat, or dairy.

Pre-training for Domain Adaptation

We decided to focus on text, include tabular data where appropriate for training, and leave item images for the next iteration. The first task was to further (pre-)train a Portuguese BERT model on internal and external food-related text data. This helps a language model trained on a generic corpus adapt to the domain and can improve performance on downstream tasks. We curated a dataset by merging around 300 thousand Portuguese recipes we scraped from public websites with over 1 million descriptions of the items in the iFood catalog.

iFood, backed by Prosus, is the largest food delivery company in Latin America.

We had our food-related Portuguese corpus and a language model trained for generic Portuguese. The next step was to further pre-train the BERT model on this corpus so that it adapts better to the food domain. At this stage, further pre-training means continuing to train on BERT’s masked language modelling (MLM) objective.

Further training of vanilla BERT model on food text for domain adaptation.
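For reference, here is a minimal sketch of what this further pre-training step could look like with the Hugging Face transformers library. The checkpoint name and the corpus file are illustrative assumptions, not our exact setup.

```python
# Minimal sketch of domain-adaptive MLM pre-training with Hugging Face transformers.
# The checkpoint name and file path are illustrative assumptions.
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer, TrainingArguments)

checkpoint = "neuralmind/bert-base-portuguese-cased"  # assumed generic Portuguese BERT
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForMaskedLM.from_pretrained(checkpoint)

# food_corpus.txt: one recipe or catalog item description per line (hypothetical file)
corpus = load_dataset("text", data_files={"train": "food_corpus.txt"})
tokenized = corpus.map(
    lambda batch: tokenizer(batch["text"], truncation=True, max_length=128),
    batched=True, remove_columns=["text"],
)

# Standard BERT masked language modelling objective: mask 15% of tokens at random
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="foodbert", num_train_epochs=3,
                           per_device_train_batch_size=32),
    train_dataset=tokenized["train"],
    data_collator=collator,
)
trainer.train()
trainer.save_model("foodbert")  # the domain-adapted model used by the downstream tasks
```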

Downstream Task Models

The variety of meta-data points to be generated led us to train different models on top of the FoodBERT model. We chose the type of each model based on the available data and the requirements of the task. You can see an overview of the tasks and training data below.

Overview of the downstream task models on top of FoodBERT and the training data used.

To demonstrate what each model predicts, let’s use a real example from the iFood catalog and do inference on it.

Our item: Combo ferrari + batata frita + refrigerante. Pão australiano (sem mel), dois suculentos hambúrgueres de 120g cobertos por cheddar fatia, bacon em tiras, peperonni, tomate, alface americana e molho bbq

In English: Combo ferrari (name of the dish) + fries + soda. Australian bread (no honey), two juicy 120g burgers topped with sliced cheddar, bacon strips, pepperoni, tomato, iceberg lettuce and bbq sauce

Dish Type Classifier

Items can consist of multiple sub-items. A dish can be served as a starter or main course depending on its portion size. Without solving such ambiguities around the catalog items, it is difficult for a food delivery company to personalise its offerings.

An item might seem expensive based purely on its price. But if you can tell that it actually includes a main course, a side dish, and a dessert, maybe it is not that expensive after all. The platform can then choose to promote this item to its more cost-conscious users. None of this is possible without having the information in a structured form.

That is why we trained a multi-label classifier with labels such as main course, side dish, includes drink, and includes dessert. We fine-tuned the FoodBERT model with a few thousand labelled examples, thanks to our in-house labellers.
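As an illustration, the sketch below shows how such a multi-label classification head on top of FoodBERT can be set up and queried with transformers. The checkpoint path, the label names in code, and the 0.5 threshold are assumptions, not our exact configuration.

```python
# Sketch of a multi-label dish type classifier on top of foodBERT.
# Checkpoint path, label names and threshold are illustrative assumptions.
import torch
from transformers import AutoModelForSequenceClassification, AutoTokenizer

labels = ["main_course", "side_dish", "includes_drink", "includes_dessert"]

tokenizer = AutoTokenizer.from_pretrained("foodbert")
model = AutoModelForSequenceClassification.from_pretrained(
    "foodbert",
    num_labels=len(labels),
    problem_type="multi_label_classification",  # one sigmoid per label, BCE loss when fine-tuning
)

# After fine-tuning on the labelled examples, inference is one sigmoid score per label:
text = "Combo ferrari + batata frita + refrigerante. Pão australiano (sem mel), ..."
inputs = tokenizer(text, truncation=True, return_tensors="pt")
with torch.no_grad():
    probs = torch.sigmoid(model(**inputs).logits)[0]

print([label for label, p in zip(labels, probs) if p > 0.5])
# e.g. ['main_course', 'side_dish', 'includes_drink']
```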

The final dish type classifier achieved over 90% accuracy on average and proved useful for a variety of use cases where we clearly need to know what an item offers to our users. The model was most often confused between side dishes and main courses, because some dishes can be presented in both forms. It can be hard even for a human to decide between those two: are chicken wings a main course or a side dish?

For our example above, the model gives the following prediction:

Combo ferrari + fries + soda. Sounds about right!

Dish Tagger

This was yet another multi-label classifier, this one assigning one or more of these labels: vegan, vegetarian, red meat, poultry, seafood and dairy. We first soft-labelled a few thousand examples based on keywords (if it has “beef” in it, it includes red meat), then trained a first model on these soft labels. We then ran several rounds of correcting a few thousand of the model’s predictions and retraining it.
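To make the bootstrapping step concrete, here is a sketch of the keyword-based soft labelling; the keyword lists are illustrative examples only, not the actual rules.

```python
# Keyword-based soft labelling used to bootstrap the first dish tagger.
# The keyword lists are illustrative examples, not the actual rules.
KEYWORD_RULES = {
    "red_meat":   ["carne", "bife", "picanha", "bacon"],
    "poultry":    ["frango", "chicken"],
    "seafood":    ["camarão", "peixe", "salmão"],
    "dairy":      ["queijo", "cheddar", "requeijão", "leite"],
    "vegetarian": ["vegetariano"],
    "vegan":      ["vegano"],
}

def soft_label(text):
    """Assign weak labels from keyword matches: noisy, but good enough to train
    a first model that is then iteratively corrected and retrained."""
    text = text.lower()
    return [tag for tag, keywords in KEYWORD_RULES.items()
            if any(keyword in text for keyword in keywords)]

print(soft_label("Dois hambúrgueres de 120g cobertos por cheddar e bacon em tiras"))
# ['red_meat', 'dairy']
```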

Here is what it predicts for our burger example:

Content Parser with NER

A lot of rich information about an item may be present in its description; it just needs to be extracted, and Named Entity Recognition (NER) models are usually quite good at this task. We identified a set of relevant data points based on the requirements of our use cases, such as ingredients, sauces, preparation methods, size/amount and any food labels.

We used our internal labellers to annotate dish descriptions, and we also used the structured data points we parsed from the public recipes. We then fine-tuned FoodBERT with a token classification head on top.
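A rough sketch of that setup is below. The BIO tag set mirrors the entities listed in the legend further down; the checkpoint path and pipeline settings are assumptions.

```python
# Sketch of the NER head on top of foodBERT. The BIO tag set mirrors the entities
# described in the legend below; checkpoint path and settings are assumptions.
from transformers import AutoModelForTokenClassification, AutoTokenizer, pipeline

tags = ["O",
        "B-NOME_DO_PRATO", "I-NOME_DO_PRATO",
        "B-ACOMPANHAMENTO", "I-ACOMPANHAMENTO",
        "B-INGREDIENTE", "I-INGREDIENTE",
        "B-TAMANHO", "I-TAMANHO",
        "B-MOLHO", "I-MOLHO"]

tokenizer = AutoTokenizer.from_pretrained("foodbert")
model = AutoModelForTokenClassification.from_pretrained(
    "foodbert",
    num_labels=len(tags),
    id2label=dict(enumerate(tags)),
    label2id={tag: i for i, tag in enumerate(tags)},
)

# After fine-tuning on the annotated descriptions, inference classifies each token
# and groups consecutive B-/I- tags into entity spans:
ner = pipeline("token-classification", model=model, tokenizer=tokenizer,
               aggregation_strategy="simple")
print(ner("dois suculentos hambúrgueres de 120g cobertos por cheddar e molho bbq"))
```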

Here’s what it predicts for our example:

Nome do prato: Name of the dish, Acompanhamento: Accompanying item, Ingrediente: Ingredients, Tamanho: Size or volume, Molho: Sauce

Taxonomy Classifier

Categorizing the catalog items is essential to any food serving company. We already had a high-performing model in place that classifies each item into an appropriate group. Our curiosity motivated us to try FoodBERT on this task as well, so we trained a multiclass classifier using labelled data points across 72 classes. Being curious paid off: we achieved a 5 percentage point improvement in F1-score (weighted average) over the existing model, which was based on TF-IDF vectors.
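For context, the sketch below shows the kind of TF-IDF baseline such a comparison involves, along with the weighted F1 metric; the baseline’s exact classifier and the toy data are assumptions.

```python
# Sketch of a TF-IDF baseline for the 72-class taxonomy task and the weighted-F1
# metric used in the comparison. The classifier choice and toy data are assumptions.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import f1_score
from sklearn.pipeline import make_pipeline

# Item titles/descriptions and their dish-group labels (hypothetical examples)
train_texts = ["pizza mussarela grande", "combo x-burger + batata frita"]
train_labels = ["pizza", "burgers"]
test_texts, test_labels = ["pizza calabresa acebolada"], ["pizza"]

baseline = make_pipeline(TfidfVectorizer(ngram_range=(1, 2)),
                         LogisticRegression(max_iter=1000))
baseline.fit(train_texts, train_labels)

# Weighted-average F1 over all classes, the metric on which FoodBERT gained ~5 points
print(f1_score(test_labels, baseline.predict(test_texts), average="weighted"))
```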

Ingredient Predictor

Our last and hardest task was to predict the ingredients of a dish from its description, even when they are not explicitly mentioned. This is hard for several reasons: (1) collecting ground truth is virtually impossible, since labellers can’t guess with 100% accuracy what’s in a dish and large-scale restaurant cooperation is not practical; (2) even with such data, it is not easy to predict whether a certain ingredient is really present in the dish just from its description. Who knows if the cook put sour cream in the stroganoff or not?

Yet, we still wanted to experiment and see how far we could go with what we had. After all, even if we ended up with a predictor with low precision, it could still be useful for some use cases.

We used the recipe dataset mentioned above, which has dish names, ingredients and instructions for around 300k dishes. We fine-tuned FoodBERT on the dish names to predict their ingredients. Obviously, this dataset and the iFood catalog differ greatly in how dishes are expressed: iFood data is much more irregular, with brand names and promotion information sprinkled throughout. To bridge this gap between the data we trained on (public recipes) and the data we run inference on (the iFood catalog), we used the NER system described above: at inference time, we kept only the food-related terms and removed everything else.
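The sketch below illustrates that inference-time bridge: the NER model first filters catalog text down to food-related terms, and only then does the ingredient predictor (assumed here to be a multi-label head over a recipe-derived ingredient vocabulary) score it. Entity names, thresholds and helper names are hypothetical.

```python
# Sketch of the inference-time bridge between noisy catalog text and an ingredient
# predictor trained on clean recipe names. Entity names, the threshold and helper
# names are hypothetical.
import torch

FOOD_ENTITIES = {"NOME_DO_PRATO", "INGREDIENTE", "MOLHO", "ACOMPANHAMENTO"}

def clean_catalog_text(text, ner_pipeline):
    """Keep only spans the NER model tags as food-related, dropping brand names
    and promotion text. Expects a pipeline with aggregation_strategy='simple'."""
    spans = ner_pipeline(text)
    return " ".join(span["word"] for span in spans
                    if span["entity_group"] in FOOD_ENTITIES)

def predict_ingredients(text, ner_pipeline, model, tokenizer, ingredient_vocab,
                        threshold=0.5):
    """Run the (recipe-trained) multi-label ingredient predictor on cleaned text."""
    cleaned = clean_catalog_text(text, ner_pipeline)
    inputs = tokenizer(cleaned, truncation=True, return_tensors="pt")
    with torch.no_grad():
        probs = torch.sigmoid(model(**inputs).logits)[0]
    return [name for name, p in zip(ingredient_vocab, probs) if p > threshold]
```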

How FoodBERT helps with extracting/predicting ingredients. Pizza mussarela calabresa acebolada: pizza with mozzarella, pepperoni and onions.

Final Words

The experience of developing FoodBERT showed us that a modest labelling effort, coupled with the power of language models, can go a long way in extracting rich information from text. The information FoodBERT extracted about the items turned out to be a great enabler and booster for many other use cases, some of which were not even initially considered.

Almost all internet companies have some sort of textual data asset, often in abundance. Recent breakthroughs in NLP have made it easier than ever to tap into these assets and turn them into actionable, structured data. We will continue our efforts to improve the data enrichment pipeline by adding new models and integrating more business needs. Our innovation backlog is full of AI use-case ideas powered by this versatile pipeline; an iFood assistant that can serve as your personal concierge is just one of many.

If you have any suggestions or questions, please don’t hesitate to reach out to us at datascience@prosus.com. And in case you want to be a part of this story, the iFood data science team always welcomes talented data scientists and engineers!
