Photo by Chinh Le Duc on Unsplash

Building a nutritional co-pilot using LLMs Part 1: Recipe Extraction

Leveraging Mistral 7B Instruct locally for recipe extraction from YouTube videos

Krasimir Bambalov
4 min read · Apr 28, 2024


At the start of this year, I made a resolution to improve my cooking skills because my repertoire was very limited. I started out by watching countless YouTube tutorials and initially tried to manually record every ingredient and step in a Notion database. However, this task quickly became overwhelming. Keeping track of recipes, ingredients, nutritional information, and even orchestrating shopping lists and meal plans soon spiraled into a culinary labyrinth.

As I faced these challenges, I wondered, ‘How would a data scientist handle this? Could models simplify these tasks?’

I needed something with a broad understanding of human language that could also be adapted to perform very specific tasks. It became clear that foundation models in AI offered exactly these capabilities and could be the solution I was looking for. They could automate the documentation of recipes, streamline the management of ingredients, and optimize the tracking of nutritional information, all tailored to my unique culinary needs.

Concept:

My goal was to create a full-fledged solution that would assist me and my partner in cooking, weekly bulk preparation, shopping, dieting, and overall health improvement.

I’ve decided to use open-source models and tools instead of proprietary APIs because it’s important to me to have control over the technology, especially when it involves my health data. By designing this system from the start with these principles in mind, I’m ensuring that it’s secure, customizable, and perfectly suited to our daily needs.

Getting the model ready for inference:

The core model I chose is Mistral 7B Instruct v0.2, a 7.24-billion-parameter model fine-tuned to follow instructions. To minimize its impact on memory, I selected a 4-bit quantized version (Q4_K_S).

I run it locally using LM Studio, which supports many .gguf models from Hugging Face. LM Studio is beginner-friendly and features a great UI, and it can deploy a local server that exposes the model via an API. The setup is straightforward, and the API format adheres to OpenAI’s ChatCompletions specification.
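To make that concrete, here is a minimal sketch of querying the local server from Python, assuming LM Studio’s default port of 1234. The api_key and model values are placeholders: the local server does not check the key, and LM Studio routes requests to whichever model is currently loaded.

from openai import OpenAI

# LM Studio serves an OpenAI-compatible API on port 1234 by default.
client = OpenAI(base_url="http://localhost:1234/v1", api_key="lm-studio")

response = client.chat.completions.create(
    model="local-model",  # placeholder; the loaded model handles the request
    messages=[{"role": "user", "content": "Hello there!"}],
    temperature=0.0,
)
print(response.choices[0].message.content)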

For more information on setting up a local LLM server using LM Studio, see the LM Studio documentation linked in the references.

Getting the data in:

The first part of my nutritional co-pilot involves populating a knowledge base with recipes. After setting up the local inference server, my next step is to select a few recipes and test the model’s extraction capabilities.

My main source of recipes is YouTube. I am using the LangChain Python library to load video transcripts into Document objects.

from langchain_community.document_loaders import YoutubeLoader

loader = YoutubeLoader.from_youtube_url("lorem ipsum dolor sit amet")  # placeholder URL
documents = loader.load()  # returns a list of Document objects
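Each LangChain Document carries the raw text in its page_content attribute, so pulling out the transcript is a one-liner. Joining is a small precaution in case a video yields more than one document:

transcript = "\n".join(doc.page_content for doc in documents)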

Ok, good! Now I can use the transcript and construct a prompt that will help me extract all the details I need.

Prompt design:

The model’s primary role is to impose structure and extract all the information needed for shopping, cooking, and logging food intake in my calorie-tracking app. The minimum viable product is designed to accurately extract ingredients in a list format, complete with quantities, summarize the preparation steps, and record both the caloric and macro-nutritional content (fat, protein, carbs).

The main requirement is producing a predictable structured output. To ensure that the output data precisely meets the needs of automated systems without manual intervention, I have defined a data schema using Python’s Pydantic library. This schema acts as a blueprint for the output, ensuring that each data object conforms to the specified structure and data types — crucial for facilitating error-free automation.

Here’s a breakdown of how the schema was implemented:

import json
from typing import List, Optional
from pydantic import BaseModel, Field

class Recipe(BaseModel):
    name: str
    ingredients: List[str]
    # Nutritional fields are optional because videos don't always state them.
    protein: Optional[float] = None
    carbs: Optional[float] = None
    fat: Optional[float] = None
    calories: Optional[int] = None
    preparation_steps: str
    preparation_time: Optional[int] = None
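The prompt function below expects the schema as a JSON string, and a minimal way to produce it is to serialize the model’s generated schema. This sketch assumes Pydantic v2; on v1 the equivalent would be Recipe.schema_json(indent=2).

# Serialize the Recipe schema so it can be embedded in the prompt.
schema_json = json.dumps(Recipe.model_json_schema(), indent=2)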

To construct an effective prompt, I followed some of the prompting principles from the ‘ChatGPT Prompt Engineering for Developers’ course offered for free by OpenAI and DeepLearning.AI:

  1. To clearly differentiate between different parts of the input, such as the schema, I am utilizing delimiters like “```”.
  2. I am asking the model to deliver a structured output that adheres to the above schema.
  3. I am specifying the steps required to complete the task.
  4. I am listing the conditions that must be met.

It is now time to integrate these elements into the complete prompt. Et voilà!

def construct_recipe_extraction_prompt(transcript: str, schema_json: str) -> str:
    # Mistral's instruct format expects all user content between [INST] and
    # [/INST], so the transcript and schema go before the closing tag.
    prompt = (
        "[INST]\n"
        "Given a transcript of a cooking video, perform the following steps to organize the information into structured recipes:\n\n"
        "1. Identify each recipe mentioned in the transcript.\n"
        "2. For each recipe, extract the name, ingredients, protein, carbs, fat, calories, preparation steps, and preparation time.\n"
        "3. Please make sure no other details are extracted.\n"
        "4. Ensure the output is a valid JSON array of Recipe objects, conforming to the provided JSON Schema.\n\n"
        "Conditions:\n"
        "- Please provide just the JSON as output! Nothing else!\n"
        "- If details for protein, carbs, or fat are missing, set them to null.\n\n"
        "Transcript:\n"
        "```" + transcript + "```\n\n"
        "JSON Schema:\n"
        "```" + schema_json + "```\n"
        "[/INST]\n"
    )
    return prompt
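To close the loop, here is a minimal sketch of how the pieces fit together, reusing the client from the LM Studio section and assuming the model returns bare JSON (an assumption Part 2 will put to the test). Recipe.model_validate is the Pydantic v2 call; on v1 it would be Recipe.parse_obj.

prompt = construct_recipe_extraction_prompt(transcript, schema_json)

response = client.chat.completions.create(
    model="local-model",  # placeholder; LM Studio uses whichever model is loaded
    messages=[{"role": "user", "content": prompt}],
    temperature=0.0,  # deterministic output suits structured extraction
)

# Parse and validate the reply against the Recipe schema.
raw = response.choices[0].message.content
recipes = [Recipe.model_validate(item) for item in json.loads(raw)]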

With the extraction prompt constructed and wired into the local server, that marks the end of the first chapter!

In part 2, I will test and refine the prompt to ensure its effectiveness.

Before you go:

Follow me on LinkedIn.

References:

  1. DeepLearning.AI. ChatGPT Prompt Engineering for Developers. https://www.deeplearning.ai/short-courses/chatgpt-prompt-engineering-for-developers/
  2. LangChain. https://www.langchain.com/
  3. LM Studio — Discover and run LLMs locally. https://lmstudio.ai/
  4. Pydantic — Data validation using Python type hints. https://github.com/pydantic/pydantic
  5. TheBloke/Mistral-7B-Instruct-v0.2-GGUF · Hugging Face. https://huggingface.co/TheBloke/Mistral-7B-Instruct-v0.2-GGUF (available under the Apache 2.0 license)
