SnapNutrition: Bridging the Gap Between Technology and Dietary Well-being

Christina Wang
14 min read · Dec 13, 2023


This article was produced as part of the final project for Harvard’s AC215 Fall 2023 course.

Project Github Repo — Link

Video — Link

Table of Contents

  1. Background and Motivation
  2. Problem Statement and Main Objectives
  3. Project Breakdown
  4. App Walkthrough
  5. Lessons Learned
  6. Future Objectives
  7. Conclusion

Background and Motivation

Monitoring your dietary intake, including what you ate and when you ate it, plays a crucial role in attaining your health and fitness goals. However, the current landscape of tracking nutritional intake is often laborious, involving manual entry or complex estimations using apps like MyFitnessPal. As a real-life example, how would you log the macronutrients from a meal out at a restaurant? You could try estimating, but was that 14 grams of french fries or 28 grams? Or, you could skip logging that meal and lose the insights gained by tracking nutrient intake over time. To mitigate these challenges, we created SnapNutrition with the objective of simplifying food logging by taking pictures of your food. Snap a picture of a plate of food with our ML-powered smart app and you receive an immediate, accurate estimate of the meal's macronutrients, including calories, protein, fat, and carbs. No more guessing and no more lost meals! The ultimate goal of SnapNutrition is to revolutionize dietary tracking by eliminating the common pain points associated with food logging.

Problem Statement and Main Objectives

The core challenge addressed by SnapNutrition is to develop an application capable of estimating calories and macronutrients from user-submitted photos of food, utilizing a deep-learning-based computer vision model. The overall goal was to create a Minimum Viable Product (MVP) capable of serving multiple users simultaneously without interruption.

To create the MVP, our focus included dataset acquisition and exploration, the creation of atomic containers, and the implementation of versioned data pipelines. Additionally, we emphasized the development of modular training workflows with performance tracking, the establishment of a robust backend API service, and the creation of an intuitive frontend. Ensuring scalability is integral to our approach, leading us to implement efficient, cloud-based deployment strategies for a seamless and responsive user experience.

Project Breakdown

1. Dataset Acquisition and Exploration

To accomplish the goal of estimating calories and macronutrients, we needed a diverse dataset covering a spectrum of everyday foods and cuisine types. We conducted a thorough review of literature, open-source projects, and datasets. One noteworthy resource is the Nutrition5k paper and its datasets, which describe a Google Research group's method for estimating calories from standard images and RGBD (depth) images. The dataset contains plates of food from a cafeteria, spanning diverse cuisines. Each image includes quantified macronutrient labels for calories, fat, carbohydrates, protein, and total mass.

Example overhead dish image and macronutrients label from Nutrition5k Dataset

2. Atomic Containers and Versioned Data Pipelines

To ensure a modular and efficient project structure, we use atomic containers for each project component, encapsulating individual applications and services. These standalone containers operate independently, fostering a scalable and organized architecture. Concurrently, we implement versioned data pipelines and distributed computing with tools like Luigi, Dask, and DVC. We also incorporate TensorFlow-based data management practices such as TF Data and TFRecords. This approach provides a robust framework for handling data, allowing seamless transformation and version control.

Data Version Control Container

The Data Version Control Container employs the open-source tool DVC for meticulous versioning of datasets stored in our Google Cloud Bucket. This tool enables us to effectively track and version our raw images and their corresponding labels, alongside the generated TFRecords. Designed to operate within a Google Cloud VM, this container reads from our Google Cloud Storage (GCS) bucket, ensuring a streamlined and organized approach to dataset versioning and management.

Successful use of DVC for version control of data labels

Data Labels Processing and Train, Test, Validation Split

The Data Labels Processing and Train, Test, Validation Split container plays a crucial role in our workflow. It takes raw image and label data as input, processing it to save formatted file paths and labels as pickle files directly into our designated GCS Bucket. These pickle files are split into train, test, and validation sets, preparing them for subsequent ingestion by the TFRecords container. Operating within a Google Cloud VM, this container ensures efficient data processing and management, contributing to the overall robustness of our pipeline.
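
For illustration, a minimal sketch of what this split step could look like is shown below; the file paths, column names, and split ratios are assumptions, not the container's exact code.

```python
# Minimal sketch of the label-processing and splitting step (hypothetical
# paths, column names, and split ratios; not the exact container code).
import pandas as pd
from sklearn.model_selection import train_test_split

LABELS_CSV = "gs://snapnutrition-data/raw/dish_metadata.csv"  # assumed path
OUTPUT_DIR = "gs://snapnutrition-data/splits"                 # assumed path

# Load image paths and macronutrient labels.
# Columns assumed: image_path, calories, fat, carbs, protein, mass
labels = pd.read_csv(LABELS_CSV)

# 80/10/10 train/validation/test split (ratios assumed).
train_df, holdout_df = train_test_split(labels, test_size=0.2, random_state=42)
val_df, test_df = train_test_split(holdout_df, test_size=0.5, random_state=42)

# pandas can write pickles directly to GCS when gcsfs is installed.
train_df.to_pickle(f"{OUTPUT_DIR}/train.pkl")
val_df.to_pickle(f"{OUTPUT_DIR}/val.pkl")
test_df.to_pickle(f"{OUTPUT_DIR}/test.pkl")
```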

Successful run of the container in the Google VM
Successful CSV and train, test, and validation split pickle outputs in the GCS bucket

TFRecords Creation Container

The TFRecords Creation container is tasked with processing the pre-split pickle files for the train, test, and validation sets. TFRecords is a binary format used by TensorFlow to efficiently store and handle large volumes of data for machine learning tasks. During this process, the container performs image preprocessing, including resizing, and uses Dask to compute dataset statistics and apply normalization before saving the generated TFRecords back into our designated Google Cloud Storage bucket.

These TFRecords are then prepared for consumption, serving the needs of Google Colab notebooks as well as the subsequent Model Training Container and Model Sweeps Container. Operating within a Google Cloud VM, this container ensures a streamlined workflow and reads from our GCS bucket for efficient data handling and transformation.
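
To make the format concrete, below is a hedged sketch of writing one example to a TFRecord and reading it back with tf.data; the feature names, image size, and bucket paths are assumptions rather than the container's actual schema.

```python
# Sketch of serializing one example to a TFRecord and reading it back with
# tf.data (feature names, image size, and paths are assumptions).
import tensorflow as tf

IMG_SIZE = 224  # assumed target resolution

def to_example(image_bytes, labels):
    """Pack a JPEG-encoded image and its macro labels into a tf.train.Example."""
    return tf.train.Example(features=tf.train.Features(feature={
        "image": tf.train.Feature(bytes_list=tf.train.BytesList(value=[image_bytes])),
        # labels assumed as [calories, fat, carbs, protein, mass]
        "labels": tf.train.Feature(float_list=tf.train.FloatList(value=labels)),
    }))

def parse_example(serialized):
    """Decode one serialized example back into (image, labels) tensors."""
    spec = {
        "image": tf.io.FixedLenFeature([], tf.string),
        "labels": tf.io.FixedLenFeature([5], tf.float32),
    }
    parsed = tf.io.parse_single_example(serialized, spec)
    image = tf.io.decode_jpeg(parsed["image"], channels=3)
    image = tf.image.resize(image, [IMG_SIZE, IMG_SIZE]) / 255.0
    return image, parsed["labels"]

# Training-side input pipeline reading the records from GCS (path assumed).
dataset = (
    tf.data.TFRecordDataset(
        tf.io.gfile.glob("gs://snapnutrition-data/tfrecords/train-*.tfrecord"))
    .map(parse_example, num_parallel_calls=tf.data.AUTOTUNE)
    .shuffle(1024)
    .batch(32)
    .prefetch(tf.data.AUTOTUNE)
)
```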

Successful run of the container in the Google VM
Successful TFRecords outputs on the GCS bucket corresponding to train, validation, and test splits

3. Modular Training Workflows and Performance Tracking

In this phase of development, the team focuses on developing a computer vision model designed to extract calorie and macronutrient information, eliminating the need for explicit food recognition and labeling. Furthermore, we incorporate elements like experiment tracking with Weights & Biases (W&B), multi-GPU training, and the implementation of serverless training through Vertex AI.

Model Training Container

The Model Training container serves as a comprehensive hub for all aspects of our training process. It encapsulates the necessary code for packaging the training script, executing jobs in Vertex AI, and monitoring model progress through W&B. Offering versatility, the container allows the selection of various complex architectures and transfer learning base models through a configuration YAML file.

The container also includes features like fine-tuning and multi-GPU training options for enhanced flexibility. Leveraging TF Records and TF Data pipelines, the scripts within the container ensure swift data preprocessing. For seamless replication and execution, we provide explicit instructions on container building, training script packaging, and execution in Vertex AI on our GitHub page.
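
As a rough illustration of config-driven model selection, the sketch below builds a transfer-learning model from a small YAML snippet; the config keys, base model name, and head architecture are assumptions, not the project's actual schema.

```python
# Rough sketch of config-driven transfer learning (config keys and values
# are illustrative, not the project's actual YAML schema).
import tensorflow as tf
import yaml

config = yaml.safe_load("""
base_model: EfficientNetB0
image_size: 224
dense_units: 128
learning_rate: 0.001
fine_tune: false
""")

def build_model(cfg):
    # Look up the requested Keras Applications base model by name.
    base_cls = getattr(tf.keras.applications, cfg["base_model"])
    base = base_cls(include_top=False, weights="imagenet",
                    input_shape=(cfg["image_size"], cfg["image_size"], 3),
                    pooling="avg")
    base.trainable = cfg["fine_tune"]  # freeze or fine-tune the backbone

    # Small regression head: 5 targets (calories, fat, carbs, protein, mass).
    hidden = tf.keras.layers.Dense(cfg["dense_units"], activation="relu")(base.output)
    outputs = tf.keras.layers.Dense(5)(hidden)
    model = tf.keras.Model(base.input, outputs)
    model.compile(optimizer=tf.keras.optimizers.Adam(cfg["learning_rate"]),
                  loss="mae", metrics=["mae"])
    return model

model = build_model(config)
```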

Successful training run of multiple models with different parameters on Vertex AI
Training Jobs on Vertex AI

Models being successfully tracked using Weights & Biases

W&B Overview Page
Configs for Model
W&B UI View of Model Training Metrics

Model Sweeps Container

Within the Model Sweeps container, we have consolidated the code for packaging our model sweep training script, orchestrating job execution in Vertex AI, and monitoring model progress through W&B. This container introduces a crucial concept known as a “Sweep,” which is analogous to a grid search. It enables the iteration over diverse combinations of hyperparameters for model training, assigning distinct run IDs to each training combo for coherent tracking in Weights & Biases.

Similar to its counterpart, the Model Training container, this container offers the flexibility to choose from a range of complex architectures and transfer learning base models via a configuration YAML file. The inclusion of TF Records and TF Data pipelines in the scripts ensures efficient data preprocessing and streamlines the model sweeping process, providing a systematic approach to hyperparameter tuning and performance tracking.
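
A minimal sketch of how such a sweep might be defined and launched with the W&B SDK is shown below; the parameter names, values, metric name, and project name are illustrative assumptions.

```python
# Illustrative W&B sweep setup (parameter names, values, and project name
# are assumptions, not the project's exact sweep configuration).
import wandb

sweep_config = {
    "method": "grid",  # iterate over every combination, analogous to a grid search
    "metric": {"name": "val_mae", "goal": "minimize"},
    "parameters": {
        "base_model": {"values": ["EfficientNetB0", "ResNet50", "InceptionV3"]},
        "learning_rate": {"values": [1e-3, 1e-4]},
        "batch_size": {"values": [32, 64]},
    },
}

def train():
    # Each agent invocation becomes a distinct W&B run with its own run ID.
    with wandb.init() as run:
        cfg = run.config
        # ...build a model from cfg["base_model"], train it, and log metrics
        # with wandb.log({"val_mae": ...}) so the sweep can compare runs.

sweep_id = wandb.sweep(sweep_config, project="snapnutrition")  # project name assumed
wandb.agent(sweep_id, function=train)
```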

Screenshot of sweeps running in Vertex AI
Weights & Biases dashboard showing sweeps tracking

Model Evaluation Container

The Model Evaluation container plays a pivotal role in assessing the various model candidates to determine the best one. To streamline this evaluation process, it generates an evaluation summary and uploads it to our GCS bucket. The evaluation summary includes model build parameters, performance metrics, and other training metadata. The container identifies the best-performing model, downloads it from W&B, and stores it in a dedicated location within the project GCS bucket. This designated best model serves as the reference model for the backend API during deployment. For an in-depth understanding of the model evaluation process, refer to the GitHub page for comprehensive details and insights.
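
As an illustration of this selection step, the hedged sketch below uses the W&B public API to pick the run with the lowest validation MAE and download its saved model; the entity/project path, metric name, and file name are assumptions.

```python
# Sketch of selecting the best run and downloading its model from W&B
# (entity/project path, metric name, and file name are assumptions).
import wandb

api = wandb.Api()
runs = api.runs("snapnutrition/snapnutrition")  # "entity/project" path assumed

# Pick the run with the lowest validation MAE recorded in its summary.
candidates = [r for r in runs if r.summary.get("val_mae") is not None]
best_run = min(candidates, key=lambda r: r.summary["val_mae"])
print(f"Best run: {best_run.name} (val_mae={best_run.summary['val_mae']:.2f})")

# Download the saved model file from that run; it can then be copied to the
# designated "best model" location in the GCS bucket.
best_run.file("model.keras").download(root="best_model", replace=True)
```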

4. Backend Implementation

To ensure the application remains responsive to multiple user queries and submissions at once, the team implemented a scalable backend API service. Constructed using FastAPI, this service is the key link between the model and the frontend interface. The service includes a robust mechanism that automatically retrieves the best model stored in the GCS bucket at startup, and it can alternatively fetch predictions from a Vertex AI endpoint, potentially reducing inference latency for enhanced performance.
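
A minimal FastAPI sketch of such a prediction endpoint is shown below; the route, model path, image size, and output ordering are assumptions rather than the actual API service code.

```python
# Minimal sketch of the prediction endpoint (route name, model path, and
# preprocessing details are assumptions, not the exact API service code).
import tensorflow as tf
from fastapi import FastAPI, File, UploadFile

app = FastAPI(title="SnapNutrition API")
model = None  # loaded once at startup

@app.on_event("startup")
def load_model():
    global model
    # In the real service this would pull the best model from the GCS bucket,
    # or be skipped entirely when a Vertex AI endpoint handles inference.
    model = tf.keras.models.load_model("best_model")  # path assumed

@app.post("/predict")
async def predict(file: UploadFile = File(...)):
    # Decode and preprocess the uploaded food photo.
    image = tf.io.decode_image(await file.read(), channels=3)
    image = tf.image.resize(image, [224, 224]) / 255.0
    preds = model.predict(tf.expand_dims(image, axis=0))[0]
    # Output ordering assumed: calories, fat, carbs, protein, mass.
    calories, fat, carbs, protein, mass = [float(x) for x in preds]
    return {"calories": calories, "fat": fat, "carbs": carbs,
            "protein": protein, "mass": mass}
```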

API service OpenAPI documentation page

5. Frontend Design

Keeping user experience in mind, we designed an intuitive and user-friendly frontend, which is crucial for encouraging consistent engagement with the application. The frontend service runs React and Next.js and integrates Google Firebase Auth for seamless signup and login. Within the application, users can upload images of their food, triggering predictions from our best model and returning estimates of the nutritional content of the uploaded images. This setup establishes an intuitive interface, embodying the intersection of cutting-edge ML technology and user engagement in our SnapNutrition project.

6. Compression, Quantization, and Distillation

In our Google Colab notebooks, we experimented further with compression, distillation, and quantization, since smaller model size and faster prediction time are important for production systems. Overall, we were able to achieve lower MAE with the distilled student model and also decrease model size with TFLite quantization. However, we did notice the quantized model ran slightly slower than the distilled student model.
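
For reference, post-training quantization with TFLite looks roughly like the sketch below; the model and file names are placeholders.

```python
# Sketch of post-training quantization with TFLite, as explored in our
# notebooks (model and file names are placeholders).
import tensorflow as tf

model = tf.keras.models.load_model("distilled_student_model")  # path assumed

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # enables dynamic-range quantization
tflite_model = converter.convert()

with open("student_model_quantized.tflite", "wb") as f:
    f.write(tflite_model)

# Inference with the quantized model uses the TFLite interpreter.
interpreter = tf.lite.Interpreter(model_content=tflite_model)
interpreter.allocate_tensors()
```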

7. Scalable Deployment

In our deployment strategy, we leverage Ansible to create, provision, and deploy both the frontend and backend components of our application on Google Cloud Platform (GCP) in an automated manner. This approach not only ensures efficiency in deployment processes but also addresses real-world scenarios and peak loads, guaranteeing robust performance. The implementation of Ansible as our automated deployment solution enhances configuration management, fosters consistency across diverse environments, and is accompanied by a thoroughly documented deployment plan. Below is evidence of successful deployment, integration with CI/CD pipelines, monitoring practices, and adherence to other deployment best practices.

Ansible CLI output when deploying Docker images
Google Container Registry

Within our containerized environment, we have a dedicated component focused on model selection and inference. This container’s purpose is multi-faceted: it downloads the best model from Weights and Biases, modifies the model’s signature to preprocess images during inference, uploads the model to the Vertex AI Model Registry, and finally, deploys the model on Vertex AI to establish an endpoint for prediction requests. This comprehensive workflow ensures the seamless integration of our trained models into production, aligning with best practices for model deployment and inference.
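
A hedged sketch of the registration and deployment steps using the Vertex AI Python SDK is shown below; the project ID, region, artifact URI, serving container image, and machine type are all assumptions.

```python
# Sketch of registering the best model and deploying an endpoint with the
# Vertex AI SDK (project, region, URIs, and machine type are assumptions).
from google.cloud import aiplatform

aiplatform.init(project="snapnutrition-project", location="us-central1")

# Upload the SavedModel (with its modified preprocessing signature) to the
# Vertex AI Model Registry.
model = aiplatform.Model.upload(
    display_name="snapnutrition-best-model",
    artifact_uri="gs://snapnutrition-data/best_model/",
    serving_container_image_uri=(
        "us-docker.pkg.dev/vertex-ai/prediction/tf2-cpu.2-12:latest"
    ),
)

# Deploy the registered model to an endpoint that serves prediction requests.
endpoint = model.deploy(machine_type="n1-standard-4")
print(f"Endpoint ready: {endpoint.resource_name}")
```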

Ansible CLI output when provisioning instance
Automatically deployed VM running
SSH into VM shows three containers running (nginx, api-service, frontend)

App Walkthrough

New user and/or not logged in
Once you click log in, you are redirected here for Firebase Auth login
Once you are logged in, you go to the Calorie Counter tab
You can either click to upload or drag files into the dropzone
Your results appear below the dropzone as a growing list
If you go back to the home page, you will see recent uploads and results
You can also click on an image to get a zoomed-in view

Lessons Learned

One lesson we learned during this project is that properly scoping the training dataset is extremely important. The dataset we chose was of excellent quality, as the images were carefully captured using a systematic approach with ML training as the intended use. However, the research group that published the dataset captured not only images of food but also depth images and video, making the dataset extremely large at 180 GB. Since using the entire dataset would have been limiting in terms of storage and model training, we decided to limit the scope to just the overhead images. This reduced the dataset from 180 GB to 2.2 GB, a decrease of nearly 99%! By reducing the size of the training dataset, we were able to explore more models and implement data augmentation as well as hyperparameter tuning, which we believe led us to better models.

Another lesson learned is that the proliferation of cloud-based machine learning tools makes it possible to take an idea to a working MVP in a relatively short period of time. The biggest challenge is deciding which tools to use for which tasks, especially when many tools overlap in functionality. For instance, we initially planned to save each model experiment in a dedicated location on GCS to reference later for evaluation. After we became familiar with Weights & Biases (W&B), we decided this would be redundant (and more costly) since W&B already does this for us. But later, when we were ready to evaluate our models and choose the best one for deployment, we ran into difficulties using W&B's API to easily locate and download the models. Paying special attention to each tool's intended role and the interfaces between tools is crucial for long-term maintenance of a product.

During training, one of our concerns was how to efficiently save models, some of which could exceed 100 MB. We planned to try many model architectures, and we realized storage costs would quickly mount. Our initial plan was to simply save the weights, which greatly reduces storage costs. However, we also planned to use custom model classes for ease of programming, and this became an issue when trying to reconstruct models from their weights. Our chosen version of TensorFlow did not have an easy way to reconstruct custom model classes, but upgrading risked breaking other aspects of the project. In the future, it is important to carefully consider how models are transferred from the training service to other services. This may simply be a shortcoming of Keras; other frameworks like PyTorch may not share the issue.

Another important lesson, critical for cost-constrained projects, is paying close attention to cloud training costs. Google's Vertex AI is wonderful for its ability to quickly leverage hardware resources once you are familiar with its API peculiarities. This is especially true of its custom jobs feature, which only uses the hardware resources necessary to complete training, eliminating the need to spin up more expensive virtual machines. However, this ease of use allowed us to become overzealous when trying out many different model architectures and hyperparameters (especially in concert with W&B's sweep feature). One complaint with Vertex AI is that billing costs are not immediately reflected in the GCP billing statement, which unfortunately led us to spend nearly $1000 in a very short time period (luckily, most of it was free credits). It may be wise to start slow when testing different models and hyperparameters to avoid this pitfall.

Another big takeaway came from the frontend implementation of Next.js with NGINX. For our React application, we chose Next.js, a JavaScript framework that works on top of React and provides performance-enhancing features such as server-side rendering. Next.js lets you start with a static site or Single-Page Application (SPA) and later optionally adopt features that require a server. When running “next build”, Next.js generates an HTML file per route. By breaking a strict SPA into individual HTML files, Next.js avoids loading unnecessary JavaScript on the client side, reducing the bundle size and enabling faster page loads.

This all worked great when running our frontend and backend containers alone; the tricky part was figuring out how to route with the NGINX web server per class requirements. Many NGINX tutorial configurations expect an index.html to exist, which is not the default for a Next.js application. It took a while to realize this was the issue, because opening the page produced a “403 Access Denied” error rather than a failure to load a non-existent HTML file. To solve this, we configured Next.js to produce a static export so it could be deployed and hosted on any web server (in our case, NGINX) that can serve static HTML/CSS/JS assets. After running “next build” with this change, Next.js produces an out folder containing the HTML/CSS/JS assets for the application, including the index.html that NGINX expects.

Our team was not experienced with web development prior to this project, and we therefore learned many concepts surrounding server-side and client-side rendering, in addition to React components and web development in general.

Future Objectives

1. Multi-Food Recognition Optimization

In the future, the team plans to optimize the model to identify multiple foods in a single image. This builds on a prior class project's use of a CNN and a sliding window for a similar purpose.

2. User-Correctable Model Mechanism

An additional optimization involves integrating a mechanism for users to correct the model based on new images and actual nutritional content. This includes the addition of correct food labels for the previously mentioned optimization.

3. Model Drift Measurement and Retraining Mechanism

Developing a mechanism to measure model drift and triggering retraining after a specified threshold is a crucial step in maintaining the model’s accuracy over time.

4. Additional Inputs for Model Enhancement

The team considers exploring methods to capture the volume or depth of food, such as RGBD images, as additional inputs. However, this exploration is weighed against the goal of increasing ease-of-use for the end-user, acknowledging potential risks of compounding model outputs.

5. Comprehensive User Application

The team also aims to develop an entire application where users can register and track their food entries over time. This includes the visualization of trends in macronutrient intake using the JavaScript library D3.

Conclusion

We started this project with a simple question: “How can we make food logging and nutrient tracking better than what is currently available?” We answered it with a smart app that provides an immediate and accurate estimate of calories and macronutrients from pictures of single foods and even entire plates of food. In addition, always with the end user in mind, we developed a comprehensive data management, model-training, model-evaluation, and model-deployment architecture using atomic Docker containers and Google Cloud Platform services, so that our users will always have the best available computer vision model for macronutrient estimation. Furthermore, we made sure our app is ready for the real world by using modern frameworks and implementing a scalable deployment that keeps the app up even under multiple simultaneous requests. We learned a lot during the process of making this app, and we hope it makes attaining your health and fitness goals that much easier!

References

  1. Thames, Q., Karpur, A., Norris, W., Xia, F., Panait, L., Weyand, T., & Sim, J. (2021). Nutrition5k: Towards Automatic Nutritional Understanding of Generic Food. 2021 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 8899–8907.

Authors:

  1. Russell Brown
  2. Benjamin Fulroth
  3. Brent Ruttle
  4. Christina Wang
  5. Deepika Yeramosu
