DawgAI: Applying ML to Help Dog Lovers Find their Favorite Breeds
Authors: Abhijit Pujare, Annie Landefeld, Curren Iyer, Juan Pablo Heusser, Nevil George
This article was produced as part of the final project for Harvard’s AC215 Fall 2023 course.
Project Github Repo — HERE
Video — HERE
Table of Contents
- Introduction
- Offline: Data & Modeling
- Online: Deployment & Inference
- Next Steps
- Acknowledgments
1. Introduction
There are 360 registered dog breeds in the world, according to the Fédération Cynologique Internationale, the largest international federation of kennel clubs. Many breeds can look quite similar, which makes it challenging to identify them all; compare, for instance, a Siberian Husky with an Alaskan Malamute.
This is significant for two primary use cases:
- Friends of Dog Owners: Abhijit has to buy his friend a birthday present and is thinking of getting something for his friend's dog. However, he can't remember what kind of dog his friend has!
- Aspiring Dog Owners: Juan Pablo has decided he wants to get a dog! However, although he's walked past many adorable dogs in the park, he has no way to remember the names of his favorite breeds!
Our goal with DawgAI is to use a machine learning model to help dog lovers and aspiring dog owners identify different breeds of dogs.
2. Offline: Data & Modeling
Getting the Data
For training our model, we decided to use an existing dataset to ensure we captured a large enough sample of as many breeds as possible. We used the Stanford Dogs dataset, which we downloaded from Kaggle. The dataset contains 20,581 images spread across 120 classes, each representing a different dog breed.
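As a rough illustration, the download can be scripted with the Kaggle API. This is only a sketch; the dataset slug and destination path below are assumptions rather than values from our repo.

```python
# Sketch: download the Stanford Dogs dataset from Kaggle.
# Assumes the `kaggle` package is installed and ~/.kaggle/kaggle.json holds API credentials.
from kaggle.api.kaggle_api_extended import KaggleApi

api = KaggleApi()
api.authenticate()

# The slug below points to a commonly used Kaggle mirror; adjust it to the copy you use.
api.dataset_download_files(
    "jessicali9530/stanford-dogs-dataset",
    path="data/raw",
    unzip=True,
)
```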
Tensorizing the Data
Given the large dataset, we wished to optimize the training process by minimizing the number of network I/O calls made to load files. We decided to tensorize the data and store 32 images in each file.
The schema we used to tensorize the images included the image bytes, height, width, channel information, and label. This brought down the training time significantly and allowed us to iterate through the dataset at a much faster pace.
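For concreteness, here is a minimal sketch of that serialization using TensorFlow's TFRecord format. The feature names and helper functions are illustrative assumptions, not copied from our code.

```python
import tensorflow as tf

def _bytes_feature(value):
    return tf.train.Feature(bytes_list=tf.train.BytesList(value=[value]))

def _int64_feature(value):
    return tf.train.Feature(int64_list=tf.train.Int64List(value=[value]))

def serialize_example(image_bytes, height, width, channels, label):
    # One record per image: the raw bytes plus the metadata needed to decode it later.
    feature = {
        "image": _bytes_feature(image_bytes),
        "height": _int64_feature(height),
        "width": _int64_feature(width),
        "channels": _int64_feature(channels),
        "label": _int64_feature(label),
    }
    example = tf.train.Example(features=tf.train.Features(feature=feature))
    return example.SerializeToString()

def write_shard(examples, shard_path):
    # Pack 32 images into a single file so training makes far fewer network I/O calls.
    with tf.io.TFRecordWriter(shard_path) as writer:
        for image_bytes, height, width, channels, label in examples:
            writer.write(serialize_example(image_bytes, height, width, channels, label))
```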
Model Training
Our approach to building the model was transfer learning: we took a pre-trained image classification CNN as a base and trained only the last few layers to classify dog breeds.
To train the models, we used an 80/20 train-test split. We initially started with three different SOTA models pre-trained on the ImageNet dataset: ResNet152V2, ConvNeXtBase, and DenseNet201. We achieved the best validation accuracy of 82.5% using the ResNet152V2 architecture.
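A minimal sketch of this transfer-learning setup in Keras is shown below; the input size, head layers, and hyperparameters are illustrative assumptions rather than our exact configuration.

```python
import tensorflow as tf

NUM_CLASSES = 120  # one class per breed in the Stanford Dogs dataset

# Load the ImageNet-pretrained ResNet152V2 without its classification head and freeze it.
base = tf.keras.applications.ResNet152V2(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3)
)
base.trainable = False

# Only the small classification head on top of the frozen base gets trained.
model = tf.keras.Sequential([
    base,
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(256, activation="relu"),
    tf.keras.layers.Dropout(0.3),
    tf.keras.layers.Dense(NUM_CLASSES, activation="softmax"),
])

model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-3),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
)
```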
Afterwards, we decided to compress the model using knowledge distillation. We used the same transfer learning approach, and therefore had to choose a base architecture for the student model. We tried different smaller SOTA models for the student: ResNet50, ConvNeXtSmall, and DenseNet121, achieving the best performance with DenseNet121. So our teacher model is based on the ResNet152V2 architecture and our student model uses DenseNet121. Then, based on the material covered in class, we implemented the distillation training loop and trained the student model by distilling from the teacher. We obtained a 92.6% validation accuracy at epoch 28, even higher than with the teacher model. Using distillation we compressed the teacher model by 7.65x while achieving better validation accuracy.
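For reference, here is a hedged sketch of what such a distillation training loop can look like in Keras. The temperature, loss weighting, and the assumption that both models output logits are illustrative choices, not our exact implementation.

```python
import tensorflow as tf

class Distiller(tf.keras.Model):
    """Trains a small student to mimic a frozen teacher (both assumed to output logits)."""

    def __init__(self, teacher, student, temperature=4.0, alpha=0.1):
        super().__init__()
        self.teacher = teacher
        self.student = student
        self.temperature = temperature
        self.alpha = alpha  # weight of the hard-label loss relative to the distillation loss
        self.student_loss_fn = tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True)
        self.distill_loss_fn = tf.keras.losses.KLDivergence()

    def train_step(self, data):
        x, y = data
        teacher_logits = self.teacher(x, training=False)
        with tf.GradientTape() as tape:
            student_logits = self.student(x, training=True)
            student_loss = self.student_loss_fn(y, student_logits)
            # Soften both distributions with the temperature before comparing them.
            distill_loss = self.distill_loss_fn(
                tf.nn.softmax(teacher_logits / self.temperature, axis=1),
                tf.nn.softmax(student_logits / self.temperature, axis=1),
            )
            loss = self.alpha * student_loss + (1.0 - self.alpha) * distill_loss
        grads = tape.gradient(loss, self.student.trainable_variables)
        self.optimizer.apply_gradients(zip(grads, self.student.trainable_variables))
        return {"loss": loss, "student_loss": student_loss, "distill_loss": distill_loss}

# Usage sketch:
#   distiller = Distiller(teacher_model, student_model)
#   distiller.compile(optimizer=tf.keras.optimizers.Adam(1e-4))
#   distiller.fit(train_dataset, epochs=30)
```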
If you’re interested in learning more about the metrics we evaluated to optimize model performance, you can see the full analysis in the Milestone 5 README.
3. Online: Deployment & Inference
A) Deployment
We used Ansible to automate the provisioning and deployment of our frontend and backend containers to GCP. Below you can find a screenshot of the virtual machine that’s running our service on GCP.
Additionally, you can see the container images we have pushed to the GCP container repository:
The Ansible deployment consisted of a number of YAML playbooks that automated different steps of the process:
- Building the Docker containers and pushing them to Google Container Registry
- Creating the GCP instances
- Setting up all the required software on the instances
- Deploying the containers on the GCP instance
- Setting up the web server (NGINX)
For the training aspect of deployment, we created a model training pipeline in Vertex AI that has 4 stages: data preprocessing, tensorizing, model training and model deployment. This pipeline makes retraining a model trivial. Each of the stages is its own Docker container, with the Docker images stored in Docker Hub. Here’s a brief description of each stage:
- Data preprocessing: Reads images from GCS, resizes them, and stores them in a separate GCS folder
- Tensorizing: Reads the preprocessed data from GCS, packs it into tensorized shards of 32 images each, and saves it back into another GCS folder
- Model training: Reads the tensorized data and trains the model itself, saving it to Weights & Biases
- Model deployment: Downloads the model from Weights & Biases and uploads it to a Vertex AI endpoint so that it’s callable from the API service (described below)
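A rough sketch of how such a pipeline can be wired up with the Kubeflow Pipelines SDK and Vertex AI is shown below. The container image names, project, bucket, and region are placeholders, not our actual values.

```python
from kfp import dsl, compiler
from google.cloud import aiplatform

# Each stage is its own container; the image names below are hypothetical placeholders.
@dsl.container_component
def preprocess():
    return dsl.ContainerSpec(image="docker.io/example/dawgai-preprocess:latest",
                             command=["python", "preprocess.py"])

@dsl.container_component
def tensorize():
    return dsl.ContainerSpec(image="docker.io/example/dawgai-tensorize:latest",
                             command=["python", "tensorize.py"])

@dsl.container_component
def train():
    return dsl.ContainerSpec(image="docker.io/example/dawgai-train:latest",
                             command=["python", "train.py"])

@dsl.container_component
def deploy():
    return dsl.ContainerSpec(image="docker.io/example/dawgai-deploy:latest",
                             command=["python", "deploy.py"])

@dsl.pipeline(name="dawgai-training-pipeline")
def training_pipeline():
    # Chain the four stages so each one runs only after the previous one finishes.
    step1 = preprocess()
    step2 = tensorize().after(step1)
    step3 = train().after(step2)
    deploy().after(step3)

if __name__ == "__main__":
    compiler.Compiler().compile(training_pipeline, "pipeline.yaml")
    aiplatform.init(project="example-project", location="us-central1")
    aiplatform.PipelineJob(
        display_name="dawgai-training",
        template_path="pipeline.yaml",
        pipeline_root="gs://example-bucket/pipeline-root",
    ).run()
```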
To further automate deployment, we implemented CI/CD using GitHub Actions (shown below). GitHub Actions allows us to trigger the Ansible deployment described above by including a simple commit message, `/run-deploy-app`. Additionally, we built out more targeted CI/CD steps through a command line interface script that allows us to selectively run and deploy different components of the app (for example, with `/run-ml-pipeline`). This CI/CD setup enables us to seamlessly keep our application up to date and running smoothly.
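To give a flavor of what such a selective deployment wrapper can look like, here is a hypothetical sketch; the step names and playbook files are placeholders, not the exact script in our repo.

```python
import argparse
import subprocess

# Hypothetical mapping from deployment step to Ansible playbook (file names are placeholders).
PLAYBOOKS = {
    "build-images": "deploy-docker-images.yml",
    "provision": "deploy-create-instance.yml",
    "deploy-app": "deploy-setup-containers.yml",
    "ml-pipeline": "run-ml-pipeline.yml",
}

def main():
    parser = argparse.ArgumentParser(description="Selectively run DawgAI deployment steps.")
    parser.add_argument("step", choices=sorted(PLAYBOOKS), help="which deployment step to run")
    args = parser.parse_args()
    # Each step shells out to the corresponding playbook against the project inventory.
    subprocess.run(["ansible-playbook", PLAYBOOKS[args.step], "-i", "inventory.yml"], check=True)

if __name__ == "__main__":
    main()
```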
B) Frontend & User Experience
We built the web app using the React JavaScript framework. We also used Material UI and the Roboto font to style the design. The user journey is as follows:
- When the user enters the site, they are prompted to choose a photo. The user can also select either the hosted model or the local model and compare latency due to the former requiring an external server call.
- A preview of the image is displayed for the user to confirm that this is the image they want to analyze.
- After that, the user can click the “Upload” button, which prompts a response from the server with the predicted breed and confidence level. If the model cannot reach a reasonable confidence threshold, “Confidence too low to predict” is shown instead.
- Once the breed has been identified, the user has the option to clear results and start over.
Additionally, the user can find their last five breed predictions saved in the footer of the page.
Because we wanted the site to accommodate common on-the-go use cases (e.g. the user walks past a dog and wants to know the breed in the moment), we made our site responsive for mobile browser interfaces as well. The mobile version lets the user upload photos directly from the camera.
If you’re interested in seeing more of the user experience, please check out the video linked above or the “images” directory of our GitHub repository.
C) API
We wrote an API service to separate the logic that serves the frontend from the backend logic that calls the model. The “predict” endpoint on the service is called from the frontend with an image to make a model inference.
When the API service starts up, it does two things:
- It reads from a config file the Vertex AI endpoint to call for hosted-model inference
- It downloads the same model from W&B and stores it locally so that it can perform inference without depending on Vertex AI, which is useful in cases where calling Vertex AI introduces high latency
As mentioned above, the model used for inference is selected by the frontend, which passes a corresponding parameter to the “predict” endpoint. For hosted-model inference, the call path is: Frontend → API service → Vertex AI. For local-model inference, the call path is simply: Frontend → API service.
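A hedged sketch of that routing logic with FastAPI is below; the Vertex AI endpoint ID, the local model path, the preprocessing, and the confidence threshold are illustrative placeholders rather than our actual values.

```python
import numpy as np
import tensorflow as tf
from fastapi import FastAPI, File, UploadFile
from google.cloud import aiplatform

app = FastAPI()
CONFIDENCE_THRESHOLD = 0.5  # placeholder threshold

# Startup: reference the hosted Vertex AI endpoint and load the local copy downloaded from W&B.
vertex_endpoint = aiplatform.Endpoint(
    "projects/example-project/locations/us-central1/endpoints/1234567890"
)
local_model = tf.keras.models.load_model("artifacts/dawgai-model")

def preprocess(image_bytes):
    # Placeholder preprocessing; the real service must match the model's training preprocessing.
    image = tf.io.decode_image(image_bytes, channels=3)
    image = tf.image.resize(image, (224, 224)) / 255.0
    return tf.expand_dims(image, 0)

@app.post("/predict")
async def predict(file: UploadFile = File(...), use_hosted: bool = True):
    batch = preprocess(await file.read())
    if use_hosted:
        # Hosted path: Frontend -> API service -> Vertex AI
        probs = np.array(vertex_endpoint.predict(instances=batch.numpy().tolist()).predictions[0])
    else:
        # Local path: Frontend -> API service
        probs = local_model.predict(batch)[0]
    confidence = float(np.max(probs))
    if confidence < CONFIDENCE_THRESHOLD:
        return {"prediction": "Confidence too low to predict", "confidence": confidence}
    return {"breed_index": int(np.argmax(probs)), "confidence": confidence}
```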
D) Load Testing & Deployment Best Practices
In order to ensure that the application could handle different workloads and traffic patterns, we ran a number of different tests:
- We ran a series of tests where we uploaded images of different sizes to the website to ensure that both the client-side and server-side code could handle different inputs.
- We also ran a series of experiments where we had multiple people (> 2) connect to the website at the same time and upload a picture of a dog to test scale. A more robust approach would have been a load test that opened far more connections to the server (a minimal sketch of one follows this list), but given the time constraints we were unable to run it.
- Another “load” test was deploying the app on Kubernetes clusters with 1, 2, and 3 nodes and then repeating the multi-user test above to see how response times changed.
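Here is a minimal sketch of the kind of concurrent load test we had in mind; the URL, image path, and concurrency level are assumptions.

```python
import concurrent.futures
import time

import requests

URL = "https://example-dawgai.app/predict"  # placeholder endpoint
IMAGE_PATH = "test_images/husky.jpg"        # placeholder test image
NUM_CLIENTS = 50

def send_request(_):
    # Upload the same test image and time the round trip.
    start = time.time()
    with open(IMAGE_PATH, "rb") as f:
        response = requests.post(URL, files={"file": f})
    return response.status_code, time.time() - start

# Fire NUM_CLIENTS concurrent uploads and report simple latency statistics.
with concurrent.futures.ThreadPoolExecutor(max_workers=NUM_CLIENTS) as pool:
    results = list(pool.map(send_request, range(NUM_CLIENTS)))

latencies = sorted(latency for _, latency in results)
errors = sum(1 for status, _ in results if status != 200)
print(f"errors: {errors}/{NUM_CLIENTS}")
print(f"median latency: {latencies[len(latencies) // 2]:.2f}s, max: {latencies[-1]:.2f}s")
```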
Additionally, we followed deployment best practices, outlined below:
- We had a checklist that each of us personally consulted before pushing code to production (e.g. making sure no keys were committed).
- We tried to maintain a culture of code reviews to ensure that the code being pushed was high quality.
- We had a rollback strategy in place so that if a bug was pushed to production, there was a known-good commit we could roll back to.
- After anyone pushed code to production, we ran a manual test to ensure that the application was still running properly.
- Once we reached some level of stability with our deployment approach, we adopted CI/CD to ensure that the latest changes were pushed automatically.
4. Next Steps
We have ideas to improve the current app and also add new features.
Improving the current app:
- Find more datasets, and use them to improve the accuracy of the model.
- Add functionality to “shadow” new models — when a new model is trained, instead of putting it in production immediately, we should send “shadow” requests to it and log results. Then, we can check the comparative performance of the model before promoting it to the production model.
Adding new features:
- Add dog age prediction in addition to the breed predictions in the app — we found a dog age dataset that we could use to train a new model integrated into the core flow of the app.
- Display the top five predictions for a given input — use case could either be recommendation (“other dogs you may like…”) for discoverability or correction (“not your dog?”) to help the model improve.
5. Acknowledgments
Team DawgAI would like to thank our instructor Pavlos Protopapas and the Harvard AC215 teaching staff — especially Shivas Jayaram and Jarrod Parks — for their great instruction during the semester and consistent support with this project.
References
Olivia Munson. “How many dog breeds are there? Get to know the number of breeds, groups of dogs in US”. March 2023, USA Today.
Kaiming He et al. “Deep Residual Learning for Image Recognition”. December 2015, Computer Vision and Pattern Recognition.
Aqeel Anwar. "Difference between AlexNet, VGGNet, ResNet, and Inception". June 2019, Towards Data Science.