Training a custom YOLOv8 “Where’s Waldo?” object detection model using Vertex AI Workbench

Last year I deployed my first machine learning pet project, a "Where's Waldo?" object detector, and I even wrote an article detailing the lessons learned. It has been 9 months since then, so I thought I would share some of the technical challenges I faced along the way, as well as my recent experience using Vertex AI Workbench to train my latest Finding Waldo model with YOLOv8.

Glen Yu
6 min read · May 1, 2023

Data wrangling

Unlike most other object detection models, which are trained to detect everyday objects (dogs, birds, cars, bikes, people, etc.), my pet project is a bit niche, and the object it needs to find is also very small relative to the overall image dimensions. The training data was very specific and not something you can readily find in the wild (unlike, say, "golden retrievers"), nor is there a good variety of large, high-quality images available. I realized very quickly that I would have to provide my own training images if I wanted a quality model, so I bought the set of Where's Waldo? books and took photos with my phone.

Labeling

In my scenario, labeling was a challenge, albeit a fun one, as I had to pour a lot of time into finding Waldo in each of the images. I did the labeling with a tool called labelImg, which produces an XML file containing the relevant annotation data (label, image path, bounding box coordinates, etc.). However, the training tools I describe below do not accept that XML format, so I had to write my own scripts to parse the XML and generate label files in whatever format the tool I was using expected.

Training

I own an Intel-based MacBook Air, so while linear regression models or small image classification models were bearable, I knew that training an object detection model would make my MB Air sound like a plane engine before takeoff. I needed to look towards the cloud.

Vertex AI AutoML

I really enjoyed this product from Google. AutoML simplifies the entire process from training to deployment. Unfortunately for me, it was a bit too costly to operate for an object detection pet project, but for an organization that can get more use and value out of the deployed prediction endpoint, AutoML can definitely remove a lot of complexity and speed up time to market.

AutoML also was not the right fit for me at this stage: I like to see how the internals work while I am learning something, and AutoML abstracts all of that away.

Google Colab

Google Colab is a Jupyter notebook-based environment with a free tier offering that includes a GPU/TPU (you can read about all the details and limitations in their FAQ). I came across Google Colab while I was searching for training options and decided to try a couple of their tutorials. I think the free tier is ideal for prototyping. One downside of Google Colab is that you can only mount files from Google Drive, which may impact performance (ideally, you want the training data to be local).

Google Colab was what I ultimately ended up using to build my YOLOv4 model, although I did have to upgrade to the Pro tier for one month to access the larger machine types, as I was hitting memory limits due to the larger image sizes my training data required. Training times of about 4 hours were not great, but I was not running an ideal setup to begin with: darknet YOLOv4 is written in C++, running in an environment meant for Python (but hey, I was experimenting with different setups at the time).

Vertex AI Workbench (YOLOv8)

When I finally tried Workbench, I immediately fell in love with it. The main reason Workbench felt so right for me is the terminal, as I am most comfortable on the command line. Workbench is natively integrated with many Google Cloud services, like Cloud Storage (GCS) and BigQuery, to name a few. I can also copy my training data directly onto the Workbench instance, which improves training times: something I could not do with Google Colab.

In terms of the instance itself, I am not bound by any plan/tier limits like I encountered with Google Colab: I can choose the machine type and GPU, and Google takes care of the rest. Do not be shocked by the projected cost estimate, as that figure assumes running the Workbench instance for an entire month. You are more likely to use your instance for prototyping or for training on a small- to medium-sized dataset like I did, and will probably not run it for more than a couple of days, a fraction of the projected cost. You can also shut down your instance when you are not using it, which will save you money. Training a YOLOv8 model in Workbench with larger image sizes required only a fraction of the time it took me to train the YOLOv4 equivalent in Google Colab.
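To sketch what that training setup can look like (the paths, class names, and hyperparameters here are illustrative placeholders, not necessarily the exact values I used), a minimal YOLOv8 dataset config is a small YAML file:

```yaml
# data.yaml — hypothetical dataset layout for a single "waldo" class
path: /home/jupyter/waldo   # dataset root on the Workbench instance
train: images/train         # training images, relative to 'path'
val: images/val             # validation images
names:
  0: waldo
```

Training then kicks off from the Workbench terminal with the Ultralytics CLI, e.g. `yolo detect train data=data.yaml model=yolov8n.pt epochs=100 imgsz=1280`, where the epoch count and image size are values you would tune for your own data.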

You can find the dataset and code I used to train my model here.

Deployment

I am a fan of serverless architecture and like to use serverless options to host applications and services that I use in my own personal lab environments. Google Cloud’s App Engine and Cloud Run also come with managed SSL certificates with a custom domain option, which was perfect for what I needed.

App Engine

Disclaimer: the solution I deployed here is admittedly over-engineered.

I had my heart set on deploying with App Engine; however, I realized that was not going to fly because the YOLOv4 model I was deploying was written in C++, which App Engine does not support. Instead, I ran my backend on Compute Engine and served the web frontend from App Engine, sending requests to the backend VM via Serverless VPC Access. I chose this deployment method because I had always wondered about its viability and how it might affect latency, and I can honestly say I was pretty happy with the outcome. I ran this setup for the 9 months that the YOLOv4 model was being served.
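For reference, wiring an App Engine frontend to a backend VM over Serverless VPC Access mainly comes down to pointing the service at a VPC connector in `app.yaml`; the runtime, project, region, and connector names below are placeholders:

```yaml
# app.yaml — App Engine standard service routed through Serverless VPC Access
runtime: python39
vpc_access_connector:
  name: projects/PROJECT_ID/locations/REGION/connectors/CONNECTOR_NAME
```

With this in place, the frontend can reach the Compute Engine backend over its internal IP rather than exposing the model server publicly.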

findwaldo.thirdpig.io YOLOv4 deployment architecture

Cloud Run (YOLOv8)

A simpler alternative would be to deploy on Cloud Run, which allows me to package any required dependencies into the container image. This is what I opted to do when I deployed my updated YOLOv8 model. As you can see from the diagram below, it is much simpler. I also opted for the second-generation execution environment, as it provides faster CPU and network performance, with the tradeoff being a slower cold start.
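Choosing the execution environment is a single flag at deploy time; the service name, image path, and region in this sketch are placeholders:

```shell
gcloud run deploy findwaldo \
  --image gcr.io/PROJECT_ID/findwaldo:latest \
  --region us-central1 \
  --execution-environment gen2
```

Dropping the `--execution-environment` flag (or setting it to `gen1`) switches the service back to the first-generation environment.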

findwaldo.thirdpig.io YOLOv8 deployment architecture

EDIT 2023–06–10: I have since updated my deployment to use a larger (2 vCPU / 4 GiB memory) first-generation execution environment instead. Because this project is not accessed very often, I want to take advantage of Cloud Run's scale-to-zero capability to reduce cost, and for that I also want to minimize cold start times.

Takeaways

“The majority of the time, before you even have an ML problem, you have a data problem.”

This statement could not be more true. I experienced it myself even though I had a small dataset. Each project will present its own set of challenges, whether they involve data sensitivity, data sovereignty, or something entirely niche and project-specific. Be prepared to spend the bulk of your time wrangling and preparing your data.

There is no shortage of training or deployment options; it all depends on your purpose and budget. If you are just starting out or working on a small project like mine, I would wholeheartedly endorse Google Colab and Vertex AI Workbench. Google Colab has a free tier, which makes it an ideal sandbox environment where you can play around and learn the tools. Vertex AI Workbench provides more power, control, and flexibility, but there is a cost to running the instances. Finally, there is Vertex AI AutoML, a fast, scalable solution that is ideal for companies that want to bring their products to market faster. It will cost more than the other two options I presented, but it can also provide a high return on investment.


Glen Yu

I'm a Google Developer Expert (Cloud) and an ML/AI enthusiast!