GCP AutoML vs. YOLOv5 for Training a Custom Object Detection Model

Matt Wheeler
Slalom Data & AI
Jul 15, 2021


Our client wanted to detect wildlife in highway camera images. We started by casting a wide net across object detection solutions; the timeline was four weeks, so we needed to pivot quickly. The client was already up and running with a modern data architecture on a Google Cloud Platform (GCP) backbone, but had not yet scratched the surface with machine learning. Right off the bat, we knew GCP AutoML would be a contender: with minimal organizational data science depth, the code-free nature of AutoML was a clear advantage. After exploring various frameworks, the question became whether a solution from the YOLO ("You Only Look Once") family could outperform AutoML. This story is a high-level comparison of the two solutions (there are ample tutorials for each platform out there).

YOLOv5

Based on preliminary testing with YOLOv3 (limited to the labels in the COCO dataset), we knew a more flexible version would be effective for classifying wildlife in highway images. With the option to train on custom images, YOLOv5 was a clear contender for our wildlife use case. One thing to note is the "t-shirt size" of the model: each size corresponds to a different parameter count (calculated by summing the number of elements in each layer). We could test with the Small version, then train a larger model once we were confident in the results and wanted better precision and recall.

Source: https://github.com/ultralytics/yolov5

Some of the pros and cons for YOLOv5 relating to our use case include the following.

Pros

  • Effectively code-free: all training and detection can be run from the command line, and the repo is maintained by Ultralytics, so in theory the dependencies are stable
  • Four model sizes for failing fast: Small, Medium, Large, and XL
  • Hyperparameters can be configured for model tuning
  • Supplementary images (from Google OID) are easy to leverage, and label files can be converted to the YOLO format using the script hosted here
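Converting corner-format labels (such as OID's xmin/ymin/xmax/ymax) to YOLO's one-line-per-object format is only a few lines. Here is a minimal sketch of that conversion; the function name is ours, and it assumes pixel-coordinate corners:

```python
def to_yolo(cls_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a corner-format box (in pixels) to a YOLO label line:
    class x_center y_center width height, all normalized to [0, 1]."""
    x_c = (x_min + x_max) / 2 / img_w
    y_c = (y_min + y_max) / 2 / img_h
    w = (x_max - x_min) / img_w
    h = (y_max - y_min) / img_h
    return f"{cls_id} {x_c:.6f} {y_c:.6f} {w:.6f} {h:.6f}"

# Example: a 200x100-pixel box in the top-left of a 1280x720 image
print(to_yolo(0, 0, 0, 200, 100, 1280, 720))
```

YOLOv5 expects one such `.txt` label file per image, with the same stem as the image filename.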

Cons

  • No out-of-the-box integration with cloud platforms
  • Model performance suffers as the number of classes (i.e. types of wildlife) increases
  • Requires PyTorch on Python 3.8, which shouldn't be an issue unless your system or container has environment limitations

Below are some resulting detections from the YOLOv5 model:

YOLOv5 detection result for ‘Deer’ class
The YOLOv5 model was not trained on a 'Coyote' class, but detecting this one as a 'Dog' isn't too bad!

YOLO was excellent at generalizing across camera sites using around 200 images per wildlife Class.

GCP AutoML

Google Cloud’s premier image object detection tool allows for quickly training models using as few as ~100 images per Class. Some of the pros and cons for AutoML relating to our use case include the following.

Pros

  • The ability to easily label your training images using Vertex AI
  • Integration with Cloud Storage and other GCP tools for automation and deployment
  • We did not notice model performance decreasing as we added Classes
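The Cloud Storage integration shows up in how training data is imported: AutoML object detection accepts a CSV of `gs://` image paths with normalized bounding boxes. The sketch below builds one row in the abbreviated two-corner form of that import format (our understanding of the schema; the bucket path and function name are hypothetical):

```python
def automl_csv_row(split, gs_path, label, x_min, y_min, x_max, y_max):
    """Build one row of an AutoML Vision object-detection import CSV.
    Coordinates are normalized to [0, 1]; the abbreviated form gives the
    top-left and bottom-right corners and leaves the other two blank."""
    return f"{split},{gs_path},{label},{x_min},{y_min},,,{x_max},{y_max},,"

row = automl_csv_row("TRAIN", "gs://my-bucket/cam01/img_0001.jpg", "deer",
                     0.12, 0.40, 0.58, 0.93)
print(row)
```

Rows can also omit the `TRAIN`/`VALIDATION`/`TEST` column to let AutoML assign splits automatically.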

Cons

  • Little ability to tune the model beyond enabling early stopping
  • Potentially higher false-positive rate
  • Costs incurred for large training jobs or large evaluations

AutoML correctly identifying the ‘deer’ class
AutoML correctly identifying this coyote

The AutoML model generalized quite well to other camera sites, but was certainly more sensitive, producing more false positives (e.g. antler-shaped weeds being classified as “deer”).

YOLOv5 Model Performance

The YOLO model was trained locally (without a GPU) with the following allocation for train/validation/test sets:

  • Per label, roughly 200/50/50
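The rough 200/50/50 per-label allocation can be sketched as a reproducible shuffle-and-slice; this is our own helper (the filenames are illustrative), not part of the YOLOv5 tooling:

```python
import random

def split_dataset(items, train_n=200, val_n=50, test_n=50, seed=42):
    """Shuffle one class's items and carve off train/val/test slices,
    mirroring the rough 200/50/50 per-label allocation used here.
    A fixed seed keeps the split reproducible across runs."""
    rng = random.Random(seed)
    items = list(items)
    rng.shuffle(items)
    train = items[:train_n]
    val = items[train_n:train_n + val_n]
    test = items[train_n + val_n:train_n + val_n + test_n]
    return train, val, test

train, val, test = split_dataset(f"deer_{i:03d}.jpg" for i in range(300))
print(len(train), len(val), len(test))
```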

Below is a screenshot of the training results.

YOLO training results

Using the “Large” model weights, trained for 15 epochs across 5 classes (deer, dog, person, cat, horse), the training performance showed:

  • 78% precision
  • 72% recall
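For reference, these two metrics come straight from counts of true positives, false positives, and false negatives over the detections. A minimal implementation (our own helper, not something from the YOLOv5 repo):

```python
def precision_recall(tp, fp, fn):
    """Precision = TP / (TP + FP); recall = TP / (TP + FN).
    Returns 0.0 for an undefined ratio rather than dividing by zero."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

# Illustrative counts only: 78 correct detections, 22 false alarms,
# 30 missed animals
p, r = precision_recall(78, 22, 30)
print(f"precision={p:.2f} recall={r:.2f}")
```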

When deployed against ground truth images, recall and precision were actually a bit higher (both closer to 90%). One thing to note is that YOLOv5 did not perform well on very dark images — not great for our use case since most wildlife crossings occur at night!

AutoML Model Performance

The AutoML model was trained in GCP with the following allocation for train/validation/test sets:

  • Per label, roughly 170/20/20

There were four classes for training (person, deer, mouse, small mammal) — these were different classes than those used for YOLO training due to data collection nuances and time constraints. The training performance showed:

  • 94% precision
  • 90% recall

Note that when deployed on ground truth images, recall held up, but as mentioned previously there were quite a few false positives, so precision was lower in practice. AutoML isn’t missing wildlife; it is just a bit trigger happy in classifying inanimate objects as potential wildlife.
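One practical lever for a trigger-happy model is raising the confidence threshold applied to predictions, trading a little recall for precision. A sketch, assuming detections arrive as simple label/score records (the function name and record shape are ours, not an AutoML API):

```python
def filter_detections(detections, min_score=0.6):
    """Keep only detections at or above a confidence threshold.
    Raising the threshold trades recall for precision, which helps
    when a model produces many low-confidence false positives."""
    return [d for d in detections if d["score"] >= min_score]

preds = [
    {"label": "deer", "score": 0.91},
    {"label": "deer", "score": 0.34},   # e.g. antler-shaped weeds
    {"label": "person", "score": 0.72},
]
kept = filter_detections(preds, min_score=0.6)
print([d["label"] for d in kept])
```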

General Challenges

For minority classes (snakes, lizards, etc., with few occurrences in images), it can be difficult to gather enough labeled training images. The OIDv4 repository (or a later version) is a powerful tool for easily pulling supplementary labeled images for specified classes. However, if a minority class is especially rare, it is unlikely to exist in the OID class library.
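When a rare class can’t be supplemented from OID, one stopgap is oversampling the few labeled examples you do have, ideally paired with augmentation so the repeats aren’t pixel-identical. A sketch (our own helper, with illustrative filenames):

```python
import math

def oversample(items, target_n):
    """Repeat a minority class's examples until the list reaches
    target_n. In practice, pair this with image augmentation so the
    repeated examples aren't identical copies."""
    if not items:
        return []
    reps = math.ceil(target_n / len(items))
    return (items * reps)[:target_n]

snakes = ["snake_001.jpg", "snake_002.jpg", "snake_003.jpg"]
print(len(oversample(snakes, 10)))
```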

Most folks will want your model to generalize to classifying… everything! 😬 Have early discussions with your stakeholders to set the expectation that your model will only identify object classes that were defined during training.

Concluding Remarks

Both models resulted in similar ground truth precision and recall, so your selection may boil down to architecture (as it did for us). If you want ease of GCP integration with easy Image Labeling, AutoML is the clear winner. However, if you want to get under the hood and tune your model, YOLOv5 may be your answer.

Matt Wheeler is a Data Scientist in Slalom’s Data and Analytics practice.

Slalom is a modern “digital and cloud native” consulting company with a deep appreciation for all that data and analytics can bring to a company. Across our offices globally, we help our clients instill a modern culture of data and learn to respect their role as owners and stewards of it.
