Yolov8 on Google Colab: A Step-by-Step Guide to Creating a Custom Dataset

Casca Kwok
Jan 25, 2023


Dataset source: UG2+ Challenge

The purpose of this document is to provide a comprehensive guide for the installation of Yolov8 on Google Colab, including useful tips and tricks, intended to serve as a one-stop resource for those looking to set up and utilize Yolov8 on the platform.

Installing Yolov8

Before beginning the installation process, it helps to understand Google Colab's default file structure.

Google Colab installs packages into the “/content” directory by default, which is only temporary storage. Every time the runtime restarts, for example when the browser is closed or the runtime crashes, you would need to install Yolov8 again.

Reinstallation can take quite a while, and any previous training/evaluation/testing results, as well as any customisation of the Yolov8 code, would be lost.

To circumvent this issue, install Yolov8 in a persistent location, such as a folder in your Google Drive.
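
If your Drive is not already mounted in the Colab session, mount it first with the standard Colab helper so the persistent folder is visible under /content/drive:

# Mount Google Drive so /content/drive/MyDrive points at your Drive
from google.colab import drive
drive.mount("/content/drive")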

Procedures -

  1. Create a folder under “drive/MyDrive/Colab Notebooks” and name it “Yolo8”, for example
  2. Install Yolov8 into this directory
!pip install --target="/content/drive/MyDrive/Colab Notebooks/Yolo8" ultralytics
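
After the install finishes, you can check that Python sees the package in its new location. A minimal sketch (the folder name “Yolo8” is simply the example directory created above):

import sys

# Point Python at the --target directory used above
sys.path.append("/content/drive/MyDrive/Colab Notebooks/Yolo8")

import ultralytics
print(ultralytics.__version__)  # prints the installed version without reinstalling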

Running Yolov8

Once the installation is complete, there are two options for running Yolov8: via the CLI provided by Ultralytics, or from a Python script.

  1. By CLI

The “yolo” command runs training/validation/prediction for the object detection model. The “data” parameter points to the dataset definition in YAML format, the “model” parameter specifies the pretrained weights to start from, and the “epochs” parameter sets how many full passes over the dataset to run; more parameters are documented on the official portal.

Option1: Running Yolo8 with CLI. image source: ultralytics
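
For reference, a typical training command looks roughly like the sketch below, using the same task=/mode= syntax as the inference command later in this post; adjust model, data and epochs to your needs.

!yolo task=detect mode=train model=yolov8n.pt data=coco128.yaml epochs=100 imgsz=640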

If you choose to use the CLI, you may encounter an error saying that “yolo” cannot be found. This happens because the yolo executable is not on the shell's search path. To resolve it, either update your system PATH to include the Yolo8/bin directory, or copy the yolo file into the “Yolo8” directory.

You might also hit a permission-denied error with either option; issue a “chmod 755” on the yolo file to fix it.
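
In a Colab cell, both fixes can be applied from Python. A sketch, assuming pip placed the yolo entry point in a bin folder inside the install directory:

import os

# Assumed location of the "yolo" entry point created by pip --target
yolo_bin = "/content/drive/MyDrive/Colab Notebooks/Yolo8/bin"

# Option 1: put the bin folder on the PATH of the current session
os.environ["PATH"] += os.pathsep + yolo_bin

# Fix a permission-denied error (equivalent of chmod 755)
os.chmod(os.path.join(yolo_bin, "yolo"), 0o755)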

  2. By Python script

Option2: Running Yolo8 with Python. image source: ultralytics
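
For reference, the Python route looks roughly like this minimal sketch, which mirrors the full training script shown later in this post:

from ultralytics import YOLO

model = YOLO("yolov8n.pt")                  # load a pretrained detection model
model.train(data="coco128.yaml", epochs=3)  # train on the COCO128 sample dataset
model.val()                                 # evaluate on the validation split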

Customize and use your own Dataset

The coco128.yaml in the above example defines how the dataset is handled. Let’s take a look at how it works.

The file specifies the training/validation/testing dataset directory paths and the class labels. The following capture shows that the COCO dataset carries 80 different class labels, where each label denotes an object type to be detected, such as “person” or “car”.

Inside coco128.yaml
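
For readers who do not have the file open, an abridged sketch of what coco128.yaml roughly contains (paths shortened, 80 classes in total):

path: ../datasets/coco128  # dataset root dir
train: images/train2017    # train images
val: images/train2017      # val images

# Classes
names:
  0: person
  1: bicycle
  2: car
  # ...
  79: toothbrush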

To use your own dataset, replace “coco128.yaml” in the CLI/Python script parameters with your own .yaml definition. The new file should be placed in the Yolo8/ultralytics/yolo/data/datasets directory.

To illustrate, my own dataset has only one class label/object to be detected, so I modified the .yaml as follows:

path: "/content/drive/MyDrive/Colab Notebooks"  # dataset root dir
train: images/train
val: images/valid

# Classes
names:
  0: Vehicle

Another important point is the directory structure for the training/validation images and labels. At first I modified my directory structure a bit, but my setup only worked after following the YOLOv5 structure -
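
Roughly, the expected layout looks like the sketch below (based on the paths in the .yaml above; images and labels live in parallel folders, with one .txt label file per image, sharing the same file name):

Colab Notebooks/            # dataset root ("path" in the .yaml)
├── images/
│   ├── train/              # training images (.jpg)
│   └── valid/              # validation images
└── labels/
    ├── train/              # training labels (.txt)
    └── valid/              # validation labels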

Train the network

Putting it all together, my final Python code to train and evaluate the network became -

import sys

# Make the Yolov8 package installed on Google Drive importable
# (setting PYTHONPATH via os.environ has no effect once the interpreter is already running)
sys.path.append("/content/drive/MyDrive/Colab Notebooks/Yolo8-lib")

# main code to run YOLO training
from ultralytics import YOLO

# Load a model
#model = YOLO("yolov8n.yaml") # build a new model from scratch
model = YOLO("yolov8n.pt") # load a pretrained model (recommended for training)

# Use the model
results = model.train(data="{training_path}/Hazy.yaml", epochs=150, pretrained=True, iou=0.5, visualize=True, patience=0) # train the model
results = model.val() # evaluate model performance on the validation set

There are numerous training parameters that can help you set up your experiments efficiently.

For example, if it is the first time you are experimenting with your own captured dataset, you could choose a lower iou (Intersection over Union) value initially. The purpose is to check whether the network has been set up properly and is learning something, before worrying about whether it has fully converged.

The second useful parameter is visualize=True. When we annotate a dataset, the annotation tool may support multiple annotation (bounding-box coordinate) formats, and it is easy to export the wrong one. To verify that the correct format is being fed into the network, set visualize=True and inspect the training samples after training. If the annotation format was not exported properly, you will see incorrect bounding-box coordinates at this stage: the boxes will not enclose the objects as they should. This parameter is on by default in Yolov5, but in Yolov8 (as of Feb 10, 2023) you need to turn it on explicitly.

If you want training to stop early after n epochs without improvement in the training loss, set the patience parameter to your desired number of epochs. Early stopping serves a couple of purposes: 1. it avoids overfitting the network, and 2. it saves GPU cycles, since I pay for Google Colab.

pretrained=True, iou=0.5, visualize=True, patience=0

Once training has completed, you will not only be able to view the results at Yolo8/runs/detect/train, but also plenty of important information output by the Yolov8 framework.

args.yaml records the hyperparameters, such as the number of epochs, the batch size and the default optimiser, which are crucial for comparing performance across different experiment settings.

The event logs record the training loss, mAP50 and mAP50-95 per step; they can be loaded into logging tools such as TensorBoard to plot model performance.

results.csv serves a similar purpose to the event logs, but you can build your own plots from this raw information, as sketched below.
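
For example, a quick sketch with pandas and matplotlib; the run folder and column names are assumptions, so check df.columns for the exact names in your Yolov8 version:

import pandas as pd
import matplotlib.pyplot as plt

# Assumed run folder -- replace with your own, e.g. Yolo8/runs/detect/train
df = pd.read_csv("runs/detect/train/results.csv")
df.columns = df.columns.str.strip()  # header names are padded with spaces

# Plot training box loss and validation mAP50 over the epochs
df.plot(x="epoch", y=["train/box_loss", "metrics/mAP50(B)"], subplots=True)
plt.show()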

The weights directory keeps the saved network states. This is the path from which inference loads the weights to generate testing results.
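
In Python, loading the saved state for inference is a one-liner. A sketch (the run folder name will differ on your machine):

from ultralytics import YOLO

# Load the best checkpoint saved during training and run prediction
model = YOLO("runs/detect/train/weights/best.pt")
results = model.predict(source="original dataset/test", conf=0.25, save=True)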

Information output after training in Yolo8/runs/detect

Validation

Verify the validation results at Yolo8/runs/detect/val. It is important to visually inspect these files to ensure that the objects are being detected correctly and that the bounding boxes and labels are accurate.

Dataset source: UG2+ Challenge

Inference

To run inference, ensure that the yolo file has the correct permissions by making it executable. If the system indicates that the file cannot be executed, you may need to use the chmod command.

#  inference on testing dataset
PATH="runs/detect/train59/"
!yolo task=detect mode=predict model='{PATH}/weights/best.pt' save_txt=True conf=0.25 save_conf=True source='original dataset/test'
Dataset source: UG2+ Challenge

If you encounter issues when viewing the bounding boxes in these outputs, such as boxes not showing up at all, or showing up but misaligned with the detected objects (i.e. objects are correctly detected, but the boxes do not bound them precisely), check the following -

  • At validation, consider lowering the confidence level or IoU threshold
  • Ensure the save_txt parameter is set to True.
  • Verify that the ground-truth bounding boxes are displayed correctly in the training dataset, and check that the label text files are formatted according to the YOLO format; see the sketch below.
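
A quick way to do that last check is to draw the boxes from a label file over its image yourself. A sketch, with placeholder file names standing in for one of your own samples:

from pathlib import Path
from PIL import Image
import matplotlib.pyplot as plt
import matplotlib.patches as patches

# Placeholder paths -- pick any training image and its matching label file
img_path = Path("/content/drive/MyDrive/Colab Notebooks/images/train/sample.jpg")
lbl_path = Path("/content/drive/MyDrive/Colab Notebooks/labels/train/sample.txt")

img = Image.open(img_path)
w, h = img.size

fig, ax = plt.subplots()
ax.imshow(img)

# YOLO label format: class x_center y_center width height, all normalised to 0-1
for line in lbl_path.read_text().splitlines():
    cls, xc, yc, bw, bh = map(float, line.split())
    x0, y0 = (xc - bw / 2) * w, (yc - bh / 2) * h
    ax.add_patch(patches.Rectangle((x0, y0), bw * w, bh * h, fill=False, edgecolor="red"))

plt.show()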
