Custom Dataset in YOLOv8!

ChengKang Tan
3 min read · Aug 16, 2023


Let’s train our own YOLO model on a custom dataset!

First, you can install YOLOv8 with a single command.

!pip install ultralytics

You can refer to the link below for more detailed information or various other models.

github :

Prepare the images for training

Before training on a custom dataset, let’s start by collecting the data! In this automated world, data collection can be automated too. Let’s explore how to automate data collection using Python. I’ll leave my method for Python automation in the link below; it will be very useful.

Link

Import libraries

from ultralytics import YOLO
import cv2
import matplotlib.pyplot as plt

import pandas as pd
import numpy as np

There are five models in YOLOv8, with the smallest one on top and the largest one on the bottom. For this exercise, I will be using the largest model, YOLOv8x.

https://github.com/ultralytics/ultralytics
model = YOLO("yolov8x.pt")
dict_classes = model.model.names
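
If the largest model is too heavy for your machine, any of the other four sizes can be swapped in. Here is a minimal sketch of that choice. `YOLOV8_WEIGHTS` and `weights_for` are names I made up for illustration (they are not part of ultralytics); only the weight file names follow the Ultralytics naming.

```python
# Hypothetical helper (not part of ultralytics): map a size letter to a weight file.
YOLOV8_WEIGHTS = {
    "n": "yolov8n.pt",  # nano, the smallest
    "s": "yolov8s.pt",  # small
    "m": "yolov8m.pt",  # medium
    "l": "yolov8l.pt",  # large
    "x": "yolov8x.pt",  # extra large, the one used in this post
}


def weights_for(size: str) -> str:
    """Return the pretrained weight file name for a size letter."""
    key = size.lower()
    if key not in YOLOV8_WEIGHTS:
        raise ValueError(f"unknown size {size!r}, expected one of {sorted(YOLOV8_WEIGHTS)}")
    return YOLOV8_WEIGHTS[key]


print(weights_for("x"))  # yolov8x.pt
```

With this in place, `YOLO(weights_for("n"))` would load the smallest model instead, which is handy when training time matters more than accuracy.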

Download the dataset

You can download the dataset from the website above. As far as you know, the creator is the author (ME) hahahahaha.

Or

You can use the below command to download the dataset in zip format.

!wget -O hamster_Data.zip https://app.roboflow.com/ds/jbCLWvQOTl?key=pAdnrL5TmX

Using the zipfile library, extract all the files downloaded above and decompress them to the specified path.

import zipfile

with zipfile.ZipFile('/content/hamster_Data.zip') as target_file:
    target_file.extractall('/content/hamster_Data/')
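
After extracting, it is worth checking that every split actually contains files. The sketch below counts images and label files per split. It assumes the usual YOLO export layout of `<split>/images` and `<split>/labels` folders, which I have not verified for every export; `count_split_files` and `IMAGE_SUFFIXES` are hypothetical names of my own.

```python
from pathlib import Path

# Assumed image extensions; adjust to whatever your dataset contains.
IMAGE_SUFFIXES = {".jpg", ".jpeg", ".png", ".bmp"}


def count_split_files(root):
    """Count image and label files in each split folder under root."""
    counts = {}
    for split in ("train", "valid", "test"):
        image_dir = Path(root) / split / "images"
        label_dir = Path(root) / split / "labels"
        n_images = 0
        n_labels = 0
        if image_dir.is_dir():
            n_images = sum(1 for p in image_dir.iterdir() if p.suffix.lower() in IMAGE_SUFFIXES)
        if label_dir.is_dir():
            n_labels = sum(1 for p in label_dir.iterdir() if p.suffix.lower() == ".txt")
        counts[split] = {"images": n_images, "labels": n_labels}
    return counts


# Missing folders simply show up as zero counts.
print(count_split_files("/content/hamster_Data"))
```

If a split shows more images than labels, some images have no annotation file, so it is better to find that out before training.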

Let’s see the .yaml file.

!cat /content/hamster_Data/data.yaml
train: ../train/images
val: ../valid/images
test: ../test/images

nc: 1
names: ['hamster']

roboflow:
  workspace: -vftqe
  project: hamster-recognition
  version: 1
  license: CC BY 4.0
  url: https://universe.roboflow.com/-vftqe/hamster-recognition/dataset/1

The crucial part we need to focus on is the first five lines:

train: ../train/images
val: ../valid/images
test: ../test/images

nc: 1
names: ['hamster']

The first three lines (train, val, test) should be customized for each individual’s dataset path. The last two lines do not require modification as the goal is to identify only one type of object, a hamster.

Fix the content of the .yaml file

import yaml

data = {'train': '/content/hamster_Data/train/images',
        'val': '/content/hamster_Data/valid/images',
        'test': '/content/hamster_Data/test/images',
        'nc': 1,
        'names': ['hamster']
        }

# write the data to a new .yaml file (the original data.yaml is left untouched)
with open('/content/hamster_Data/hamster_data.yaml', 'w') as f:
    yaml.dump(data, f)

# read the content of the new .yaml file back to check it
with open('/content/hamster_Data/hamster_data.yaml', 'r') as f:
    hamster_yaml = yaml.safe_load(f)
    display(hamster_yaml)
{'nc': 1,
'names': ['hamster'],
'test': '/content/hamster_Data/test/images',
'train': '/content/hamster_Data/train/images',
'val': '/content/hamster_Data/valid/images'}
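
Since a wrong path in this file is the most common reason training fails to start, here is a small sketch that checks the loaded dictionary before training. `check_dataset_config` is a hypothetical helper of my own, not an ultralytics function. As far as I know the `test` key is optional for Ultralytics, but this post defines it, so the sketch checks it too.

```python
from pathlib import Path


def check_dataset_config(config):
    """Return a list of problems found in a YOLO-style dataset config dict."""
    problems = []
    # every split path should point at an existing folder
    for key in ("train", "val", "test"):
        path = config.get(key)
        if path is None:
            problems.append(f"missing key: {key}")
        elif not Path(path).is_dir():
            problems.append(f"{key} folder not found: {path}")
    # nc must match the number of class names
    names = config.get("names", [])
    if config.get("nc") != len(names):
        problems.append(f"nc={config.get('nc')} but {len(names)} class names given")
    return problems
```

Calling `check_dataset_config(hamster_yaml)` should give back an empty list when everything is in place; anything else is a list of messages telling you what to fix.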

Train on your own dataset!

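# the pretrained model still carries the 80 COCO dataset labels
print(len(model.names))
# train on the hamster dataset described by the new .yaml file
model.train(data='/content/hamster_Data/hamster_data.yaml', epochs=50, batch=8)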

The results are automatically saved in the ‘runs/detect/train’ path.

You can also copy the whole folder to another path (the -r flag is needed to copy a directory):

!scp -r /content/runs/ "/Destination path/"
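
To reuse the trained model later, you need the weight file from the run folder. Ultralytics writes `weights/best.pt` inside each run, and repeated runs get numbered folders such as `train2` and `train3`. The sketch below picks the most recently written one; `latest_best_weights` is a hypothetical helper name of my own.

```python
from pathlib import Path


def latest_best_weights(runs_dir="runs/detect"):
    """Return the most recently modified train*/weights/best.pt, or None."""
    root = Path(runs_dir)
    if not root.is_dir():
        return None
    candidates = list(root.glob("train*/weights/best.pt"))
    if not candidates:
        return None
    # newest file by modification time
    return max(candidates, key=lambda p: p.stat().st_mtime)


# Prints None when no training run exists yet.
print(latest_best_weights("/content/runs/detect"))
```

The returned path can then be passed to `YOLO(str(path))` to load the fine-tuned hamster model instead of the pretrained weights.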

Predict with the model

results = model.predict(source='/content/hamster_Data/test/images', save=True)
!scp -r /content/runs/ "/Destination path/"
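
Besides the saved images, the returned `results` list can be inspected directly. As a rough sketch, the function below counts how many boxes were found in each image. It assumes each result exposes `.path` and `.boxes` the way Ultralytics `Results` objects do; `detections_per_image` is a hypothetical helper of my own.

```python
from pathlib import Path


def detections_per_image(results):
    """Map each image file name to the number of boxes detected in it."""
    return {Path(r.path).name: len(r.boxes) for r in results}


# Usage with the predictions above (left commented out, since it needs a trained model):
# counts = detections_per_image(results)
# print(counts)
```

An image with a count of 0 means the model found no hamster there, which is a quick way to spot the test images worth looking at by eye.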

Enjoy it !

YOLOOOOOOOOOOOOOOOOOOOOOOOOOOOOOO
