Creating a Mask Model on OCI with YOLOv5: Training and Real-Time Inference

Introduction

Hardware?

  • Shape: VM.GPU3.2
  • GPU: 2 NVIDIA® Tesla® V100 GPUs ready for action.
  • GPU Memory: 32GB
  • CPU: 12 cores
  • CPU Memory: 180GB
Note: this custom image is very useful and often saves me a lot of time. It already has 99% of the things I need for any Data Science-related project, so no time is wasted on installation and setup before getting to work. (It includes things like conda, CUDA, PyTorch, a Jupyter environment, VSCode, PyCharm, git, Docker, the OCI CLI… and much more. Make sure to read the full custom image specs here.)
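Once the instance is up, it takes seconds to confirm that both V100s are visible to the preinstalled PyTorch. A quick sanity-check sketch (nothing here is OCI-specific):

import torch

# quick sanity check: VM.GPU3.2 should expose two CUDA devices
print(torch.__version__, torch.version.cuda)
print("GPUs visible:", torch.cuda.device_count())  # expect 2
for i in range(torch.cuda.device_count()):
    print(i, torch.cuda.get_device_name(i))        # expect two Tesla V100s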

Price Comparison

Training the Model with YOLOv5

git clone https://github.com/ultralytics/yolov5.git
cd /home/$USER/yolov5
pip install -r /home/$USER/yolov5/requirements.txt

Downloading my Dataset

Note: thanks to RoboFlow and their team, you can even test the model in your browser (uploading your images/videos) or with your webcam!
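If you prefer to pull the dataset programmatically instead of downloading a zip, RoboFlow also ships a pip package. A minimal sketch, where the API key, workspace, and project names are placeholders rather than my actual ones:

# pip install roboflow
from roboflow import Roboflow

rf = Roboflow(api_key="YOUR_API_KEY")                                # placeholder key
project = rf.workspace("your-workspace").project("mask-detection")  # placeholder names
dataset = project.version(1).download("yolov5")                     # export in YOLOv5 format
print(dataset.location)                                             # local path to the dataset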

Training Parameters

  • --device: specifies which CUDA devices (or, by default, the CPU) we want to use. Since I have two GPUs, I want to use both for training, so I set this to “0,1”, which performs multi-GPU training, although not in the most optimal way. (I’ll write a future article on how to properly do Distributed Data Parallel with PyTorch; a hedged launch command is shown after the training commands below.)
  • --epochs: the total number of epochs we want to train the model for. I set this to 3000 epochs, although my model converged very precisely long before the 3000th epoch was done.
    Note: YOLOv5 (like lots of neural networks) implements early stopping, which stops training before the specified number of epochs if it can’t find a way to improve the mAP (mean Average Precision) for any class.
  • --batch: the batch size. I set this to either 16 or 32 images per batch. Setting a much lower value (especially when the dataset already has 10,000 images, like mine) is usually bad practice: the gradient estimates become noisier, which can make training unstable.
  • --lr: I kept the learning rate at its default of 0.01. (In YOLOv5, the learning rate actually lives in the hyperparameter YAML as lr0 rather than being a command-line flag.)
  • --img (image size): this parameter was probably the one that gave me the most trouble. I initially thought that all images, at training and at inference time, must always match this size; however, you don’t need to worry about that, thanks to the resizing and subsampling YOLOv5 applies to every input. A good value is the maximum of each picture’s height and width, averaged across the dataset (see the sketch after this list).
  • --save-period: specifies how often the model should save a checkpoint of its state. For example, if I set this to 25, it creates a reusable YOLOv5 checkpoint every 25 trained epochs.
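As promised in the --img bullet, here is a rough way to compute that value: take the larger side of every training image and average it across the dataset. A sketch that assumes the RoboFlow export lives under ./datasets/y5_mask_model_v1:

from pathlib import Path
from PIL import Image

# average of max(width, height) over the training images
sizes = []
for path in Path("./datasets/y5_mask_model_v1/train/images").glob("*.jpg"):
    with Image.open(path) as im:
        sizes.append(max(im.size))

print(f"suggested --img value: {sum(sizes) / len(sizes):.0f}")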

Which YOLOv5 checkpoint to choose?

Note: you can also start training 100% from scratch, but you should only do this if what you’re trying to detect doesn’t resemble anything in ordinary real-world imagery, e.g. astrophotography. The upside of using a checkpoint is that YOLOv5 has already been trained, up to a point, on real-world data. So, anything that resembles the real world can easily be fine-tuned from a checkpoint, which helps you reduce training time (and therefore expense).
Note: all checkpoints have been trained for 300 epochs with the default settings (you can find all of them in the official docs). The nano and small versions use one set of hyperparameters; all the others use another (both sets are linked in the official docs).

Training

# for yolov5s
python train.py --img 640 --data ./datasets/y5_mask_model_v1/data.yaml --weights yolov5s.pt --name y5_mask_detection --save-period 25 --device 0,1 --batch 16 --epochs 3000

# for yolov5x
python train.py --img 640 --data ./datasets/y5_mask_model_v1/data.yaml --weights yolov5x.pt --name y5_mask_detection --save-period 25 --device 0,1 --batch 16 --epochs 3000
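As mentioned in the --device bullet, passing --device 0,1 on its own uses the older DataParallel-style mode. The YOLOv5 multi-GPU docs recommend launching through torch.distributed.run for proper DDP; a hedged variant of the same command (note that --batch is the total batch size, split across the GPUs):

# optional: DistributedDataParallel launch across both GPUs
python -m torch.distributed.run --nproc_per_node 2 train.py --img 640 --data ./datasets/y5_mask_model_v1/data.yaml --weights yolov5s.pt --name y5_mask_detection --save-period 25 --device 0,1 --batch 16 --epochs 3000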
Note: the dataset’s label distribution shows that both the incorrect-mask and no-mask classes are underrepresented compared to the mask class. An idea for the future is to increase the number of examples for both of these classes.
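For reference, the data.yaml that --data points to is just a small text file describing where the images live and which classes exist. Roughly like this; the exact paths and class order depend on how RoboFlow exported the dataset:

# ./datasets/y5_mask_model_v1/data.yaml (sketch)
train: ./datasets/y5_mask_model_v1/train/images
val: ./datasets/y5_mask_model_v1/valid/images
nc: 3
names: ['incorrect_mask', 'mask', 'no_mask']  # assumed order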

YOLOv5 Inference

# for a youtube video
python detect.py --weights="H:/Downloads/trained_mask_model/weights/best.pt" --source="<YT_URL>" --line-thickness 1 --hide-conf --data="data.yaml"

# for a local video
python detect.py --weights="H:/Downloads/trained_mask_model/weights/best.pt" --source="example_video.mp4" --line-thickness 1 --hide-conf --data="data.yaml"
detect.py can run inference on many different source types:
  • A YouTube video
  • Local MP4 / MKV file
  • Directory containing individual images
  • Screen input (takes screenshots of what you’re seeing)
  • HTTP, RTSP, or RTMP streams (including Twitch)
  • Webcam
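Finally, if you’d rather embed the model in your own Python code than call detect.py, the trained weights load straight through PyTorch Hub. A minimal sketch (the weights path is where my training run saved its best checkpoint):

import torch

# load the custom-trained checkpoint through the YOLOv5 hub entry point
model = torch.hub.load("ultralytics/yolov5", "custom",
                       path="runs/train/y5_mask_detection/weights/best.pt")
results = model("example.jpg")  # also accepts URLs, PIL images, numpy arrays
results.print()                 # per-class detection summary
results.save()                  # writes the annotated image under runs/detect/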

Results!

Conclusions

Acknowledgments

  • Author: Nacho Martinez, Data Science Advocate @ Oracle Developer Relations
