Training YOLOv4-Tiny for faster FPS

Elven Kim
Jul 24, 2023

After lots of TensorFlow, I decided to try other object detection (OD) models that suit constrained environments such as the Raspberry Pi, for example YOLOv4-Tiny. The steps below should help ensure a successful run of this model. There are many examples out there, but it is still worth documenting the steps so we do not forget them.

1. Select the dataset suitable for object detection.

I picked mine because it is multiclass, with varied backgrounds and lighting. Remember to download it in Darknet format.
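For context, a Darknet-format export stores one .txt file per image, with one line per object: the class index followed by the box centre x, centre y, width and height, all normalised by the image size. Below is a minimal sketch of that conversion, assuming boxes arrive as pixel corners. The function name and the sample numbers are hypothetical and only illustrate the layout.

```python
def to_darknet_line(class_id, x_min, y_min, x_max, y_max, img_w, img_h):
    """Convert a pixel-corner box into one Darknet label line."""
    # Centre of the box, normalised to the 0..1 range
    x_center = (x_min + x_max) / 2 / img_w
    y_center = (y_min + y_max) / 2 / img_h
    # Box size, normalised to the 0..1 range
    width = (x_max - x_min) / img_w
    height = (y_max - y_min) / img_h
    return f"{class_id} {x_center:.6f} {y_center:.6f} {width:.6f} {height:.6f}"


# Hypothetical box on a 400x500 image, class index 0
print(to_darknet_line(0, 100, 50, 300, 250, 400, 500))
```

If the dataset tool already exports to Darknet, you do not need this; it is only useful for checking that a label file looks sane.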

2. Choose the correct model

Pick either a traditional model or a state-of-the-art deep learning model (one-stage or two-stage).

I chose YOLOv4-Tiny because I wanted to test something faster than TFLite, with which I got only 2 FPS.
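To compare models fairly, it helps to measure FPS the same way each time. The following is a rough sketch, not the method used for the 2 FPS figure above: `measure_fps` and `dummy_infer` are hypothetical names, and the dummy function only stands in for a real detector call.

```python
import time


def measure_fps(infer, frames, warmup=2):
    """Return the average frames per second of calling infer(frame)."""
    # Warm-up calls are not timed (first calls are often slower)
    for frame in frames[:warmup]:
        infer(frame)
    start = time.perf_counter()
    for frame in frames:
        infer(frame)
    elapsed = time.perf_counter() - start
    return len(frames) / elapsed


def dummy_infer(frame):
    """Stand-in for a real model; sleeps 10 ms per frame."""
    time.sleep(0.01)
    return []


fps = measure_fps(dummy_infer, list(range(20)))
print(f"{fps:.1f} FPS")
```

Swap `dummy_infer` for your own inference call and pass real frames to get a number for your device.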

3. Get familiar with the steps

a- Train the YOLOv5 model

b- Convert the YOLOv5 model (.pt file) into a TensorFlow model (.pb file)

c- Convert the TensorFlow model (.pb file) to a TFLite model

d- Download and install Android Studio

e- Build and run your object detection app
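Steps a to c can be sketched as command lines for the ultralytics/yolov5 repository. This is only a sketch under assumptions: `data.yaml`, the run folder and the hyperparameters are placeholders, and recent versions of the repository export through `export.py`, which may differ from the conversion scripts used in the tutorial in [2]. The code only builds and prints the commands; it does not run them.

```python
import shlex


def yolov5_commands(data_yaml="data.yaml", weights="yolov5s.pt",
                    run_dir="runs/train/exp", epochs=100):
    """Build (not run) the train and export commands for YOLOv5."""
    # Step a: train from a pretrained checkpoint on the custom dataset
    train = ["python", "train.py", "--img", "640", "--batch", "16",
             "--epochs", str(epochs), "--data", data_yaml,
             "--weights", weights]
    # Steps b and c: export the trained .pt to .pb and .tflite
    export = ["python", "export.py",
              "--weights", f"{run_dir}/weights/best.pt",
              "--include", "pb", "tflite"]
    return [train, export]


for cmd in yolov5_commands():
    print(shlex.join(cmd))
```

In Colab you would paste the printed lines into a cell with a leading `!`, from inside the cloned yolov5 folder.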

4. Customise for Android

Change the customclasses to match the classes in your dataset.

Put the tflite folder inside the assets folder.

5. Make sure the path of DetectorFactory.java is correct

Change the file path to the correct one, such as /content/yolov5/android/app/src/main/java/org/tensorflow/lite/examples/detection/tflite/DetectorFactory.java

Use the link below to test.

Colab gives us a Tesla T4, so in the Colab notebook we need to change the 60 to 75 (the T4's CUDA compute capability is 7.5) to minimise CUDA errors.
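The text does not say where the 60 lives. A common place is the `ARCH` line of the Darknet `Makefile`, where `compute_60` and `sm_60` select the GPU architecture to compile for. Under that assumption, here is a minimal sketch of the change; `patch_arch` is a hypothetical helper and the sample line is illustrative, so check it against your own Makefile.

```python
import re


def patch_arch(makefile_text, old="60", new="75"):
    """Replace compute_<old> / sm_<old> with <new> in -gencode flags."""
    pattern = re.compile(rf"\b(compute|sm)_{re.escape(old)}\b")
    return pattern.sub(lambda m: f"{m.group(1)}_{new}", makefile_text)


# Illustrative Makefile line (assumed, not copied from the notebook)
sample = "ARCH= -gencode arch=compute_60,code=sm_60\n"
print(patch_arch(sample), end="")
```

To apply it for real, read the Makefile into a string, pass it through `patch_arch`, write it back, and then rebuild Darknet.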

6. Customise for the CUDA GPU given

What are CUDA and the T4?

Compute Unified Device Architecture (CUDA) programming lets you leverage parallel computing technologies developed by NVIDIA. The CUDA platform and application programming interface (API) are particularly helpful for general-purpose computing on graphics processing units (GPUs). Unlike OpenCL, CUDA runs only on NVIDIA GPUs.

The T4, which uses essentially the same processor architecture as NVIDIA's consumer RTX cards, slots in between the existing NVIDIA V100 and P4 GPUs on the Google Cloud Platform. While the V100 is optimized for machine learning, the T4 (like its predecessor, the P4) is more of a general-purpose GPU that also turns out to be great for training models and inferencing.

7. Download the YOLOv4-tiny weights

%cd /content/darknet

!wget https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4-tiny.weights

!wget https://github.com/AlexeyAB/darknet/releases/download/darknet_yolo_v4_pre/yolov4-tiny.conv.29
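As far as I understand the AlexeyAB Darknet README, `yolov4-tiny.weights` is the full pretrained model, while `yolov4-tiny.conv.29` holds only the first 29 layers and serves as the starting point for training on a custom dataset. If you prefer Python to `wget`, the sketch below does the same download and skips files that already exist. `local_name` and `fetch` are hypothetical helper names; the URLs are the two given above.

```python
import os
import urllib.request
from urllib.parse import urlparse

BASE = ("https://github.com/AlexeyAB/darknet/releases/download/"
        "darknet_yolo_v4_pre/")
WEIGHT_URLS = [BASE + "yolov4-tiny.weights", BASE + "yolov4-tiny.conv.29"]


def local_name(url, dest_dir="."):
    """Return the local path a URL would be saved to."""
    return os.path.join(dest_dir, os.path.basename(urlparse(url).path))


def fetch(url, dest_dir="."):
    """Download url into dest_dir unless the file is already there."""
    path = local_name(url, dest_dir)
    if not os.path.exists(path):
        urllib.request.urlretrieve(url, path)
    return path


# Show where each file would land (no download happens here)
for url in WEIGHT_URLS:
    print(local_name(url, "/content/darknet"))
```

Calling `fetch(url, "/content/darknet")` for each entry in `WEIGHT_URLS` performs the actual download.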

Up for a new object detection challenge? Try these out!

  1. How to detect dynamic targets and complex data in moving videos
  2. How to make the best use of depth from 3D sensors
  3. Small object detection (objects covering less than 10% of overall pixels)
  4. How to combine the speed of one-stage (real-time) detectors with the high accuracy of two-stage detectors
  5. How to ensure a method that works for one dataset is still valid for other datasets
  6. How to automate the annotation process
  7. How to multi-task object detection with segmentation
  8. How to use other sources, such as text, to help object detection
  9. How to build an unbiased dataset

Resources

[1] Jaskirat Kaur and Williamjeet Singh, "Tools, techniques, datasets and application areas for object detection in an image: a review", Multimedia Tools and Applications

[2] https://github.com/AarohiSingla/TFLite-Object-Detection-Android-App-Tutorial-Using-YOLOv5

[3] https://youtu.be/ROn1_O2zEtk
