You Only Look Twice… Again.

Adam Van Etten
Geodesic

--

Announcing the release of YOLTv5

Preface: This blog is part of a series describing the work done at Geodesic Labs.

The YOLO family of deep learning object detection models has enjoyed remarkable longevity, with new versions still under active development seven years after the original paper was released. We have actively tracked and leveraged this impressive framework for the YOLT overhead imagery detection models, which are optimized for the enormous images and tiny objects typical of overhead imagery. A number of previous blogs covered YOLT/YOLT2, SIMRDWN (which supports YOLOv2, YOLOv3, and the TensorFlow Object Detection API as backends), and the most recent version based upon the original YOLO architecture: YOLTv4. In this post we announce the release of a version built upon the popular YOLOv5 framework: YOLTv5.

1. YOLTv5 vs YOLTv4

In the 5+ years since the original version of YOLT [1], we have updated YOLT multiple times using YOLO versions built upon the original Darknet framework written in C. The Darknet framework offers speed advantages, along with the benefit of tracking the versions maintained by the creator of YOLO. Yet there are times when a PyTorch backend is preferable to a C backend. For example, if one wants to experiment with Amazon SageMaker Studio Lab, the C libraries required by Darknet/YOLTv4 cannot currently be installed there. So while the performance differences between YOLOv4 and YOLOv5 (and hence YOLTv4 and YOLTv5) are minimal, the move from C to PyTorch can be quite impactful.

2. Training

Data preparation for training is the same as in previous YOLT releases; see prep_train.py. Once the data are prepared, simply run:

cd yoltv5/
python yolov5/train.py --img 640 --batch 16 --epochs 100 --data yoltv5_train_vehicles_8cat.yaml --weights yolov5l.pt
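
For context, the heart of that data preparation is slicing enormous source images into small, fixed-size training chips (with labels clipped to match), so that each chip fits a standard detector input. The Python sketch below is merely illustrative of that windowing step, with placeholder paths, chip size, and overlap; it is not the actual prep_train.py.

# Illustrative sketch: slice a large overhead image into fixed-size training chips.
# Paths, chip size, and overlap are placeholder values, not YOLTv5 defaults.
import os
from PIL import Image

Image.MAX_IMAGE_PIXELS = None  # allow very large satellite images

def _starts(total, chip, stride):
    """Window start coordinates covering [0, total), including a final edge window."""
    starts = list(range(0, max(total - chip, 0) + 1, stride))
    if starts[-1] + chip < total:
        starts.append(total - chip)
    return starts

def slice_image(im_path, out_dir, chip_size=640, overlap=0.2):
    """Cut a large image into chip_size x chip_size tiles with fractional overlap."""
    os.makedirs(out_dir, exist_ok=True)
    im = Image.open(im_path)
    w, h = im.size
    stride = int(chip_size * (1 - overlap))
    root = os.path.splitext(os.path.basename(im_path))[0]
    for y in _starts(h, chip_size, stride):
        for x in _starts(w, chip_size, stride):
            chip = im.crop((x, y, x + chip_size, y + chip_size))
            chip.save(os.path.join(out_dir, f"{root}_{x}_{y}.png"))
            # In a full pipeline, labels falling inside this window would be
            # clipped, converted to YOLO format (class x_center y_center w h,
            # normalized by chip_size), and written alongside the chip.

# Hypothetical usage with placeholder paths:
# slice_image("raw_images/sample_overhead_image.png", "train_chips/")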

3. Inference

To run inference, simply edit yoltv5_test_vehicles_8cat.yaml to point to the appropriate data locations, then run the test.sh script:

cd yoltv5
./test.sh ../configs/yoltv5_test_vehicles_8cat.yaml
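
As in prior YOLT releases, inference over a large image amounts to slicing it into windows, running the detector on each window, shifting each detection back into the full-image coordinate frame, and suppressing duplicates where windows overlap. The sketch below illustrates that stitching step; the box layout and the global_nms helper are assumptions for illustration rather than the exact YOLTv5 internals.

# Illustrative sketch of stitching window-level detections back onto the full image.
# Box format (x_min, y_min, x_max, y_max, score, class) and the IoU threshold
# are assumptions for illustration, not the exact YOLTv5 internals.
import numpy as np

def shift_to_global(boxes, x_off, y_off):
    """Translate window-local pixel boxes into full-image coordinates."""
    boxes = boxes.copy()
    boxes[:, [0, 2]] += x_off
    boxes[:, [1, 3]] += y_off
    return boxes

def global_nms(boxes, iou_thresh=0.5):
    """Greedy non-max suppression over detections pooled from all windows."""
    order = boxes[:, 4].argsort()[::-1]  # sort by score, highest first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # Intersection of the top box with all remaining boxes
        xx1 = np.maximum(boxes[i, 0], boxes[order[1:], 0])
        yy1 = np.maximum(boxes[i, 1], boxes[order[1:], 1])
        xx2 = np.minimum(boxes[i, 2], boxes[order[1:], 2])
        yy2 = np.minimum(boxes[i, 3], boxes[order[1:], 3])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        area_i = (boxes[i, 2] - boxes[i, 0]) * (boxes[i, 3] - boxes[i, 1])
        area_o = (boxes[order[1:], 2] - boxes[order[1:], 0]) * \
                 (boxes[order[1:], 3] - boxes[order[1:], 1])
        iou = inter / (area_i + area_o - inter + 1e-9)
        order = order[1:][iou < iou_thresh]
    return boxes[keep]

# Hypothetical usage: detections_per_window is a list of (boxes, (x_off, y_off))
# pairs returned by the detector for each window of the large image.
# all_boxes = np.vstack([shift_to_global(b, x, y) for b, (x, y) in detections_per_window])
# final_boxes = global_nms(all_boxes)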

4. YOLTv5 Examples

Inference is rapid, proceeding at 1000 square kilometers per hour on the NVIDIA T4 GPU available in the recently released (and free) Amazon SageMaker Studio Lab. At this rate the entirety of Manhattan completes in about 3 minutes, and over the 4-hour session allotted in Studio Lab one could cover more than the entirety of Long Island, New York. See below for inference examples from a vehicle model applied to SpaceNet imagery. We will continue to update YOLTv5, and encourage interested readers to give the codebase a spin. In future blogs we will detail how one can combine YOLT (v4 or v5) with graph analytics to explore a number of disaster response scenarios.

Inference in Khartoum, Sudan
Inference in San Juan, Puerto Rico
