Deci’s YOLO-NAS: Next-Generation Model for Object Detection

Rohini Vaidya · Published in CodeX · Jun 4, 2023 · 8 min read

YOLO-NAS revolutionizes object detection with fast and accurate real-time detection capabilities suitable for production.

YOLO (You Only Look Once) is a family of computer vision models that has gained significant attention since Joseph Redmon, Santosh Divvala, Ross Girshick, and Ali Farhadi introduced the novel architecture at CVPR in 2016.

Story of YOLO

Let me tell you a story about YOLO (You Only Look Once), an incredible object detection model in computer vision.

It all began when Joseph Redmon created the original YOLO in 2016, built on Darknet, a custom framework known for its flexibility and strong real-time object detection capabilities. YOLO was groundbreaking because it combined the tasks of drawing bounding boxes and identifying object classes into a single, end-to-end differentiable network. Prior to YOLO, many object detection models followed a two-stage approach: first they identified regions of interest where objects might be present, and then they classified those regions. YOLO changed the game by treating object detection as a single regression problem, making it simpler and faster.

Joseph Redmon continued to improve YOLO and released YOLOv2 and YOLOv3, which further enhanced the accuracy and performance of the model. But the story of YOLO didn’t end there. New authors joined the journey and introduced YOLOv4, YOLOv5, YOLOv6, YOLOv7, YOLOv8, and YOLO-NAS, each with different goals reflecting the visions of its respective authors. This single-pass approach is shared by other one-stage detectors, such as SSD and RetinaNet, which also achieved impressive results.

So, that’s the story of YOLO — a pioneering model that transformed object detection and continues to inspire advancements in computer vision.

YOLO-NAS

In this article, I will discuss the recently released YOLO-NAS model.

In previous YOLO versions, human specialists designed the neural network architectures manually, relying on their expertise and intuition. Yet this method, which requires exploring an immense design space of possible architectures, is extremely laborious and time-consuming.

YOLO-NAS is a novel foundation model developed by Deci-AI. It is a game-changer in the world of object detection, providing the best trade-off between accuracy and latency.

YOLO-NAS was built with AutoNAC, an optimization engine developed by Deci. AutoNAC applies Neural Architecture Search (NAS) to refine the architecture of an already trained model. It does this to improve the model’s performance on specific hardware while keeping the original accuracy as a baseline. In this way, Deci-AI can maximize hardware utilization and make its Deep Learning Acceleration Platform even better.

Deci’s AutoNAC engine: hardware-aware Neural Architecture Search for deep learning inference efficiency

You provide the AutoNAC™ engine with a task, data characteristics (without requiring access to the data itself), an inference environment, and performance targets. The engine then guides you to the architecture that strikes the best balance between accuracy and inference speed for your specific application. Besides data and hardware, the AutoNAC engine also accounts for other components in the inference stack, such as quantization and compilers.

In AI research, deep learning models have become increasingly complex, enabling a growing range of applications. However, running these models on cloud platforms requires substantial computing power, resulting in high costs for developers. AI developers are therefore faced with the task of reducing model size while maintaining accuracy.

YOLO-NAS also includes quantization-aware blocks and selective quantization for optimized performance. Here, quantization refers to converting a neural network’s weights, biases, and activations from floating-point to integer values (INT8), making the model more efficient. Thanks to these blocks, the model suffers only a small precision drop when converted to its INT8 quantized version.
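To make the conversion concrete, here is a minimal sketch of the affine (asymmetric) quantization arithmetic behind a float-to-INT8 mapping. This is a generic illustration, not code from YOLO-NAS or super-gradients, and the helper names are my own:

import torch

def quantize_int8(x: torch.Tensor):
    # Map the observed float range [min, max] onto the INT8 range [-128, 127]
    scale = (x.max() - x.min()) / 255.0
    zero_point = int((-128 - x.min() / scale).round().clamp(-128, 127))
    q = (x / scale + zero_point).round().clamp(-128, 127).to(torch.int8)
    return q, scale, zero_point

def dequantize(q: torch.Tensor, scale, zero_point):
    # Recover an approximation of the original floats
    return (q.float() - zero_point) * scale

x = torch.randn(1000)
q, scale, zp = quantize_int8(x)
print((x - dequantize(q, scale, zp)).abs().max())  # small quantization error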

Quantization-aware training is a method that lets developers apply quantization without sacrificing accuracy: quantization is simulated during the training process so the model learns to compensate for the reduced precision, and the resulting model is typically two to four times smaller, or even more. Post-training quantization, by contrast, applies quantization to a model after it has finished training.

SuperGradients supports selective and partial quantization, i.e., skipping certain modules during quantization or replacing them with quantization-friendly counterparts.
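As a rough illustration of the selective idea using plain PyTorch rather than SuperGradients’ own API, the snippet below applies post-training dynamic INT8 quantization only to nn.Linear modules and leaves every other layer in floating point:

import torch
import torch.nn as nn

model_fp32 = nn.Sequential(
    nn.Linear(128, 64),
    nn.ReLU(),
    nn.Linear(64, 10),
)

# Quantize only the module types listed in the set; everything else stays FP32
model_int8 = torch.quantization.quantize_dynamic(
    model_fp32, {nn.Linear}, dtype=torch.qint8
)
print(model_int8)  # the Linear layers are now dynamically quantized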

The application of these methods results in groundbreaking designs that excel in identifying objects and delivering exceptional performance.

Training Details

YOLO-NAS’s multi-phase training process involves pre-training on Objects365 and COCO pseudo-labeled data, training on Roboflow100, Knowledge Distillation (KD), and Distribution Focal Loss (DFL).

Pre-training uses knowledge distillation to learn from predictions and improve performance. A teacher model generates predictions that serve as soft targets for the student model, which tries to match them while also fitting the original labeled data. This reduces overfitting and improves accuracy, which is useful when labeled data is limited. The incorporation of Distribution Focal Loss (DFL) further enhances the training process by addressing class imbalance and increasing detection accuracy for underrepresented classes.
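To make the distillation step concrete, here is a minimal, generic sketch of a distillation loss in PyTorch. It illustrates the principle only and is not Deci’s training code; YOLO-NAS distills detection outputs rather than plain classification logits, and the temperature T and mixing weight alpha below are illustrative choices:

import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.5):
    # Soft targets: match the teacher's softened output distribution
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: standard cross-entropy against the ground-truth labels
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard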

Performance of YOLO-NAS

YOLO-NAS offers state-of-the-art performance, surpassing models like YOLOv5, YOLOv6, YOLOv7, and YOLOv8 with an unbeatable accuracy-speed combination.

As we can observe from the graph below, all versions of YOLO-NAS (small, medium, and large, with and without quantization) achieve good accuracy. The mAP is also higher than that of the previous SOTA model, YOLOv8.

Source: Deci-AI YOLO-NAS

Three versions of YOLO-NAS have been released: small, medium, and large, each with and without quantization. As expected, there is a slight drop in mAP when quantization is applied.

Source: Deci-AI YOLO-NAS

Implementation of YOLO-NAS

Step 1: To try out YOLO-NAS, we first need to install super-gradients, Deci’s PyTorch-based computer vision library.

pip install super-gradients

Step 2: Install all the required libraries.

pip install imutils
pip install roboflow
pip install pytube --upgrade
pip install --upgrade pillow

Step 3: Import the super-gradients library and load any version of the model, i.e., you can choose yolo_nas_l, yolo_nas_m, or yolo_nas_s. First declare a device variable so the model runs on a GPU when one is available:

import torch
device = torch.device("cuda:0") if torch.cuda.is_available() else torch.device("cpu")

from super_gradients.training import models
yolo_nas_s = models.get("yolo_nas_s", pretrained_weights="coco").to(device)

Step 4: Inference on an image

yolo_nas_s.predict("img.jpg").show()

Here is the output of our model:

Result image

As you can see in the resulting image, all the chairs are detected. Small objects on the table, such as the cup and the laptop, are also detected.
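Depending on the super-gradients version, predict also accepts an optional confidence threshold (and works on video files and URLs as well); treat the exact keyword below as an assumption for your installed version:

yolo_nas_s.predict("img.jpg", conf=0.4).show()  # keep only detections scoring above 0.4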

Fine-tuning YOLO-NAS on a custom dataset

Step 1: Instantiate the trainer with just a single GPU.

from super_gradients.training import Trainer

CHECKPOINT_DIR = 'checkpoints'
trainer = Trainer(experiment_name='my_first_yolonas_run', ckpt_root_dir=CHECKPOINT_DIR)

Step 2: You can export your dataset from Roboflow in the YOLOv5 format.

from roboflow import Roboflow
rf = Roboflow(api_key="<your-roboflow-key-here>")
project = rf.workspace("atathamuscoinsdataset").project("u.s.-coins-dataset-a.tatham")
dataset = project.version(5).download("yolov5")

Step 3: Import the required modules, which will help you create Super Gradients data loaders.

from super_gradients.training import dataloaders
from super_gradients.training.dataloaders.dataloaders import coco_detection_yolo_format_train, coco_detection_yolo_format_val

Step 4: Now, load your dataset parameters into a dictionary: define the parent directory where the data is stored, the directory names for the training, validation, and test images and labels, and the class names in the same order you used at annotation time.

dataset_params = {
    'data_dir': '/content/U.S.-Coins-Dataset---A.Tatham-5',
    'train_images_dir': 'train/images',
    'train_labels_dir': 'train/labels',
    'val_images_dir': 'valid/images',
    'val_labels_dir': 'valid/labels',
    'test_images_dir': 'test/images',
    'test_labels_dir': 'test/labels',
    'classes': ['Dime', 'Nickel', 'Penny', 'Quarter']
}
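For reference, the Roboflow YOLOv5 export is expected to unpack into a layout like the following (a sketch; the folder name comes from the download in Step 2):

U.S.-Coins-Dataset---A.Tatham-5/
    train/images/    train/labels/
    valid/images/    valid/labels/
    test/images/     test/labels/

Each label file contains one line per object in the form class_id x_center y_center width height, with coordinates normalized to [0, 1]. For example, a hypothetical line 2 0.51 0.46 0.21 0.20 would mark a Penny roughly in the middle of the image.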

Step 5: Next, pass the values from dataset_params into the data-loader functions. Here we set the batch size to 16 and num_workers to 2, as shown below.

from IPython.display import clear_output

train_data = coco_detection_yolo_format_train(
    dataset_params={
        'data_dir': dataset_params['data_dir'],
        'images_dir': dataset_params['train_images_dir'],
        'labels_dir': dataset_params['train_labels_dir'],
        'classes': dataset_params['classes']
    },
    dataloader_params={
        'batch_size': 16,
        'num_workers': 2
    }
)

val_data = coco_detection_yolo_format_val(
    dataset_params={
        'data_dir': dataset_params['data_dir'],
        'images_dir': dataset_params['val_images_dir'],
        'labels_dir': dataset_params['val_labels_dir'],
        'classes': dataset_params['classes']
    },
    dataloader_params={
        'batch_size': 16,
        'num_workers': 2
    }
)

test_data = coco_detection_yolo_format_val(
    dataset_params={
        'data_dir': dataset_params['data_dir'],
        'images_dir': dataset_params['test_images_dir'],
        'labels_dir': dataset_params['test_labels_dir'],
        'classes': dataset_params['classes']
    },
    dataloader_params={
        'batch_size': 16,
        'num_workers': 2
    }
)

clear_output()

Step 6: SuperGradients automatically adds data transforms to the training set; you can inspect them as follows.

train_data.dataset.transforms

Step 7: In this step, we instantiate the model. Here, we need to pass the num_classes argument.

from super_gradients.training import models
model = models.get('yolo_nas_l',
                   num_classes=len(dataset_params['classes']),
                   pretrained_weights="coco")

Step 8: Define the training parameters, such as max_epochs, loss, optimizer, train_metrics_list, valid_metrics_list, and metric_to_watch. You can choose among optimizers such as Adam, AdamW, SGD, Lion, or RMSProp.

from super_gradients.training.losses import PPYoloELoss
from super_gradients.training.metrics import DetectionMetrics_050
from super_gradients.training.models.detection_models.pp_yolo_e import PPYoloEPostPredictionCallback

train_params = {
    # ENABLING SILENT MODE
    'silent_mode': True,
    "average_best_models": True,
    "warmup_mode": "linear_epoch_step",
    "warmup_initial_lr": 1e-6,
    "lr_warmup_epochs": 3,
    "initial_lr": 5e-4,
    "lr_mode": "cosine",
    "cosine_final_lr_ratio": 0.1,
    "optimizer": "Adam",
    "optimizer_params": {"weight_decay": 0.0001},
    "zero_weight_decay_on_bias_and_bn": True,
    "ema": True,
    "ema_params": {"decay": 0.9, "decay_type": "threshold"},
    # ONLY TRAINING FOR 10 EPOCHS FOR THIS EXAMPLE NOTEBOOK
    "max_epochs": 10,
    "mixed_precision": True,
    "loss": PPYoloELoss(
        use_static_assigner=False,
        # NOTE: num_classes needs to be defined here
        num_classes=len(dataset_params['classes']),
        reg_max=16
    ),
    "valid_metrics_list": [
        DetectionMetrics_050(
            score_thres=0.1,
            top_k_predictions=300,
            # NOTE: num_classes needs to be defined here
            num_cls=len(dataset_params['classes']),
            normalize_targets=True,
            post_prediction_callback=PPYoloEPostPredictionCallback(
                score_threshold=0.01,
                nms_top_k=1000,
                max_predictions=300,
                nms_threshold=0.7
            )
        )
    ],
    "metric_to_watch": 'mAP@0.50'
}

Step 9: Now we can start model training using the SuperGradients trainer.

trainer.train(model=model,
              training_params=train_params,
              train_loader=train_data,
              valid_loader=val_data)

Step 10: Using the best weights saved by the trainer, we can load the fine-tuned model and run detections on new test images. The checkpoint path follows from the CHECKPOINT_DIR and experiment_name set in Step 1 (the exact filename, e.g. average_model.pth, can vary with your settings):

best_model = models.get('yolo_nas_l',
                        num_classes=len(dataset_params['classes']),
                        checkpoint_path="checkpoints/my_first_yolonas_run/average_model.pth")

img_path = 'your image path'
best_model.predict(img_path).show()
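We built test_data earlier but have not used it yet. You can also evaluate the fine-tuned model on the held-out test set with the trainer's test method, reusing the same metric configuration as above (a sketch; the exact signature may vary across super-gradients versions):

test_result = trainer.test(
    model=best_model,
    test_loader=test_data,
    test_metrics_list=[
        DetectionMetrics_050(
            score_thres=0.1,
            top_k_predictions=300,
            num_cls=len(dataset_params['classes']),
            normalize_targets=True,
            post_prediction_callback=PPYoloEPostPredictionCallback(
                score_threshold=0.01,
                nms_top_k=1000,
                max_predictions=300,
                nms_threshold=0.7
            )
        )
    ]
)
print(test_result)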

In this way, you can fine-tune YOLO-NAS for your own dataset.

Conclusion

Deci’s AutoNAC neural architecture search technology has made it possible for the YOLO-NAS model to achieve remarkable speed and accuracy. This model stands out among all the object detection models available in the market by offering the best tradeoff between accuracy and latency. YOLO-NAS is capable of being quantized and deployed with TensorRT, making it fully compatible for use in production.
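As a rough sketch of that deployment path (a generic ONNX export, not Deci’s official recipe; super-gradients also ships its own export helpers, and some versions require preparing the model for conversion first), you could export to ONNX and then build a TensorRT engine:

import torch

model.eval()
# Assumption: a plain ONNX export works for your super-gradients version;
# some versions require preparing the model for conversion beforehand.
dummy_input = torch.randn(1, 3, 640, 640)
torch.onnx.export(model, dummy_input, "yolo_nas.onnx", opset_version=14)

# The .onnx file can then be compiled with TensorRT, e.g.:
#   trtexec --onnx=yolo_nas.onnx --saveEngine=yolo_nas.engine --int8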


If you found this article insightful, follow me on LinkedIn and Medium.

Stay tuned !!!

Thank you !!!
