NVIDIA Transfer Learning Toolkit — A Comprehensive Guide
In today’s world, most highly optimized deep neural network architectures are already available to use, and what makes this even more impressive is that you can train only the last few layers of a pre-trained model and still reach excellent accuracy in very little time.
In this article, we are going to train a model on the publicly available KITTI dataset using the NVIDIA Transfer Learning Toolkit (TLT) and deploy it to a Jetson Nano.
- The first step is to set up your NVIDIA NGC account and pull the TLT container.
docker pull nvcr.io/nvidia/tlt-streamanalytics:v1.0_py2
- After that, you can start your TLT container using the below command.
docker run --runtime=nvidia -it \
-v /home/$USER/workspace:/workspace \
-p 8888:8888 tlt-streamanalytics:v1.0_py2
Make sure you have mounted a local directory into the container; this ensures that all the data you produce while training resides on your local system rather than being stuck inside the container.
You are going to perform all the experiments inside a Jupyter notebook, so let’s start it first.
cd /workspace
jupyter notebook --ip 0.0.0.0 --allow-root
- The Notebook is available at http://localhost:8888.
- Log in to https://ngc.nvidia.com/setup and generate an API key for later use.
- In the first code cell of the notebook, set up your API_KEY.
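For example, a minimal cell could look like this (the value is a placeholder for the key you generated in the previous step):
# NOTE: placeholder value; paste your own NGC API key here.
# IPython expands Python variables inside later "!" shell commands,
# so the $API_KEY references in the commands below will pick this up.
API_KEY = "<your-ngc-api-key>"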
- Make sure you have created the below directory structure (a snippet to create it in one go follows the tree).
|----workspace
|---- dataset
|---- tf_records
|---- pretrained_model
|---- trained_model
|---- pruned_model
|---- retrained_model
|---- exported_model
|---- spec_files
|---- scripts
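If you’d rather not create these by hand, here is a minimal sketch that builds the same layout from the notebook (directory names taken from the tree above):
import os

# Create the working directories used throughout this tutorial
for d in ['dataset', 'tf_records', 'pretrained_model', 'trained_model',
          'pruned_model', 'retrained_model', 'exported_model',
          'spec_files', 'scripts']:
    os.makedirs(os.path.join('/workspace', d), exist_ok=True)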
Dataset Preparation
The dataset can be downloaded from here https://www.kaggle.com/twaldo/kitti-object-detection/data
Extract it inside the workspace/dataset directory for easy access. It has both training and testing directories, each containing images and their corresponding labels.
Let’s have a look at the training dataset size:
!echo Total Training Images:
!ls /workspace/dataset/KITTI_original/training/image_2 -l | wc -l
!echo
!echo Total Training Labels:
!ls /workspace/dataset/KITTI_original/training/label_2 -l | wc -l
- Total Training Images: 7482
- Total Training Labels: 7482
Also, let’s have a look at one of the images:
from IPython.display import display
from PIL import Image

path = '/workspace/dataset/KITTI_original/training/image_2/003945.png'
image = Image.open(path)
display(image)
width, height = image.size
print("HxW: ({}, {})".format(height, width))
Let’s confirm that our dataset label file has all 15 required columns.
!cat /workspace/dataset/KITTI_original/training/label_2/003945.txt
You have to make sure that the training images all have the same resolution; otherwise, TLT throws an error in the middle of training.
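A quick way to verify this is to count the distinct image sizes (a minimal sketch, using the dataset path from above):
from collections import Counter
import os
from PIL import Image

img_dir = '/workspace/dataset/KITTI_original/training/image_2'
# Count how many images exist at each (width, height)
sizes = Counter(Image.open(os.path.join(img_dir, f)).size
                for f in os.listdir(img_dir))
print(sizes)  # more than one distinct size means we must resize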
Hmm… something fishy here: as you can see, some images differ slightly in resolution, and we have to fix that. We will resize all of the images to HxW: (128, 512) while rescaling their labels at the same time.
The reason for choosing these dimensions:
- Both dimensions are multiples of 16
- Considerably smaller than the original images
- Has an aspect ratio close to the original resolution.
original HxW: (375, 1242) ==> aspect ratio W/H: 3.31
new HxW: (128, 512) ==> aspect ratio W/H: 4
Use a Python script to resize all the images to 128x512, rescale their labels to match, convert the images to “.jpg” format, and store everything in a new directory.
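Here is a minimal sketch of such a script (the output directory name is an assumption; KITTI labels keep all 15 columns, and only the four bounding-box coordinates are rescaled):
import os
from PIL import Image

SRC = '/workspace/dataset/KITTI_original/training'
DST = '/workspace/dataset/KITTI_resized/training'  # assumed output location
NEW_W, NEW_H = 512, 128

os.makedirs(os.path.join(DST, 'image_2'), exist_ok=True)
os.makedirs(os.path.join(DST, 'label_2'), exist_ok=True)

for name in os.listdir(os.path.join(SRC, 'image_2')):
    stem = os.path.splitext(name)[0]
    img = Image.open(os.path.join(SRC, 'image_2', name))
    sx, sy = NEW_W / img.width, NEW_H / img.height
    # Resize the image and save it as .jpg
    img.resize((NEW_W, NEW_H)).convert('RGB').save(
        os.path.join(DST, 'image_2', stem + '.jpg'))
    # Rescale the bbox columns (5th to 8th: xmin, ymin, xmax, ymax)
    out_lines = []
    with open(os.path.join(SRC, 'label_2', stem + '.txt')) as f:
        for line in f:
            cols = line.split()
            if not cols:
                continue
            for i, s in zip((4, 5, 6, 7), (sx, sy, sx, sy)):
                cols[i] = '{:.2f}'.format(float(cols[i]) * s)
            out_lines.append(' '.join(cols))
    with open(os.path.join(DST, 'label_2', stem + '.txt'), 'w') as f:
        f.write('\n'.join(out_lines) + '\n')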
Converting dataset to TFRecords
According to the TLT docs, you have to convert the dataset to TFRecords before training; this provides a fast and efficient way for the model to read the dataset.
- It can be converted into TFRecords using the tlt-dataset-convert command
- To continue you have to create a spec file to describe the dataset.
!cat /workspace/spec_files/convert.txt
see the spec file here https://rebrand.ly/ypn5ar
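In case the link is unreachable, the convert spec follows TLT’s kitti_config schema and looks roughly like this (the paths and split values below are assumptions for the resized dataset):
kitti_config {
  root_directory_path: "/workspace/dataset/KITTI_resized/training"
  image_dir_name: "image_2"
  label_dir_name: "label_2"
  image_extension: ".jpg"
  partition_mode: "random"
  num_partitions: 2
  val_split: 14
  num_shards: 10
}
image_directory_path: "/workspace/dataset/KITTI_resized/training"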
Command for converting the dataset into TFRecords
!tlt-dataset-convert -d spec_files/convert.txt -o /workspace/tf_records/
Downloading Pre-Trained Model
You can view models available on NGC.
!ngc registry model list *detectnet*
Download your chosen model
!ngc registry model download-version nvidia/iva/tlt_resnet18_detectnet_v2:1 -d /workspace/pretrained_model
Training
We are now just one step away from training our model. This is the most important part of the whole tutorial: preparing a valid spec file for training. According to the official TLT documentation, the spec file has 8 key components responsible for tuning the model for good precision:
- model_config
- bbox_rasterizer_config
- cost_function_config
- training_config
- augmentation_config
- postprocessing_config
- dataset_config
- evaluation_config
You can read more about them in detail here: https://docs.nvidia.com/metropolis/TLT/tlt-getting-started-guide/index.html#spec_file_gridbox_topic
This is how your training spec file should look:
!cat /workspace/spec_files/train.txt
see the spec file here https://rebrand.ly/9kbu1z
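If that link is unavailable, here is a heavily abridged sketch of just two of those components, to show the flavor (the paths, class names, and values are assumptions; see the docs link above for the full schema):
model_config {
  arch: "resnet"
  num_layers: 18
  pretrained_model_file: "/workspace/pretrained_model/tlt_resnet18_detectnet_v2_v1/resnet18.hdf5"
}
dataset_config {
  data_sources {
    tfrecords_path: "/workspace/tf_records/*"
    image_directory_path: "/workspace/dataset/KITTI_resized/training"
  }
  image_extension: "jpg"
  target_class_mapping {
    key: "car"
    value: "car"
  }
}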
Finally, you can start training your model
!tlt-train detectnet_v2 -e spec_files/train.txt \
-r /workspace/trained_model --gpus 5 -k $API_KEY
Evaluation
If you want to be doubly sure about the precision of the model, you can evaluate it on the test dataset.
!tlt-evaluate detectnet_v2 -e spec_files/train.txt \
-m "/workspace/trained_model/model.step-9600.tlt" \
-k $API_KEY
Inference
It’s that exciting moment when you get to see your model in action; yes, now you can watch your model draw bounding boxes.
But before that, you need to prepare your inference spec file.
!cat /workspace/spec_files/infer.txt
see the spec file here https://rebrand.ly/5b780
Now that you have your inference spec file prepared, you can use your model for inference. We are using some images of different resolutions that are not present in our dataset. Hope for the best!
!tlt-infer detectnet_v2 \
-m "/workspace/trained_model/model.step-9600.tlt" \
-i /workspace/dataset/test_image \
-o /workspace/inferred_images \
-k $API_KEY \
-bs 16 \
-cp spec_files/infer.txt
Visualizing images
TLT takes input from a directory and saves the inferred images to the specified output directory; to see those images, you have to load them the same way we did previously.
from IPython.display import display
from PIL import Image

path = '/workspace/inferred_images/images_annotated/car001_720x1280.jpg'
image = Image.open(path)
display(image)
width, height = image.size
print("HxW: ({}, {})".format(height, width))

path = '/workspace/inferred_images/images_annotated/car001_128x512.jpg'
image = Image.open(path)
display(image)
width, height = image.size
print("HxW: ({}, {})".format(height, width))
Pruning the Model
In the vision applications targeted by TLT, pruning commonly reduces the number of parameters by an order of magnitude, leading to a model that is many times faster.
!tlt-prune -pm "/workspace/trained_model/model.step-9600.tlt" \
-o "/workspace/pruned_model" -pth 0.30 -nf 16 \
-k $API_KEY
After pruning, it is very important to retrain your model to recover its accuracy, but you need to create your retraining spec file first.
Don’t worry: just make a copy of the previously created training spec file and point the pre-trained model weights path at your newly pruned model instead.
This is how your re-training spec file should look:
!cat /workspace/spec_files/retrain.txt
see the spec file here https://rebrand.ly/fcot8y
With the re-training spec file created, you are good to go with re-training your model:
!tlt-train detectnet_v2 -e "/workspace/spec_files/retrain.txt" \
-r "/workspace/retrained_model" --gpus 5 -k $API_KEY
Exporting model
No doubt the model has performed beyond expectations, but it is still inside the TLT environment. To use it with our edge devices, we need to export it.
There are three options available for exporting the model:
- FP16
- FP32
- INT8
We are going to export with INT8. For INT8 export, a calibration file needs to be generated first.
!tlt-int8-tensorfile detectnet_v2 -e spec_files/retrain.txt \
-o exported_model/calibration.tensor -m 20
Now you are ready to export your model
!tlt-export "/workspace/retrained_model/model.step-9600.tlt" \
-k $API_KEY \
--export_module detectnet_v2 \
--outputs output_bbox/BiasAdd,output_cov/Sigmoid \
--data_type int8 \
--output_file exported_model/smchyd_demo_model.etlt \
--cal_data_file exported_model/calibration.tensor \
--cal_cache_file exported_model/calibration.bin \
--input_dims 3,128,512
If you wish to export with FP16 or FP32, use the below commands:
FP16
!tlt-export "/workspace/retrained_model/model.step-9600.tlt" \
-k $API_KEY \
--export_module detectnet_v2 \
--outputs output_bbox/BiasAdd,output_cov/Sigmoid \
--data_type fp16 \
--output_file exported_model/smchyd_demo_model.etlt
FP32
!tlt-export "/workspace/retrained_model/model.step-9600.tlt" \
-k $API_KEY \
--export_module detectnet_v2 \
--outputs output_bbox/BiasAdd,output_cov/Sigmoid \
--data_type fp32 \
--output_file exported_model/smchyd_demo_model.etlt
With this, you have successfully exported your model.
To be able to use the model with DeepStream, you need to create 3 files:
- labels.txt
- Primary_inference.txt
- stream_config.txt
You’ll need to modify these files slightly as per your model configuration i.e., input dimensions, threshold, etc.
get those files here https://rebrand.ly/3715a
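For orientation, a hypothetical labels.txt for a three-class detector simply lists one class name per line (the classes below are assumptions; use whatever your model was trained on):
car
cyclist
pedestrian
Likewise, a few keys inside Primary_inference.txt must match your exported model; a partial sketch, not a complete config:
[property]
tlt-encoded-model=smchyd_demo_model.etlt
tlt-model-key=<your API key>
labelfile-path=labels.txt
int8-calib-file=calibration.bin
output-blob-names=output_cov/Sigmoid;output_bbox/BiasAdd
num-detected-classes=3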
After that, the final step is to move the following to our edge device:
- The files generated inside “/workspace/exported_model”
- The 3 files we talked about above
There, we will use the tlt-converter app to convert the model to match your Jetson device’s configuration so it works properly.
To move the exported model to our Jetson Nano we will first create a directory where we will keep our model and it’s associated configuration files. So do an SSH connection to the device and make the required directory.
nitin@ThinkPad:~$ ssh nano@192.168.0.156
nano@192.168.0.156’s password:
Welcome to Ubuntu 18.04.3 LTS
nano@nano-desktop:~$ mkdir TLT_DEMO
Now you have to move your trained model to the Nano: open another terminal and move the files with the scp command.
Make an archive of the files so that they are easier to move (or move them manually), then extract the archive on the Jetson device.
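A sketch of this step (the archive name is arbitrary, the file locations are assumed, and the IP address is the one from this setup):
# On your workstation, from /workspace
tar -czf tlt_demo.tar.gz exported_model labels.txt Primary_inference.txt stream_config.txt
scp tlt_demo.tar.gz nano@192.168.0.156:~/TLT_DEMO/
# On the Nano
cd ~/TLT_DEMO && tar -xzf tlt_demo.tar.gz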
Run DeepStream (on Jetson)
Download tlt-converter on the edge device from https://developer.download.nvidia.com/assets/TLT/Public/TLT_Converter.zip
As we are going to convert our model, we also need a directory to hold our DeepStream config files. Create it with mkdir ds_configs.
Now we will convert our model
sudo ./tlt-converter -k $KEY -d 3,128,512 \
-o output_bbox/BiasAdd,output_cov/Sigmoid \
-e ~/TLT_DEMO/ds_configs/INT8_m1.plan \
-t int8 \
-c ~/TLT_DEMO/calibration.bin \
-m 1 \
~/TLT_DEMO/smchyd_demo_model.etlt
Finally, after this, we are ready to run our inference; just a few small adjustments are required in the config files.
Also, don’t forget to move all the DeepStream-related model files inside the ds_configs directory, including the generated model; keeping them in the same directory reduces the chance of errors from misconfigured paths.
Here is the final summary of the files present in their respective directories before we run our model.
Please place them in the correct directories if they are not already. Now we are ready to run our model: make sure you have a monitor connected to your edge device, stay inside the ds_configs directory, and run
deepstream-app -c stream_config.txt
And boom! Our model starts predicting the objects it was trained on.
Here is the link to the Jupyter notebook used for training the model: https://rebrand.ly/tl4k1h
About Sentinel
Sentinel is an NVIDIA NX / Nano powered hardware platform with the potential to run several state-of-the-art deep learning models and provide support for Intelligent Video Analytics at the edge.
You can learn more about Sentinel on the product page https://smartcow.ai/products/sentinel.html
Here is a side by side comparison of Sentinel and Nano
Some meaningful stats
Key features of TLT
- GPU optimized pre-trained weights for computer vision tasks
- Easily modify configuration files for adding new classes and retraining models with custom data
- Reduce model sizes using pruning functionality
Special thanks to: Eddie Seymour, Vipul Amin, Charbel Aoun, Morgan Huang
Author
Nitin Rai
Jr. ML Engineer at SmartCow, responsible for rapid prototyping
Reference:
Dev talk forum https://rebrand.ly/3xr6ts