Dec 3, 2019


NVIDIA Transfer Learning Toolkit — A Comprehensive Guide

Most highly optimized deep neural network architectures are already available to use today. What makes this even more impressive is that you can train only the last few layers of a pre-trained model and reach very high accuracy in very little time.

Source: NVIDIA

In this article, we are going to train a model on the publicly available KITTI dataset using the NVIDIA Transfer Learning Toolkit (TLT) and deploy it to a Jetson Nano.

  • The first step is to set up your NVIDIA NGC account and pull the TLT container.
docker pull
  • After that, you can start your TLT container using the command below.
docker run --runtime=nvidia -it \
-v /home/$USER/workspace:/workspace \
-p 8888:8888 tlt-streamanalytics:v1.0_py2

Make sure you have mounted a directory from your local system into the container. This ensures that all the data you produce while training resides on your local system rather than being stuck inside the container.

You are going to perform all the experiments inside a Jupyter notebook, so let’s start it first.

cd /workspace
jupyter notebook --ip --allow-root
  • Make sure you have created the below directory structure.
|---- dataset
|---- tf_records
|---- pretrained_model
|---- trained_model
|---- pruned_model
|---- retrained_model
|---- exported_model
|---- spec_files
|---- scripts
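
If you prefer to create this layout from the notebook rather than by hand, a minimal sketch could look like the following. The helper name `create_workspace` is an assumption made for illustration; the directory names come from the listing above.

```python
import os

# Directory names taken from the layout listed above.
WORKSPACE_DIRS = [
    "dataset", "tf_records", "pretrained_model", "trained_model",
    "pruned_model", "retrained_model", "exported_model",
    "spec_files", "scripts",
]


def create_workspace(root):
    """Create every expected sub-directory under `root` and return their paths."""
    paths = []
    for name in WORKSPACE_DIRS:
        path = os.path.join(root, name)
        os.makedirs(path, exist_ok=True)  # no error if it already exists
        paths.append(path)
    return paths
```

Inside the container you would call it as `create_workspace("/workspace")`.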

Dataset Preparation

The dataset can be downloaded from here

Extract it inside the workspace/dataset directory for easy access. It has both training and testing directories, each containing images and their corresponding labels.

Let’s have a look at the training dataset size:

!echo Total Training Images:
!ls /workspace/dataset/KITTI_original/training/image_2 -l | wc -l
!echo Total Training Labels:
!ls /workspace/dataset/KITTI_original/training/label_2 -l | wc -l
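  • Total Training Images: 7482
  • Total Training Labels: 7482

Note that ls -l also prints a “total” summary line, so each count above is one higher than the actual number of files (7481).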
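
Matching counts do not guarantee that every image has its own label file. As a rough sketch (the function name `unpaired_samples` is hypothetical), you could compare the file names without their extensions:

```python
import os


def unpaired_samples(image_dir, label_dir):
    """Return (images_without_label, labels_without_image) as sorted lists of file stems."""
    image_ids = {os.path.splitext(name)[0] for name in os.listdir(image_dir)}
    label_ids = {os.path.splitext(name)[0] for name in os.listdir(label_dir)}
    return sorted(image_ids - label_ids), sorted(label_ids - image_ids)
```

Two empty lists mean that every image in image_2 has a matching file in label_2.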

Also, let’s have a look at one of the images

from IPython.display import display
from PIL import Image

path = '/workspace/dataset/KITTI_original/training/image_2/003945.png'
image = Image.open(path)  # load the sample image
display(image)            # render it inline in the notebook
width, height = image.size
print("HxW: ({}, {})".format(height, width))
HxW: (375, 1242)

Let’s confirm that our dataset label file has all the 15 required columns.

!cat /workspace/dataset/KITTI_original/training/label_2/003945.txt
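
The same check can be done programmatically. The sketch below assumes the standard KITTI object label layout (type, truncated, occluded, alpha, four bounding-box values, three dimensions, three location values, rotation_y). The field names and the function `parse_kitti_line` are chosen here for illustration and are not part of TLT.

```python
# Descriptive names for the 15 KITTI label columns (names are illustrative).
KITTI_FIELDS = [
    "type", "truncated", "occluded", "alpha",
    "bbox_left", "bbox_top", "bbox_right", "bbox_bottom",
    "height", "width", "length",
    "loc_x", "loc_y", "loc_z", "rotation_y",
]


def parse_kitti_line(line):
    """Parse one label line into a dict, raising ValueError on a wrong column count."""
    parts = line.split()
    if len(parts) != len(KITTI_FIELDS):
        raise ValueError(
            "expected {} columns, got {}".format(len(KITTI_FIELDS), len(parts))
        )
    record = {"type": parts[0]}  # the class name stays a string
    for name, value in zip(KITTI_FIELDS[1:], parts[1:]):
        record[name] = float(value)  # every other column is numeric
    return record
```

Running it over every line of every label file surfaces malformed rows before TLT does.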

Make sure that all training images have the same resolution; otherwise TLT throws an error partway through training.

Hmm… something fishy here. As you can see, the resolutions of some images differ slightly, and we have to fix that. We will resize all of the images to HxW: (128, 512) and rescale their labels at the same time.

The reason for choosing these dimensions:

  • Both dimensions are multiples of 16
  • It is smaller than the original image
  • Its aspect ratio is close to that of the original resolution.
original HxW: (375, 1242) ==> aspect ratio W/H: 3.31
new HxW: (128, 512) ==> aspect ratio W/H: 4
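
The arithmetic behind these points is easy to verify. The helper below is only an illustration (`describe_resolution` is not a TLT function): it reports the W/H aspect ratio and whether both dimensions are multiples of 16.

```python
def describe_resolution(height, width, multiple=16):
    """Return the W/H aspect ratio (2 decimals) and a divisibility flag."""
    return {
        "aspect_ratio": round(width / height, 2),
        "is_multiple": height % multiple == 0 and width % multiple == 0,
    }


original = describe_resolution(375, 1242)
resized = describe_resolution(128, 512)
print(original)  # aspect ratio 3.31, not a multiple of 16
print(resized)   # aspect ratio 4.0, multiple of 16
```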

Use a Python script that resizes all images and their labels to the given output dimensions: convert the images to 128x512 in “.jpg” format and store them in a new directory.
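
The original script is not reproduced here, so the following is only a sketch of what such a script might do. It assumes the 2D bounding box sits in columns 5 to 8 of each KITTI label line (left, top, right, bottom) and that sizes are passed as (width, height). Both function names are hypothetical.

```python
def rescale_kitti_line(line, src_size, dst_size):
    """Scale the 2D box of one KITTI label line from src_size to dst_size.

    Sizes are (width, height) tuples; all other columns are left untouched.
    """
    src_w, src_h = src_size
    dst_w, dst_h = dst_size
    scale_x, scale_y = dst_w / src_w, dst_h / src_h
    parts = line.split()
    left, top, right, bottom = (float(v) for v in parts[4:8])
    parts[4:8] = [
        "{:.2f}".format(left * scale_x),
        "{:.2f}".format(top * scale_y),
        "{:.2f}".format(right * scale_x),
        "{:.2f}".format(bottom * scale_y),
    ]
    return " ".join(parts)


def resize_image(src_path, dst_path, dst_size=(512, 128)):
    """Resize one image to dst_size (width, height) and save it as JPEG."""
    from PIL import Image  # imported lazily; only the image step needs Pillow

    with Image.open(src_path) as img:
        img.convert("RGB").resize(dst_size).save(dst_path, "JPEG")
```

Note that Pillow expects sizes as (width, height), so the 128x512 (HxW) target becomes `(512, 128)` here.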

Converting dataset to TFRecords

According to the TLT docs, you have to convert the dataset to TFRecords before training. This gives the model a fast and efficient way to read the dataset.

  • It can be converted into TFRecords using the tlt-dataset-convert command
  • To continue you have to create a spec file to describe the dataset.
!cat /workspace/spec_files/convert.txt

see the spec file here

Command for converting the dataset into TFRecords

!tlt-dataset-convert -d spec_files/convert.txt -o /workspace/tf_records/

Downloading Pre-Trained Model

You can view models available on NGC.

!ngc registry model list *detectnet*

Download your chosen model

!ngc registry model download-version nvidia/iva/tlt_resnet18_detectnet_v2:1 -d /workspace/pretrained_model


We are now just one step away from training our model. This is the most important part of the whole tutorial: preparing a valid spec file for training the model. According to the official TLT documentation, the spec file has 8 key components that are responsible for tuning the model for good precision:

  1. model_config
  2. bbox_rasterizer_config
  3. cost_function_config
  4. training_config
  5. augmentation_config
  6. postprocessing_config
  7. dataset_config
  8. evaluation_config

You can read more about them in detail here:

This is what your training spec file should look like

!cat /workspace/spec_files/train.txt

see the spec file here

Finally, you can start training your model

!tlt-train detectnet_v2 -e spec_files/train.txt \
-r /workspace/trained_model --gpus 5 -k $API_KEY


If you want to double-check the precision of the model, you can evaluate it on the test dataset.

!tlt-evaluate detectnet_v2 -e spec_files/train.txt \
-m "/workspace/trained_model/model.step-9600.tlt" \


Now comes the exciting moment when you see your model in action: you can expect it to draw bounding boxes around the objects it detects.

But before that, you need to prepare your inference spec file.

!cat /workspace/spec_files/infer.txt

see the spec file here

Now that your inference spec file is prepared, you can use your model for inference. We are using some images of different resolutions that are not present in our dataset. Hope for the best!

!tlt-infer detectnet_v2 \
-m "/workspace/trained_model/model.step-9600.tlt" \
-i /workspace/dataset/test_image \
-o /workspace/inferred_images \
-k $API_KEY \
-bs 16 \
-cp spec_files/infer.txt

Visualizing images

TLT takes its input from a directory and saves the inferred images to the specified output directory. To see those images, load them the same way we did previously.

from IPython.display import display
from PIL import Image

path = '/workspace/inferred_images/images_annotated/car001_720x1280.jpg'
image = Image.open(path)  # annotated output for the 720x1280 test image
display(image)
width, height = image.size
print("HxW: ({}, {})".format(height, width))

path = '/workspace/inferred_images/images_annotated/car001_128x512.jpg'
image = Image.open(path)  # annotated output for the 128x512 test image
display(image)
width, height = image.size
print("HxW: ({}, {})".format(height, width))
inferred image

Pruning the Model

Pruning commonly allows reducing the number of parameters by an order of magnitude in the vision applications targeted by TLT, leading to a model that is many times faster.
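
To give a feel for what a pruning threshold and a minimum filter count control (the -pth and -nf options in the command that follows), here is a generic magnitude-based sketch. It is not TLT's actual implementation. It assumes each filter in a layer is summarized by one norm, that norms are normalized by the largest one in the layer, and that filters below the threshold are dropped unless that would leave fewer than the minimum. The function name `select_filters` is hypothetical.

```python
def select_filters(filter_norms, threshold, min_filters):
    """Return the sorted indices of the filters to keep in one layer."""
    largest = max(filter_norms)
    if largest == 0:
        normalized = [0.0 for _ in filter_norms]
    else:
        normalized = [norm / largest for norm in filter_norms]

    # Keep every filter whose normalized norm reaches the threshold.
    keep = [i for i, norm in enumerate(normalized) if norm >= threshold]

    # Never go below the minimum: fall back to the strongest filters.
    if len(keep) < min_filters:
        ranked = sorted(
            range(len(filter_norms)),
            key=lambda i: filter_norms[i],
            reverse=True,
        )
        keep = sorted(ranked[:min_filters])
    return keep
```

A higher threshold removes more filters and gives a smaller, faster model, at the cost of accuracy that retraining then has to recover.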

!tlt-prune -pm "/workspace/trained_model/model.step-9600.tlt" \
-o "/workspace/pruned_model" -pth 0.30 -nf 16 \

After pruning, it is very important to retrain your model to recover performance, but you need to create your retraining spec file first.

Don’t worry: just make a copy of the previously created training spec file and replace the path to the old pre-trained model weights with the path to your newly pruned model weights.

This is what your retraining spec file should look like

!cat /workspace/spec_files/retrain.txt

see the spec file here

With the retraining spec file created, you are good to go with retraining your model

!tlt-train detectnet_v2 -e "/workspace/spec_files/retrain.txt" \
-r "/workspace/retrained_model" --gpus 5 -k $API_KEY

Exporting model

The model has performed well, but it is still tied to the TLT environment.

To confirm that it performs as expected on our edge devices, we first need to export it.

There are three options available for exporting the model:

  • FP16
  • FP32
  • INT8

We are going to export with INT8. Exporting to INT8 requires a calibration file to be generated first.

!tlt-int8-tensorfile detectnet_v2 -e spec_files/retrain.txt \
-o exported_model/calibration.tensor -m 20
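
Why does INT8 need calibration data at all? Values have to be mapped onto an 8-bit range, and the mapping needs a scale that reflects the values the network actually produces. The sketch below shows the simplest form of this idea, symmetric max-abs quantization. It is a general illustration only: the calibration performed during export is more sophisticated, and none of these function names belong to TLT.

```python
def calibrate_scale(samples):
    """Pick a scale so that the largest observed magnitude maps to 127."""
    max_abs = max(abs(value) for value in samples)
    return max_abs / 127.0 if max_abs > 0 else 1.0


def quantize(value, scale):
    """Map a float to a signed 8-bit integer, clamping out-of-range values."""
    quantized = int(round(value / scale))
    return max(-127, min(127, quantized))


def dequantize(quantized, scale):
    """Map the integer back to an approximate float."""
    return quantized * scale
```

Values larger than anything seen during calibration get clamped, which is why the calibration batches should be representative of the real input data.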

Now you are ready to export your model

!tlt-export "/workspace/retrained_model/model.step-9600.tlt" \
-k $API_KEY \
--export_module detectnet_v2 \
--outputs output_bbox/BiasAdd,output_cov/Sigmoid \
--data_type int8 \
--output_file exported_model/smchyd_demo_model.etlt \
--cal_data_file exported_model/calibration.tensor \
--cal_cache_file exported_model/calibration.bin \
--input_dims 3,128,512

If you wish to export with FP16 or FP32, use the commands below:


!tlt-export "/workspace/retrained_model/model.step-9600.tlt" \
-k $API_KEY \
--export_module detectnet_v2 \
--outputs output_bbox/BiasAdd,output_cov/Sigmoid \
--data_type fp16 \
--output_file exported_model/smchyd_demo_model.etlt


!tlt-export "/workspace/retrained_model/model.step-9600.tlt" \
-k $API_KEY \
--export_module detectnet_v2 \
--outputs output_bbox/BiasAdd,output_cov/Sigmoid \
--data_type fp32 \
--output_file exported_model/smchyd_demo_model.etlt

With this, you have successfully exported your model.

To be able to use the model with DeepStream you need to create 3 files:

  1. labels.txt
  2. Primary_inference.txt
  3. stream_config.txt

You’ll need to modify these files slightly as per your model configuration i.e., input dimensions, threshold, etc.

get those files here

The final step is to move the following files to the edge device:

  • The files generated inside “/workspace/exported_model”
  • The three DeepStream files listed above

Once they are on the device, use the tlt-converter app to convert the model so that it matches your Jetson device configuration.

To move the exported model to the Jetson Nano, first create a directory there to hold the model and its associated configuration files. Open an SSH connection to the device and create the directory.

nitin@ThinkPad:~ ssh nano@
nano@’s password:
welcome to ubuntu 18.04.3 LTS
nano@nano-desktop:~ mkdir TLT_DEMO

Now move your trained model to the Nano: open another terminal on the host and copy the files with the scp command.

Making an archive of the files first makes them easier to move; alternatively, you can copy the files to your edge device one by one. If you used an archive, extract it on the Jetson device after the transfer.

moving files from host machine to edge device
Extracting files
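
If you would rather build and unpack the archive from Python than from the shell, the standard tarfile module is enough. This is a minimal sketch: the function names are hypothetical, and the archive uses a flat layout in which every file is stored under its base name.

```python
import os
import tarfile


def make_archive(archive_path, file_paths):
    """Pack the given files into a gzip-compressed tar archive (flat layout)."""
    with tarfile.open(archive_path, "w:gz") as tar:
        for path in file_paths:
            tar.add(path, arcname=os.path.basename(path))
    return archive_path


def extract_archive(archive_path, target_dir):
    """Unpack the archive into target_dir (for example on the Jetson device)."""
    with tarfile.open(archive_path, "r:gz") as tar:
        tar.extractall(target_dir)
```

Run `make_archive` on the host before copying, and `extract_archive` on the device once the file has arrived.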

Run DeepStream (on Jetson)

Download tlt-converter on the edge device:

download tlt-converter

As we are going to convert our model, we also need a directory to hold our DeepStream config files. Create it with mkdir ds_configs.

Now we will convert our model

sudo ./tlt-converter -k $KEY -d 3,128,512 \
-o output_bbox/BiasAdd,output_cov/Sigmoid \
-e ~/TLT_DEMO/ds_configs/INT8_m1.plan \
-t int8 \
-c ~/TLT_DEMO/calibration.bin \
-m 1 \

After this, we are ready to run inference. Only minor adjustments to the config files are required.

Also, don’t forget to move all the DeepStream-related model files, including the generated model, into the ds_configs directory. Keeping them in the same directory reduces the chance of errors caused by misconfigured paths.

Here is the final summary of files present in their respective directories before we run our model

config files

Place them in the correct directories if you have not already. Now we are ready to run the model. Make sure a monitor is connected to your edge device, stay inside the ds_configs directory, and run:

deepstream-app -c stream_config.txt

And Boom! Our model started predicting trained object labels.

Here is the link to the Jupyter notebook used for training the model.

Inference on Sentinel (NX/Nano based edge device)

About Sentinel

Sentinel is an NVIDIA NX / Nano powered hardware platform that can run several state-of-the-art deep learning models and supports Intelligent Video Analytics at the edge.


You can learn more about Sentinel on the product page

Here is a side by side comparison of Sentinel and Nano

Nano vs. Sentinel

Some meaningful stats

Key features of TLT

  • GPU optimized pre-trained weights for computer vision tasks
  • Easily modify configuration files for adding new classes and retraining models with custom data
  • Reduce model sizes using pruning functionality

Special thanks to: Eddie Seymour, Vipul Amin, Charbel Aoun , Morgan Huang

Nitin Rai
Jr. ML Engineer at SmartCow, Responsible for Rapid Prototyping


Dev talk forum




SmartCow is an AI engineering company that specializes in advanced video analytics, applied artificial intelligence & electronics manufacturing.