Computer Vision (AI) in Production using NVIDIA DeepStream

May 13 · 14 min read

NO CODING EXPERIENCE REQUIRED. This is an extensive article covering most of the topics linked with CV and NVIDIA DeepStream. To get started without reading through the concepts, head over here to get your results in 5 minutes; all commands are sequential and easy to follow.

Since the rise of machines (GPUs and a GPU-friendly ecosystem) and the omnipresent internet, there has been a huge accumulation of all sorts of data (data in AI). Pictorial data, meaning images and videos, is available in the billions and is just one Google search away. This has hugely benefited researchers and developers, who use such collections to train Deep Learning (DL) models to their heart's content. It has also led to a massive library of pre-trained models on different architectures, such as the Open Model Zoo. That zoo includes models for object detection, image segmentation, machine translation, text-to-speech, human pose estimation, gaze estimation, attribute classification and many more, all in ready-to-go condition.

Computer Vision has breached all domains, be it medical, manufacturing, surveillance, shopping or driving. Take a wild guess and there will be some company already building a smart AI/CV-based alternative. Building a computer vision model is one part; deploying it is a whole other ball game. To learn how to take the first step and try out your very first computer vision model using TensorFlow, follow the link. A GPU is practically a must to train your model (SSD-MobileNet, YOLOv3, ResNet, Inception, etc.). When it comes to deployment, though, we have quite a few options, and it turns out you don't need a GPU after all*. Intel has been working to bring CV-inference capabilities to its CPU cores.

This article, however, is a guide to deploying your first CV model to production on an NVIDIA-GPU-based system (no programming needed). Ubuntu 18.04 LTS is required. Summary of what's coming next:

  • Docker
  • GStreamer
  • Detection + classification + tracking

Prerequisites: an NVIDIA GPU card and 20 GB of free disk space.

Don't worry if you have no idea what Docker or GStreamer are. These are just tools to run inference (detection + classification) on your video. If you have any experience with TF/PyTorch/Darknet and running inference, you probably know that a typical script handles only 2–4 videos in parallel. Here you will learn to run 30+ videos without writing any Python script.

Let's begin with story time :-) We will be using Docker to reproduce the results of this experiment and to minimize the chance of errors due to library issues.

About Docker: Docker is nothing short of magic at a time when you have hundreds of packages, dependencies and files to keep track of, not to mention their version compatibility with each other. Docker gives you an isolated space with everything pre-installed to run a designated app or piece of software. It is not a VM, and it is a lot lighter than one. Read about Docker here. We will not go into how Docker works, just the peripheral terms. A Docker image is a read-only template, built from a set of instructions. A Docker container is created from that image, and the container is your own sandbox where you can experiment all you want. We will write all the commands as we go.

NVIDIA DeepStream: check the page to get all the stats on performance numbers. If you have a decent enough GPU you can expect similar results. There are various ways to get the DeepStream (DS) SDK working, and as mentioned earlier we will use the safest of them all: the Docker way.

What you see above is a flow of data from left to right. This data is processed in different stages to produce the end result. Data here refers to videos. Before all these fancy inference engines, SDKs and apps, there was one true king of video processing: GStreamer, in use since January 2001. To understand why we are talking about GStreamer and seemingly going off topic, we must understand more about video and its properties.

A video is nothing but a collection of frames captured and displayed at a certain rate. Cable TV generally displays (updates) 25–30 frames per second, also known as FPS, and YouTube offers 24, 25, 30, 48 and 60 fps for uploaded videos. These frames are images with a fixed resolution (e.g. 1280x720) and a fixed format too: 8-bit (0–255) color values in 3 color channels (RGB). Follow this to understand more. Again, this is very fundamental. The point is that any RAW image (frame) is very heavy in itself and needs compression before it can be used in practice. There are also many formats an image can be stored in, and many resolutions.

MP4, MKV and AVI are some of the well-known container formats. The actual compression is done by the codecs stored inside them, and each codec balances compression ratio against quality. The ideal scenario is a high compression ratio with no compromise in picture quality, but that remains an ideal. JPEG, MPEG-4, H.264 (x264) and H.265 (x265) are some of the codecs used both to store video and to transmit it from one system to another in real time over protocols such as RTSP/UDP. All of this is handled by GStreamer, the one true king. For now we only need to know that these terms exist; we will use them here and there as required.
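To put a number on how heavy raw video is, here is a small arithmetic sketch using the figures above: 1280x720 frames, 3 channels of 8 bits each, 30 FPS. It only multiplies the numbers out. Real pipelines often carry frames in other pixel formats (NV12, for example, uses 1.5 bytes per pixel), so treat the result as an order-of-magnitude estimate.

```shell
#!/bin/sh
# Back-of-the-envelope size of uncompressed RGB video.
# Assumes 8 bits (1 byte) per channel, as described in the text.

raw_frame_bytes() {
  # $1 = width, $2 = height, $3 = channels
  echo $(( $1 * $2 * $3 ))
}

raw_stream_bytes_per_sec() {
  # $1 = width, $2 = height, $3 = channels, $4 = frames per second
  echo $(( $(raw_frame_bytes "$1" "$2" "$3") * $4 ))
}

frame=$(raw_frame_bytes 1280 720 3)
per_sec=$(raw_stream_bytes_per_sec 1280 720 3 30)

echo "One 1280x720 RGB frame: ${frame} bytes"
echo "30 FPS of those frames: ${per_sec} bytes/s (~$(( per_sec * 8 / 1000000 )) Mbit/s)"
```

That comes to roughly 663 Mbit/s for a single uncompressed 720p stream. This is why compressed codecs such as H.264 matter once you scale to 30+ streams.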

GStreamer (GST) works on the concept of a pipeline, and pipelines are built from plugins. The blue boxes in the image above are plugins. Now let's see where DeepStream fits into the picture. DeepStream SDK: NVIDIA has built its own set of plugins that can be used within GST pipelines. These DS plugins are responsible for running inference, among many other things we will see when we deploy our first model. Our DeepStream Docker image comes with these plugins pre-installed. Our job is just to line them up like Lego pieces in the correct order to get the inference.
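The Lego analogy maps directly onto gst-launch-1.0 syntax: each element (a plugin plus its properties) is separated by "!". The helper below, join_pipeline, is a hypothetical convenience function and is not part of GStreamer or DeepStream. It only joins element descriptions into a launch string. The example uses plain GStreamer elements (videotestsrc, videoconvert, autovideosink), so the idea works even without DeepStream.

```shell
#!/bin/sh
# Hypothetical helper: join GStreamer element descriptions with " ! ",
# the link operator used by gst-launch-1.0.
join_pipeline() {
  _out=""
  for _el in "$@"; do
    if [ -z "$_out" ]; then
      _out="$_el"
    else
      _out="$_out ! $_el"
    fi
  done
  printf '%s\n' "$_out"
}

# Each argument is one "Lego piece": an element name plus its properties.
pipeline=$(join_pipeline "videotestsrc num-buffers=100" "videoconvert" "autovideosink")
echo "gst-launch-1.0 $pipeline"

# To actually run it (requires GStreamer installed), uncomment:
# gst-launch-1.0 $pipeline
```

The unquoted $pipeline in the commented line relies on word splitting. That is fine for simple properties but breaks if a property value contains spaces.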

Now we will begin with Docker. The short version is here.

  • To install Docker, follow DOCKER-installation.
  • Let's check that Docker is properly installed by running:
sudo docker run hello-world

This command downloads a test image and runs it in a container. When the container runs, it prints an informational message and exits.

  • NVIDIA has its own layer that operates on top of normal Docker: NVIDIA-Docker. The commands below assume NVIDIA's package repository for nvidia-docker has already been added to apt, as described in NVIDIA's installation guide.
sudo apt-get update
sudo apt-get install -y nvidia-docker2
sudo systemctl restart docker

To confirm that everything is working, run:

sudo docker run --rm --gpus all nvidia/cuda:11.0-base nvidia-smi

The output should look something like:

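+-----------------------------------------------------------------------------+
| NVIDIA-SMI 450.51.06    Driver Version: 450.51.06    CUDA Version: 11.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  Tesla T4            On   | 00000000:00:1E.0 Off |                    0 |
| N/A   34C    P8     9W /  70W |      0MiB / 15109MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+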

This output shows details about your GPU and driver: the card name, the NVIDIA driver version and the CUDA version, along with the card's temperature and memory usage.
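You may want to pick the driver and CUDA versions out of that banner in a script, for instance to log them next to your results. The sed sketch below does that for the header line shown above. It reads text from stdin, so you can pipe nvidia-smi into it. The sample line is copied from the output above. For machine-readable output, nvidia-smi also has a --query-gpu option (for example --query-gpu=name,driver_version --format=csv), which is usually the sturdier choice.

```shell
#!/bin/sh
# Sketch: extract driver and CUDA versions from the nvidia-smi banner line.
# Assumes the banner format "Driver Version: X ... CUDA Version: Y" shown above.

driver_version() {
  sed -n 's/.*Driver Version: *\([0-9.]*\).*/\1/p' | head -n 1
}

cuda_version() {
  sed -n 's/.*CUDA Version: *\([0-9.]*\).*/\1/p' | head -n 1
}

banner='| NVIDIA-SMI 450.51.06 Driver Version: 450.51.06 CUDA Version: 11.0 |'
echo "driver: $(printf '%s\n' "$banner" | driver_version)"
echo "cuda:   $(printf '%s\n' "$banner" | cuda_version)"

# On a real machine (inside or outside the container):
# nvidia-smi | driver_version
```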

We will now download the relevant files from GitHub, which will make our life a lot easier for running and understanding the DeepStream SDK. Similar sample files are also available inside the DeepStream Docker image, but they are a mess to understand when you get too much information at once.

Download the GitHub files to any location of your choice; I am using the Documents folder.

kuk:~$ cd Documents/

The above folder needs to be mounted inside our DeepStream container, so we also have to create a container from the DS image. Here is how you can do it.

xhost +

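“xhost +” disables access control on the host's X server. This lets the container (like any other client) open windows on our display, so its output won't be blocked. A narrower option is “xhost +local:”, which only admits local connections. MAKE SURE TO RUN ‘xhost +’ EVERY TIME YOU START THE CONTAINER.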

Breaking down the command step by step:

  • sudo : admin privileges
  • docker : calls Docker, which we installed earlier
  • run : asks Docker to create and run a container from the image named at the end
  • --runtime=nvidia : enables the NVIDIA-Docker runtime layer (installed earlier)
  • -it : keeps the container interactive, so we can type and edit commands inside it
  • -d : detached mode. Docker prints the container ID and keeps the container running in the background; we will attach to it afterwards.
  • -e DISPLAY=$DISPLAY : sets an environment variable. It gives the container the same DISPLAY value our Ubuntu desktop uses, so windows opened inside the container appear on our screen.
  • --name=dst : the name of your container. I have used ‘dst’, but you can use whatever you like: Thor, BlackWidow, KhalDrogo and NightKing are all good names (just use the same name in the later commands).
  • -v $HOME/Documents/DS_computer_vision/:/home/ : ‘-v’ mounts a directory from the host system inside the container, with the syntax HOST PATH (Documents/DS_computer_vision):PATH INSIDE CONTAINER (/home/). The DeepStream image is Ubuntu-based, so it has a /home folder. All our files will appear directly inside the container's /home/, and I feel very comfortable working there, as there is no place like HOME.
  • --net=host : the container shares the host's network stack, so it can use any port (such as the RTSP output on 8554) without extra port mapping. With Docker's default bridge network you would publish individual ports with -p instead, but that is not needed here.
  • --gpus=all : gives the container access to all the GPUs on your machine. Since the NVIDIA driver is already installed on the host, this is what lets the container see the GPU.
  • Last but not least comes the name of the main DeepStream image, from which the container ‘dst’ is created.
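Putting the flags above together, here is a sketch that assembles the docker run command flag by flag and prints it instead of running it. The image name is deliberately a placeholder: DS_IMAGE and HOST_DIR are hypothetical variables, so set DS_IMAGE to the DeepStream image you actually pulled. Once the printed command looks right, run the script again with DRY_RUN=0 to execute it.

```shell
#!/bin/sh
# Sketch: build the "docker run" command described above, flag by flag.
# DS_IMAGE is a hypothetical placeholder; set it to your DeepStream image.
DS_IMAGE="${DS_IMAGE:-REPLACE_WITH_DEEPSTREAM_IMAGE}"
HOST_DIR="${HOST_DIR:-$HOME/Documents/DS_computer_vision/}"

# --runtime=nvidia : nvidia-docker runtime layer
# -it -d           : interactive TTY, started detached
# -e DISPLAY=...   : forward the host display variable
# --name=dst       : container name used by the later commands
# -v HOST:/home/   : mount the downloaded files at /home/
# --net=host       : share the host network stack
# --gpus=all       : expose every GPU to the container
set -- sudo docker run \
  --runtime=nvidia \
  -it -d \
  -e "DISPLAY=${DISPLAY:-}" \
  --name=dst \
  -v "${HOST_DIR}:/home/" \
  --net=host \
  --gpus=all \
  "$DS_IMAGE"

# Print the command first; run with DRY_RUN=0 to actually execute it.
if [ "${DRY_RUN:-1}" = "1" ]; then
  echo "$@"
else
  "$@"
fi
```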

There are many flags to run or modify the above container in whatever configuration you like. Follow this link to explore them all, but make sure you understand what each one does before using it.

As soon as you hit Enter, Docker will start downloading many layers at once. The total download is about 5.5 GB, so sit back and relax. Once it is done, it will print a very long CONTAINER_ID and you will be back at your terminal prompt. This means our sandbox is ready to explore and exploit. To check that your container is up and running:

sudo docker ps

This lists all the active containers and their names; you will find your ‘dst’ here.

Exploring the container: to get inside the container we need to run 2 commands. The first is:

xhost +

and the second is :

sudo nvidia-docker exec -it dst bash

We already know ‘sudo’ and ‘nvidia-docker’. ‘exec’ runs a new command in an already running container, which is why we use it here: our ‘dst’ container is already running. ‘-it’ gives us an interactive session with ‘dst’, the name of our container, and ‘bash’ says we want a bash shell inside the container.

After pressing Enter you will be now inside the docker shell.

You will start out deep inside the DeepStream installation directory. Let's move to our own folder, which we mounted at /home/ (cd /home/).

Running ‘ls’ inside /home/ lists the contents of the ‘DS_computer_vision’ folder, since that folder was mounted directly onto /home/.

Remember that you are now inside the container, so there is no need for sudo anymore. Any library you install will remain here until you remove the container. If you are done for the day and want to power down or restart your machine, just type ‘exit’. You will be out of your container, but the container itself keeps running. To stop it, do this:

sudo docker stop dst

That's it: it will stop the container and any programs running inside it. It's like turning off the stove and killing the lights until you are back /home/. After you have chugged a cup of coffee and rebooted your system, you can go back to your container and pick up where you left off with:

sudo docker start dst

This starts the container. Now get inside with ‘sudo nvidia-docker exec -it dst bash’, running ‘xhost +’ first as before.

The /home folder inside the dst container will contain these files:

  • Primary_Detector and the secondary classifier folders: these contain the labels and the weights required to run inference in an optimized fashion.
  • Files named ‘deep_stream_1_feed.txt’ and similar: these are what we will use to run the model and see the inference.
  • ‘config_infer…’ files: supporting configuration for our main file. There are also some sample .mp4 videos to see the output on (the configs here are set up for .mp4 files).
root@kuk:/home# ls
Primary_Detector config_infer_primary.txt crowd3.mp4 ...

When you have all the above files inside home folder :

deepstream-app -c deep_stream_1_feed.txt --tiledtext

This starts the pipeline, which takes a few seconds. If you have misplaced a file or renamed a folder, the pipeline will report an error accordingly. If everything is as described here, the pipeline will roll just fine and the output will appear in a new window.
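A misplaced or renamed file is the most common reason the pipeline refuses to start, so a quick pre-flight check can save a round trip. The check_paths function below is a hypothetical helper, not part of DeepStream. You give it the paths your config refers to, and it reports the missing ones. The file names in the usage line come from the listing above; adjust them to whatever your config actually references.

```shell
#!/bin/sh
# Hypothetical pre-flight check: report every path that does not exist.
# Prints "missing: <path>" per absent file and returns the number missing.
check_paths() {
  _missing=0
  for _p in "$@"; do
    if [ ! -e "$_p" ]; then
      echo "missing: $_p"
      _missing=$((_missing + 1))
    fi
  done
  return "$_missing"
}

# Run from /home/ inside the container, e.g.:
# check_paths deep_stream_1_feed.txt config_infer_primary.txt Primary_Detector \
#   && deepstream-app -c deep_stream_1_feed.txt --tiledtext
```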

There are 4 such text files in total, for 4 different variations:

  • Detection + tracking on 1 input stream.
  • Detection + tracking + classification type1 + classification type2 + classification type3 on 1 input stream.
  • Detection + tracking on 30 input streams.
  • Detection on 40 input streams.

As it was not possible to upload 30 videos to the GitHub repo, I have used 3 videos, each repeated 10 times, to replicate 30 input streams.
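One way to get 30 inputs from 3 videos is to let each [sourceN] group decode the same file several times. The sketch below generates such groups in deepstream-app's config format. It assumes the MultiURI source type (type=3) with num-sources, which is how DeepStream 5.x's own sample configs fan one file out to several streams. This is not necessarily how the repository's files are written, so compare against the 30-stream .txt file there. It also prints a matching [streammux] batch-size, since the muxer batch size should normally equal the total number of streams. gen_sources is a hypothetical helper, and the video names in the usage line are placeholders.

```shell
#!/bin/sh
# Sketch: emit deepstream-app [sourceN] groups that fan each video out to
# COPIES streams (type=3 = MultiURI, num-sources = copies per file),
# followed by a [streammux] batch-size matching the total stream count.
# $1 = copies per video, remaining args = absolute paths of the videos.
gen_sources() {
  _copies=$1; shift
  _idx=0
  for _video in "$@"; do
    printf '[source%d]\nenable=1\ntype=3\nuri=file://%s\nnum-sources=%d\ngpu-id=0\n\n' \
      "$_idx" "$_video" "$_copies"
    _idx=$((_idx + 1))
  done
  printf '[streammux]\nbatch-size=%d\n' $((_idx * _copies))
}

# Example: 3 videos x 10 copies = 30 streams (placeholder file names).
# gen_sources 10 "$PWD/video_a.mp4" "$PWD/video_b.mp4" "$PWD/video_c.mp4"
```

Only the batch-size key of [streammux] is shown, so merge it into the existing group rather than replacing that group. The [tiled-display] rows and columns also need to cover 30 tiles (e.g. 5 x 6).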

deepstream-app -c deep_stream_1_feed_3_classification.txt --tiledtext

As soon as the video is displayed, the terminal starts showing the current FPS. This gives you a good idea of how efficiently the NVIDIA plugins are using your GPU. It is that simple to take a COMPUTER VISION MODEL TO PRODUCTION DEPLOYMENT LEVEL.

As mentioned earlier, everything here runs on the GStreamer framework for video processing. Here is the equivalent GST pipeline, which you can run from the same location to see similar results (detection + tracking + car-make classification):

gst-launch-1.0 filesrc location=traffic_cam_clear_edited_1.mp4 ! decodebin ! m.sink_0 \
  nvstreammux name=m batch-size=1 width=1280 height=720 live-source=1 ! \
  nvinfer config-file-path=$(pwd)/config_infer_primary.txt batch-size=1 unique-id=1 ! \
  nvtracker ll-lib-file=$(pwd)/tracker_so/ display-tracking-id=1 ! \
  nvinfer config-file-path=$(pwd)/config_infer_secondary_carmake.txt batch-size=1 unique-id=2 infer-on-gie-id=1 infer-on-class-ids=0 ! \
  nvvideoconvert ! nvdsosd ! nveglglessink sync=0

If you open any ‘deep_stream_1_feed.txt’ file and go through its syntax, you will find it is divided into well-sorted groups:

  • [application]
  • [tiled-display]
  • [source0]
  • [sink0]
  • [osd]
  • [streammux]
  • [primary-gie]
  • [tracker]
  • [secondary-gie1]
  • [tests]

These groups let DeepStream build the same GST pipeline we just saw. A single long command leaves lots of room for error, has a high probability of going wrong and gives no easy way to understand what went wrong. Instead, we use the .txt-based method to initialize deepstream-app.
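To see at a glance which of these groups a given .txt file actually switches on, a short awk sketch can list every [group] together with its enable= value. list_groups is a hypothetical helper. It assumes the simple key=value layout these files use, with '#' comment lines. Groups without an enable key, such as [application], are listed with "-".

```shell
#!/bin/sh
# Sketch: print each [group] of a deepstream-app config and its enable= value.
# Assumes one key=value per line and '#' comment lines, as in the sample files.
list_groups() {
  awk -F= '
    /^[ \t]*#/ { next }                       # skip comment lines
    /^\[/ {                                   # a new group header starts
      if (group != "") print group, state     # flush the previous group
      group = $0; state = "-"; next
    }
    $1 == "enable" { state = "enable=" $2 }   # remember the enable flag
    END { if (group != "") print group, state }
  ' "$1"
}

# Usage inside the container:
# list_groups deep_stream_1_feed.txt
```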

You can find all about sources, sinks and the categories they come in here.

You will find details for each group here; see what you can understand and what is being used in these files. Try adding or modifying certain elements on your own and see what difference it makes to the output. It is completely up to you how far you explore this new dimension of the NVIDIA DeepStream app. There are some sample files at /opt/nvidia/deepstream/deepstream-5.1/samples/configs/.

To run them :

root@kuk:/opt/nvidia/deepstream/deepstream-5.1/samples/configs/deepstream-app# deepstream-app -c source4_1080p_dec_infer-resnet_tracker_sgie_tiled_display_int8.txt

NVIDIA has made it very easy to deploy Computer Vision models in the real world. You can convert your own models to be used in this fashion, be it YOLOv3 or SSD-MobileNet, trained in Caffe, TensorFlow or PyTorch. Most popular architectures can be converted (for example to ONNX or a TensorRT engine) and used here. In very simple terms, these are the Lego pieces and you are the architect who controls what you want out of them. Combine them as you like, and you will either end up with the results you are looking for or learn that 2 pieces are not meant to go together. There are many benefits to using NVIDIA platforms, including but not limited to:

  • Using docker makes us free of any hassle regarding supported libraries.
  • The DS app doesn't crash if 1, 2 or even 20 of the 40 streams suddenly stop streaming.
  • There is full support for security camera feeds. Obtain a valid RTSP address and replace the .mp4 with your live RTSP camera feed (you also need to tweak 1 or 2 values). That's it: you now have real-time detection running on your own security cameras.
  • You can use VLC to watch the output feeds. All the .txt files have one sink set to stream over RTSP on port 8554. Open VLC, go to Media > Open Network Stream and enter the address “rtsp://localhost:8554/ds-test”. Now you have all the output in your VLC player, and you can further divide the output across 40 VLC players for your 40 camera feeds.
  • This method lets us use the full potential of our hardware. Some stages, such as decoding, color conversion and scaling, are very compute-hungry. NVIDIA provides GPU-accelerated plugins for them (for example nvvideoconvert) that move this work from the CPU to the GPU, where it is handled without breaking a sweat.
  • Encoding, decoding and video conversion (internal to the GST pipeline) can all be directed to the GPU. Tracking also has several options depending on your requirements: you can choose a CPU-based tracker such as KLT or the GPU-based NvDCF tracker, which performs very well in real-world applications.
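For the security-camera and VLC points above, the relevant groups would look roughly like the output of the sketch below, which prints an RTSP input [sourceN] and an RTSP output [sinkN]. The type codes (4 = RTSP for sources, 4 = RTSP streaming for sinks) and the rtsp-port/udp-port keys follow DeepStream 5.x's sample configs. rtsp_groups is a hypothetical helper and the camera URL is a placeholder. Compare its output against the groups already in the repository's .txt files before relying on it. Setting live-source=1 in [streammux] is the usual extra tweak for live inputs.

```shell
#!/bin/sh
# Sketch: print deepstream-app groups for an RTSP camera input and an RTSP
# output that VLC can open at rtsp://<host>:8554/ds-test.
# $1 = source index, $2 = camera URL (placeholder), $3 = sink index
rtsp_groups() {
  printf '[source%d]\nenable=1\ntype=4\nuri=%s\ngpu-id=0\n\n' "$1" "$2"
  printf '[sink%d]\nenable=1\ntype=4\ncodec=1\nsync=0\nbitrate=4000000\nrtsp-port=8554\nudp-port=5400\n' "$3"
}

# Example with a made-up camera address:
# rtsp_groups 0 "rtsp://user:pass@192.168.1.64:554/stream1" 1
```

Then open rtsp://localhost:8554/ds-test in VLC on the same machine, or replace localhost with the host's IP address when watching from another machine.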

The developer community has taken this platform to the next level. Since this is version 5.1, there are already a few stable applications built into the SDK itself, ready to use and very similar to the .txt file format we have used. The support forums are also very active, and queries are usually answered quickly, often within 24 hours. All in all, you get support from NVIDIA as well as the community. It's up to you now how you use it and turn it into a profitable venture.

To start your journey in Computer Vision and write your first code to run inference on images and video, see Tensorflow Object detection in windows under 30 line. If even that seems like a lot of work, see how to set up your computer to begin with: Setting up TensorFlow 1.14 in bare Windows. Check out the performance of various models in a TensorFlow environment in this video, and see how impressively Facebook's new Detectron2 performs in image segmentation here.

The next article will cover Intel's approach to computer vision deployment, which runs on a CPU only, no GPU required.

Data Scientists must think like an artist when finding a solution

Data Scientists must think like an artist when finding a solution, when creating a piece of code. Artists enjoy working on interesting problems, even if there is no obvious answer.


Written by


AI enthusiast, Computer Vision Engineer. Self-driving cars need cameras, not LIDAR. Vision is the Future.
