Introduction to NVIDIA DeepStream

Nawin Raj Kumar S
Published in kgxperience · 3 min read · Oct 22, 2022

The effectiveness of computer vision is increasing rapidly, from Face ID protection in smartphones to disease detection in plants. Computer vision is becoming the key to the world, and single-shot detectors like YOLO (You Only Look Once) provide state-of-the-art performance for object detection. However, the frames per second (FPS) these detectors achieve becomes a question: they demand a lot of work from the CPU and still produce a very low FPS, because the system runs rendering and multi-object detection at the same time. What if I told you we can improve the FPS of the system without upgrading the hardware? In the previous article, we saw how CUDA improves the efficiency of programming. We will now see a framework that uses CUDA at its core, and you'll see why I love NVIDIA this much.

Logo of NVIDIA DeepStream

Well, folks, I would like to introduce DeepStream. DeepStream is a streaming analytics toolkit for building AI-powered applications. It takes streaming data as input, from a USB/CSI camera, video files, or streams over RTSP, and uses AI and computer vision to generate insights from pixels for a better understanding of the environment. By processing streams in parallel on the GPU, it can detect multiple objects across multiple sources at once.
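Each of those input types maps onto an ordinary GStreamer source element. The snippet below is a rough, hedged reference; the device path, file name, and URL are placeholders, not values from this article:

```python
# Typical GStreamer source elements for the inputs DeepStream accepts.
# Device paths, file names and URLs are placeholders.
SOURCES = {
    "usb_camera":  "v4l2src device=/dev/video0",                  # USB camera
    "csi_camera":  "nvarguscamerasrc",                            # CSI camera (Jetson)
    "video_file":  "filesrc location=video.h264 ! h264parse",     # video from a file
    "rtsp_stream": "rtspsrc location=rtsp://example.com/stream",  # stream over RTSP
}
```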

The DeepStream SDK lets you apply AI to streaming video while simultaneously optimizing video decode/encode, image scaling, conversion, and edge-to-cloud connectivity for complete end-to-end performance.

So how does it work?

DeepStream runs on top of a GStreamer pipeline. The pipeline starts at the source (the USB/IP camera that provides the input) and ends at the sink (the display that shows the output). The picture shows the architecture of DeepStream.

DeepStream Pipeline Architecture
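To make that source-to-sink flow concrete, here is a minimal GStreamer pipeline in Python (PyGObject). It uses only stock GStreamer elements (videotestsrc, autovideosink) rather than DeepStream plugins, so treat it as an illustrative sketch of the pipeline idea, not a DeepStream recipe:

```python
# Minimal GStreamer pipeline: a test-pattern source rendered to a window sink.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

# source -> converter -> sink
pipeline = Gst.parse_launch("videotestsrc ! videoconvert ! autovideosink")
pipeline.set_state(Gst.State.PLAYING)

# Block until the stream ends or an error occurs, then tear down.
bus = pipeline.get_bus()
bus.timed_pop_filtered(Gst.CLOCK_TIME_NONE,
                       Gst.MessageType.ERROR | Gst.MessageType.EOS)
pipeline.set_state(Gst.State.NULL)
```

DeepStream applications follow exactly this pattern; they just swap in NVIDIA's hardware-accelerated elements for decoding, inference, tiling, and display.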

These are the components used in a DeepStream pipeline (a sketch of how they are wired together follows the list):

  • Decode: Decoding is the process of converting the encoded image back into an uncompressed bitmap that can then be rendered on the screen. NVIDIA NVDEC (formerly known as NVCUVID) is a feature of NVIDIA graphics cards that performs video decoding, offloading this compute-intensive task from the CPU.
  • Image Processing: The input image is preprocessed with operations such as cropping, scaling, and warping. This can be done either on the CPU or by the NVIDIA VIC.
  • Classification: Classification is done by deep learning algorithms. NVIDIA has a library for this specific purpose called cuDNN (CUDA Deep Neural Network), yes, you heard it right. cuDNN provides highly tuned implementations of standard routines such as forward and backward convolution, pooling, normalization, and activation layers. Apart from that, we can use YOLO or TAO (yet another transfer-learning toolkit from NVIDIA, I love NVIDIA).
  • Tiler: The tiler arranges multiple streams as tiles on a single display.
  • Sink: The sink is the display that renders the output.
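Here is a sketch of how those components might be wired together from Python using Gst.parse_launch. The element names (nvv4l2decoder, nvstreammux, nvinfer, nvmultistreamtiler, nvdsosd, nveglglessink) come from the DeepStream SDK, while sample.h264 and pgie_config.txt are placeholder paths; exact element names and properties depend on your DeepStream version, so treat this as an assumption-laden sketch rather than a guaranteed recipe:

```python
# A DeepStream-style pipeline sketch: decode -> batch -> infer -> tile -> display.
# Placeholder paths (sample.h264, pgie_config.txt) must be replaced with your own.
import gi
gi.require_version("Gst", "1.0")
from gi.repository import Gst

Gst.init(None)

pipeline = Gst.parse_launch(
    # Decode: read an H.264 stream and decode it on the GPU (NVDEC)
    "filesrc location=sample.h264 ! h264parse ! nvv4l2decoder ! mux.sink_0 "
    # Batch frames from one (or more) sources for the inference engine
    "nvstreammux name=mux batch-size=1 width=1280 height=720 ! "
    # Classification/detection: nvinfer runs the model described in the config file
    "nvinfer config-file-path=pgie_config.txt ! "
    # Tiler: lay out one or more streams as tiles on a single frame
    "nvmultistreamtiler rows=1 columns=1 width=1280 height=720 ! "
    # Draw bounding boxes and labels, then render to the display (the sink)
    "nvvideoconvert ! nvdsosd ! nveglglessink"
)

pipeline.set_state(Gst.State.PLAYING)

# Run until the stream ends or an error occurs, then tear down.
bus = pipeline.get_bus()
bus.timed_pop_filtered(Gst.CLOCK_TIME_NONE,
                       Gst.MessageType.ERROR | Gst.MessageType.EOS)
pipeline.set_state(Gst.State.NULL)
```

Each element in the launch string corresponds to one of the bullets above: the decoder, the batching/preprocessing stage, the inference engine, the tiler, and the sink.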

In further articles, we'll see how to create your own pipeline and explore TensorRT, the heart of DeepStream.
