NVIDIA TensorRT Platform for High-Performance DL Inference, Part 2

Artemy Malkov, PhD
Product AI
Published Jul 15, 2021

Here are five key technologies TensorRT uses for high-speed network inference:

- Precision calibration: calibrating network weights to reduce their precision, moving from float32 to int8.
- Dynamic tensor memory: using memory efficiently by reusing buffers instead of allocating new memory for every tensor.
- Layer and tensor fusion: combining layers and tensors to optimize the computation graph.
- Kernel auto-tuning: selecting the convolution algorithms that are best suited to the target GPU platform.
- Multi-stream execution: parallelizing task execution across independent input streams.
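To give a feel for the float32-to-int8 step, here is a minimal NumPy sketch of symmetric quantization with simple max-value calibration. This is an illustration of the idea only, not TensorRT's actual calibrator (TensorRT uses an entropy-based calibration method to pick the scale):

```python
import numpy as np

def calibrate_scale(values):
    """Pick a scale from the max absolute value (naive "max" calibration;
    TensorRT's entropy calibrator chooses the scale more carefully)."""
    return np.abs(values).max() / 127.0

def int8_quantize(values, scale):
    """Map float32 values to int8 using a symmetric scale."""
    q = np.clip(np.round(values / scale), -127, 127)
    return q.astype(np.int8)

# Quantize some weights, then dequantize and measure the error introduced.
w = np.random.RandomState(0).randn(1000).astype(np.float32)
scale = calibrate_scale(w)
q = int8_quantize(w, scale)
w_restored = q.astype(np.float32) * scale
max_err = np.abs(w - w_restored).max()  # bounded by scale / 2
```

The payoff is that int8 values take a quarter of the memory of float32 and map onto much faster integer tensor-core math, at the cost of the small rounding error measured above.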

Original article written by Rinat S.

https://medium.com/@rinats

Artemy Malkov, PhD
Scientist, Entrepreneur, AI Product Management practitioner