NVIDIA TensorRT Platform for High-Performance DL Inference, Part 2

Artemy Malkov, PhD
Product AI
Published Jul 15, 2021

Here are five key technologies TensorRT uses for high-speed network inference:

- Precision calibration: calibrating network weights to reduce their precision, moving from float32 to int8.
- Dynamic tensor memory: using memory efficiently by reusing buffers instead of allocating new memory for every tensor.
- Layer and tensor fusion: combining layers and tensors to optimize the computation graph.
- Kernel auto-tuning: selecting the convolution algorithms that are best suited to the target GPU platform.
- Multi-stream execution: parallelizing task execution across independent input streams.
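To give a feel for the float32-to-int8 step, here is a minimal NumPy sketch of symmetric quantization with simple max-value calibration. This is an illustration of the idea only, not TensorRT's actual calibrator (TensorRT uses an entropy-based calibration method to pick the scale):

```python
import numpy as np

def calibrate_scale(values):
    """Pick a scale from the max absolute value (naive "max" calibration;
    TensorRT's entropy calibrator chooses the scale more carefully)."""
    return np.abs(values).max() / 127.0

def int8_quantize(values, scale):
    """Map float32 values to int8 using a symmetric scale."""
    q = np.clip(np.round(values / scale), -127, 127)
    return q.astype(np.int8)

# Quantize some weights, then dequantize and measure the error introduced.
w = np.random.RandomState(0).randn(1000).astype(np.float32)
scale = calibrate_scale(w)
q = int8_quantize(w, scale)
w_restored = q.astype(np.float32) * scale
max_err = np.abs(w - w_restored).max()  # bounded by scale / 2
```

The payoff is that int8 values take a quarter of the memory of float32 and map onto much faster integer tensor-core math, at the cost of the small rounding error measured above.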

Original article written by Rinat S.

https://medium.com/@rinats

Artemy Malkov, PhD
Scientist, Entrepreneur, AI Product Management practitioner