Published in Analytics Vidhya

Optimizing aarch64 Edge Devices for Maximum Performance on ML

  1. Jetson Nano
  2. Jetson TX1
  3. Jetson TX2
  4. Jetson AGX Xavier
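  1. Run the jetson_clocks script, which pins the CPU, GPU, and memory (EMC) clocks at their maximum frequencies for the active power mode.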
$ sudo jetson_clocks.sh
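The script name and location vary between releases, so the following is a minimal sketch of how one might locate it before running it. It assumes that older L4T images ship `jetson_clocks.sh` in the home directory and that newer ones install `jetson_clocks` on the PATH; the helper name `find_jetson_clocks` is hypothetical and not part of any NVIDIA tooling.

```shell
# Sketch: return the first jetson_clocks candidate that exists and is executable.
# Assumption: candidates are either command names on PATH or explicit file paths.
find_jetson_clocks() {
    for candidate in "$@"; do
        # command -v succeeds for commands on PATH and for executable file paths
        if command -v "$candidate" >/dev/null 2>&1; then
            echo "$candidate"
            return 0
        fi
    done
    # Nothing matched: signal failure so the caller can stop
    return 1
}

# Usage on a device (left commented out so the sketch has no side effects):
#   tool=$(find_jetson_clocks jetson_clocks "$HOME/jetson_clocks.sh") || exit 1
#   sudo "$tool"           # pin the clocks
#   sudo "$tool" --show    # print the resulting clock settings
```

Passing the candidates as arguments keeps the lookup order explicit, so it can be adjusted for a particular image without touching the function.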
$ fallocate -l 8G swapfile
$ sudo chmod 600 swapfile
$ sudo mkswap swapfile
$ sudo swapon swapfile
$ free -m
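The commands above enable swap only until the next reboot, and `free -m` has to be read by eye. As a rough sketch, the two helpers below extract the swap total from `free -m` output and build an `/etc/fstab` line that would re-enable the swap file at boot. The function names are hypothetical, and the `Swap:` row label assumes an English locale.

```shell
# Sketch: read `free -m` output on stdin and print the total swap size in MiB.
# Assumption: the swap row starts with "Swap:" and the total is the second column.
swap_total_mb() {
    awk '/^Swap:/ { print $2 }'
}

# Sketch: print the /etc/fstab line that activates a swap file at boot.
# The argument should be the absolute path of the swap file.
fstab_swap_entry() {
    printf '%s none swap sw 0 0\n' "$1"
}

# Usage on a device (left commented out so the sketch has no side effects):
#   free -m | swap_total_mb
#   fstab_swap_entry /absolute/path/to/swapfile | sudo tee -a /etc/fstab
```

Because `fstab` needs an absolute path, it is worth creating the swap file at a fixed location rather than in whatever directory the shell happens to be in.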
  1. Max-Q
Fig. Energy profiles for aarch64 [Wikipedia]
$ sudo nvpmodel -m <mode number for desired profile>
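Before switching profiles it can help to check which one is active. `sudo nvpmodel -q` queries the current mode. The sketch below parses that output on the assumption that it prints a line of the form `NV Power Mode: <name>` followed by the mode number on the next line; the helper name `current_power_mode` is hypothetical, and the format may differ between releases.

```shell
# Sketch: read `nvpmodel -q` output on stdin and print "<mode id> <mode name>".
# Assumption: the name is on a "NV Power Mode: <name>" line and the numeric
# mode id is on the line immediately after it.
current_power_mode() {
    awk -F': ' '/^NV Power Mode/ {
        name = $2          # text after "NV Power Mode: "
        getline id         # the next line holds the mode number
        print id, name
    }'
}

# Usage on a device (left commented out so the sketch has no side effects):
#   sudo nvpmodel -q | current_power_mode
#   sudo nvpmodel -m 0 && sudo nvpmodel -q | current_power_mode
```

Mode numbers and names are device specific, so the value passed to `-m` should be taken from the profile table for the board in use.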
  • Convolution Core — optimized high-performance convolution engine.
  • Single Data Processor — single-point lookup engine for activation functions.
  • Planar Data Processor — planar averaging engine for pooling.
  • Channel Data Processor — multi-channel averaging engine for advanced normalization functions.
  • Dedicated Memory and Data Reshape Engines — memory-to-memory transformation acceleration for tensor reshape and copy operations.
  • Compilation of deep learning models in Keras, MXNet, PyTorch, TensorFlow, CoreML, and DarkNet into minimum deployable modules on diverse hardware backends.
  • Infrastructure to automatically generate and optimize tensor operators on more backends with better performance.
  1. http://nvdla.org
  2. https://docs.nvidia.com/jetson/index.html
  3. https://tvm.ai
