SUPERCOMPUTING FOR ARTIFICIAL INTELLIGENCE — 03
Deep Learning Frameworks for Parallel and Distributed Infrastructures
Reduce training time for deep neural networks using supercomputers
[This post will be used in the master's course Supercomputers Architecture at UPC Barcelona Tech with the support of the BSC]
“Methods that scale with computation are the future of Artificial Intelligence” — Rich Sutton, father of reinforcement learning (video 4:49)
This third post in the series explores some fundamental concepts in distributed and parallel Deep Learning training and introduces the deep learning frameworks currently used by the community. We also introduce the sequential code of an image classification problem, sketched below, which will serve as the baseline for measuring the scalability achievable with the parallel and distributed frameworks covered in the next two posts.
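To make the idea of a baseline concrete, here is a minimal sketch of what such a sequential (single-device) training run might look like. It assumes TensorFlow/Keras with CIFAR-10 and a small convolutional network as illustrative stand-ins (the concrete model and dataset used in this series appear in the following posts), and simply times the run end to end:

```python
# Minimal sketch of a sequential image classification baseline.
# Assumptions (not from the original post): TensorFlow/Keras,
# CIFAR-10, and a small CNN as placeholders for the real benchmark.
import time
import tensorflow as tf

# Load and normalize CIFAR-10.
(x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# A small convolutional network standing in for the benchmarked model.
model = tf.keras.Sequential([
    tf.keras.layers.Conv2D(32, 3, activation="relu", input_shape=(32, 32, 3)),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Conv2D(64, 3, activation="relu"),
    tf.keras.layers.MaxPooling2D(),
    tf.keras.layers.Flatten(),
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="adam",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Time the sequential training run: this wall-clock figure is the
# denominator when computing speedup on parallel/distributed setups.
start = time.time()
model.fit(x_train, y_train, batch_size=128, epochs=5,
          validation_data=(x_test, y_test))
print(f"Sequential training time: {time.time() - start:.1f} s")
```

Re-running an equivalent script on more GPUs or nodes and dividing this baseline time by the new training time yields the speedup and scalability figures discussed in the next two posts.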
1 Basic concepts in distributed and parallel Deep Learning training
DNNs base their success on building models with a high learning capacity and millions of parameters that are tuned in a data-driven fashion. These models are trained by processing millions of examples so that the development…