
SUPERCOMPUTING FOR ARTIFICIAL INTELLIGENCE — 03

Deep Learning Frameworks for Parallel and Distributed Infrastructures

Reduce training time for deep neural networks using Supercomputers

Jordi TORRES.AI
Published in TDS Archive · 13 min read · Nov 23, 2020


Marenostrum Supercomputer — Barcelona Supercomputing Center (image from BSC)

[This post will be used in the master course Supercomputers Architecture at UPC Barcelona Tech with the support of the BSC]

“Methods that scale with computation are the future of Artificial Intelligence” — Rich Sutton, father of reinforcement learning (video 4:49)

This third post in the series explores some fundamental concepts in distributed and parallel Deep Learning training and introduces the deep learning frameworks currently used by the community. We also present the sequential code of an image classification problem, which we will use as a baseline to measure the scalability that can be obtained with the parallel and distributed frameworks proposed in the next two posts.
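To give a rough idea of what such a sequential baseline looks like before the full code is introduced, here is a minimal sketch in TensorFlow/Keras. The dataset (CIFAR-10), the small convolutional model, and the hyperparameters are illustrative assumptions, not the exact code used in this series; the point is simply that a single-device training run is timed end to end.

```python
import time
import tensorflow as tf
from tensorflow import keras

# Load a small image classification dataset (CIFAR-10: 60k 32x32 colour images, 10 classes)
(x_train, y_train), (x_test, y_test) = keras.datasets.cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

# A deliberately simple convolutional network; the actual baseline may use a deeper model
model = keras.Sequential([
    keras.layers.Conv2D(32, 3, activation="relu", input_shape=(32, 32, 3)),
    keras.layers.MaxPooling2D(),
    keras.layers.Conv2D(64, 3, activation="relu"),
    keras.layers.MaxPooling2D(),
    keras.layers.Flatten(),
    keras.layers.Dense(128, activation="relu"),
    keras.layers.Dense(10, activation="softmax"),
])

model.compile(optimizer="sgd",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

# Time the sequential (single-device) training run to obtain the baseline
start = time.time()
model.fit(x_train, y_train, epochs=5, batch_size=256,
          validation_data=(x_test, y_test))
print(f"Sequential training time: {time.time() - start:.1f} s")
```

The wall-clock training time printed at the end is the reference figure against which the parallel and distributed versions in the following posts can be compared.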

1 Basic concepts in distributed and parallel Deep Learning training

DNNs base their success on building high-learning-capacity models with millions of parameters that are tuned in a data-driven fashion. These models are trained by processing millions of examples so that the development…
