
Making Sense of Big Data

Deep Learning on Supercomputers

A hands-on guide to scaling a Deep Learning application on BSC's CTE-POWER cluster

Jordi TORRES.AI
Published in TDS Archive · 22 min read · Feb 2, 2021


(Image from bsc.es)

This post is used as documentation in the PATC course "Introduction to Big Data Analytics" at BSC.

In a previous post, we showed that supercomputers are a key enabler of progress in Artificial Intelligence, and that the growth in effective compute over recent years has been driven by increased parallelization and distribution of the algorithms.

This post demonstrates how such supercomputers can be used; specifically, BSC's CTE-POWER cluster, in which each server has two IBM Power9 CPUs and four NVIDIA V100 GPUs.

In this series of posts, we will use the TensorFlow framework; however, the equivalent PyTorch code does not differ much. We will use the Keras API because, since the release of TensorFlow 2.0, the tf.keras.Model API has become the primary way of building neural networks, particularly those not requiring custom training loops.
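As a quick reminder of what this API looks like, the sketch below builds and compiles a small illustrative network with tf.keras. The architecture (a two-layer classifier for 28×28 inputs, e.g. MNIST-sized images) is a hypothetical example chosen for brevity, not the model used later in this series:

```python
import tensorflow as tf

# A minimal model built with the tf.keras Sequential API
# (illustrative architecture; any Keras model is defined the same way).
model = tf.keras.Sequential([
    tf.keras.layers.Flatten(input_shape=(28, 28)),    # flatten 28x28 images
    tf.keras.layers.Dense(128, activation="relu"),    # hidden layer
    tf.keras.layers.Dense(10, activation="softmax"),  # 10-class output
])

# Compile with a standard optimizer/loss; training would follow with model.fit().
model.compile(optimizer="sgd",
              loss="sparse_categorical_crossentropy",
              metrics=["accuracy"])

model.summary()
```

The same model definition carries over unchanged when we later distribute training across GPUs; only the surrounding setup changes.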

1 — BSC’s CTE-POWER Cluster

1.1 System Overview
