Published in HuggingFace


💥 Training Neural Nets on Larger Batches: Practical Tips for 1-GPU, Multi-GPU & Distributed setups

⌛️Large batches on one or several GPU(s)

Adam confirms your predicament! 😱Oh no!

How can you do that?

The 5-steps of a gradient descent optimization algorithm
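The five steps above can be sketched in PyTorch, together with gradient accumulation to simulate a larger batch on a single GPU. The model, loss, and dataset below are toy stand-ins so the sketch is self-contained, and `accumulation_steps` is an illustrative choice:

```python
import torch

# Toy stand-ins (assumptions, not the article's actual model or data).
model = torch.nn.Linear(10, 2)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
loss_function = torch.nn.CrossEntropyLoss()
dataset = [(torch.randn(4, 10), torch.randint(0, 2, (4,))) for _ in range(8)]

accumulation_steps = 4  # effective batch = 4x the per-step batch size

optimizer.zero_grad()                          # 5) reset gradients (done once up front here)
for i, (inputs, labels) in enumerate(dataset):
    predictions = model(inputs)                # 1) forward pass
    loss = loss_function(predictions, labels)  # 2) compute the loss
    loss = loss / accumulation_steps           # normalize so gradients average over the big batch
    loss.backward()                            # 3) backward pass; gradients accumulate in .grad
    if (i + 1) % accumulation_steps == 0:
        optimizer.step()                       # 4) parameter update with the accumulated gradients
        optimizer.zero_grad()                  # 5) reset gradients for the next big batch
```

Dividing the loss by `accumulation_steps` keeps the accumulated gradient equal to the average gradient over the larger effective batch.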

😱 Pushing that to the extreme

A “Memory-poor” strategy that needs O(1) memory (but requires O(n²) computation steps) — From Yaroslav Bulatov’s nice post: https://medium.com/tensorflow/fitting-larger-networks-into-memory-583e3c758ff9
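Between storing every activation and the extreme O(1)-memory strategy, PyTorch ships a practical middle ground in `torch.utils.checkpoint`: it stores only segment-boundary activations and recomputes the rest during the backward pass. A minimal sketch (the toy 20-layer stack is an assumption, not code from the linked post):

```python
import torch
from torch.utils.checkpoint import checkpoint_sequential

# A deep toy model; assume your real model is a Sequential-like stack of layers.
model = torch.nn.Sequential(*[torch.nn.Linear(100, 100) for _ in range(20)])
inputs = torch.randn(32, 100, requires_grad=True)

# Split the stack into 4 segments: only the activations at segment boundaries
# are kept; everything inside a segment is recomputed during backward,
# trading extra compute for memory.
out = checkpoint_sequential(model, segments=4, input=inputs)
out.sum().backward()
```

With `n` layers split into `sqrt(n)` segments, this brings activation memory down to O(sqrt(n)) at the cost of roughly one extra forward pass.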

🕰 Making the best of a multi-GPU machine

Under some settings, GPU-1 is used much more heavily than the other GPUs.

Forward and Backward passes with torch.nn.DataParallel
Number of elements in the output of a language model
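Wrapping a model in `torch.nn.DataParallel` takes one line; the imbalance in the captions above comes from the gather step, which collects every replica's output on the first GPU. A minimal sketch (on a CPU-only machine `DataParallel` transparently falls back to the wrapped module):

```python
import torch

model = torch.nn.Linear(10, 2)
parallel_model = torch.nn.DataParallel(model)  # splits the batch across available GPUs
if torch.cuda.is_available():
    parallel_model = parallel_model.cuda()

inputs = torch.randn(32, 10)
if torch.cuda.is_available():
    inputs = inputs.cuda()

# scatter inputs -> parallel forward on each replica -> gather outputs on the first GPU
predictions = parallel_model(inputs)
print(predictions.shape)  # torch.Size([32, 2])
```

For a language model, those gathered outputs have vocabulary-sized last dimensions, which is why the first GPU's memory fills up so much faster than the others'.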

⚖️ Balanced load on a multi-GPU machine

Using DataParallelModel and DataParallelCriterion
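`DataParallelModel` and `DataParallelCriterion` come from an external package; a related trick that works with plain `DataParallel` is to compute the loss inside the model's `forward`, so each replica sends back only a scalar instead of its full output tensor. A sketch, with a hypothetical `ModelWithLoss` wrapper:

```python
import torch

class ModelWithLoss(torch.nn.Module):
    """Hypothetical wrapper: forward returns the loss, so gathered tensors stay tiny."""
    def __init__(self, model, loss_fn):
        super().__init__()
        self.model = model
        self.loss_fn = loss_fn

    def forward(self, inputs, labels):
        predictions = self.model(inputs)
        # Each replica returns a scalar loss; DataParallel gathers only these scalars.
        return self.loss_fn(predictions, labels)

model = torch.nn.Linear(10, 2)
wrapped = torch.nn.DataParallel(ModelWithLoss(model, torch.nn.CrossEntropyLoss()))

inputs, labels = torch.randn(32, 10), torch.randint(0, 2, (32,))
loss = wrapped(inputs, labels).mean()  # average the per-replica losses
loss.backward()
```

The parameter servers' memory stays balanced because the large prediction tensors never leave the GPU that produced them.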

⏰ Distributed training: training on several machines

Models that make heavy use of Python loops/calls in their forward passes can be slowed down by the Python interpreter's GIL when several parallel forward calls are driven by a single interpreter. In these settings, DistributedDataParallel can advantageously replace DataParallel even on a single machine.
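The core setup is the same on one machine or many: each process initializes a process group and wraps its model in `DistributedDataParallel`, which all-reduces gradients across processes during `backward()`. A minimal single-process sketch using the CPU-friendly `gloo` backend (NCCL is the usual choice on GPUs; the address, port, and toy model here are assumptions):

```python
import os
import torch
import torch.distributed as dist

# In a real run, the launch utility sets these for every process.
os.environ.setdefault("MASTER_ADDR", "127.0.0.1")
os.environ.setdefault("MASTER_PORT", "29500")

dist.init_process_group(backend="gloo", rank=0, world_size=1)

model = torch.nn.Linear(10, 2)
# One DDP instance per process; gradients are all-reduced across processes in backward().
ddp_model = torch.nn.parallel.DistributedDataParallel(model)

out = ddp_model(torch.randn(8, 10))
out.sum().backward()
dist.destroy_process_group()
```

Because each process has its own interpreter, the parallel forward passes no longer contend for a single GIL.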

The main server (server 1) has an accessible IP and an open port for communication.

To run our script, we'll use the torch.distributed.launch utility of PyTorch. It takes care of setting the environment variables and calls each script with the right local_rank argument.
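Concretely, the invocation looks something like this on each node, where the script name, IP address, port, and GPU counts are placeholders to adapt to your setup:

```bash
# Node 0 (the main server with the accessible IP and open port):
python -m torch.distributed.launch --nproc_per_node=4 --nnodes=2 --node_rank=0 \
    --master_addr="192.168.1.1" --master_port=1234 train.py

# Node 1:
python -m torch.distributed.launch --nproc_per_node=4 --nnodes=2 --node_rank=1 \
    --master_addr="192.168.1.1" --master_port=1234 train.py
```

The utility spawns `--nproc_per_node` processes per machine (typically one per GPU) and hands each one its `local_rank`.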


Thomas Wolf

Natural Language Processing, Deep learning and Computational Linguistics – Science Lead @Huggingface | thomwolf.io
