Anthony Peng in Polo Club of Data Science | Georgia Tech
How to measure inter-GPU connection speed (single node)?
Key GPU Knowledge for ML Researchers Series
Oct 12, 2023

Anthony Peng in Polo Club of Data Science | Georgia Tech
Multi-GPU Training in PyTorch with Code (Part 4): Torchrun
We discussed single-GPU training in Part 1, multi-GPU training with DP in Part 2, and multi-GPU training with DDP in Part 3. We will…
Jul 7, 2023

Anthony Peng in Polo Club of Data Science | Georgia Tech
Multi-GPU Training in PyTorch with Code (Part 3): Distributed Data Parallel
We discussed single-GPU training in Part 1 and multi-GPU training with DP in Part 2. In Part 2, we found DP is incompatible with GPUs w/o…
Jul 7, 2023

Anthony Peng in Polo Club of Data Science | Georgia Tech
Multi-GPU Training in PyTorch with Code (Part 2): Data Parallel
In Part 1, we successfully trained a ResNet34 on CIFAR10 using a single GPU. In this article, we will explore how to launch the training on…
Jul 7, 2023

Anthony Peng in Polo Club of Data Science | Georgia Tech
Multi-GPU Training in PyTorch with Code (Part 1): Single GPU Example
This tutorial series will cover how to launch your deep learning training on multiple GPUs in PyTorch. We will discuss how to extrapolate a…
Jul 7, 2023