How Bend Works: A Parallel Programming Language That “Feels Like Python but Scales Like CUDA”
  A brief introduction to Lambda Calculus, Interaction Combinators, and how they are used to parallelize operations on Bend / HVM.
  Published in TDS Archive, Jun 26, 2024
Recreating PyTorch from scratch (with GPU support and automatic differentiation)
  Build your own deep learning framework based on C/C++, CUDA and Python, with GPU support and automatic differentiation!
  Published in TDS Archive, May 14, 2024
Why Deep Learning Models Run Faster on GPUs: A Brief Introduction to CUDA Programming
  For those who want to understand what .to("cuda") does.
  Published in TDS Archive, Apr 17, 2024
Scaling Deep Learning Models in Production for millions of users
  For those who want to go beyond Flask+Heroku
  Jul 22, 2023
How to run distributed multinode training in practice
  Tutorial for multinode training using PyTorch, Slurm and AWS
  May 17, 2023