Yuanzhe Dong: Using CUDA Graph in Pytorch (Dec 27, 2022). CUDA Graph is a feature to reduce training time. Instead of launching kernels one by one with all the CPU launching overheads for each…
Yuanzhe Dong: Dump Pytorch tensors to disk (Jul 14, 2022). Sometimes we need to dump the output of intermediate layers to disk in order to use it for debugging. Usually there are two ways to do it, one…
Yuanzhe Dong: Profile Pytorch code using nsys and nsight step by step (Jul 7, 2022). Here I’m trying to demonstrate how to profile and trace PyTorch code activities on GPUs using nsys and nsight step by step, assuming we…