Max Shap: Grouped Query Attention (GQA) explained with code. In this short article, I will explain the idea behind GQA and how to translate it into code.
Max Shap: Understanding and Estimating GPU Memory Demands for Training LLMs in practice. Understand how much GPU memory per device you would need to train yet another LLM.