Published in Towards AI: Can LLMs Truly Think Outside the Box? GPT-4o Struggles with Brain Teasers. Jan 14
Published in TDS Archive: How Long Does It Take to Train the LLM From Scratch? Guide to estimating time for training X-billion LLMs with Y trillion tokens and Z GPU compute. Oct 28, 2024
Grouped Query Attention (GQA) explained with code. In this short article, I will explain the idea behind GQA and how to translate it into code. Jan 24, 2024
Understanding and Estimating GPU Memory Demands for Training LLMs in practice. Understand how much GPU memory per device you would need to train yet another LLM. Jan 6, 2024