Matthew Gunton in Towards Data Science | How to Improve Model Quality Without Building Larger Models | Going into Google DeepMind’s “Scaling LLM Test-Time Compute Optimally can be More Effective than Scaling Model Parameters” | 5d ago
Matthew Gunton in Towards Data Science | Line-By-Line, Let’s Reproduce GPT-2: Section 3 — Training | This blog post will go line-by-line through the code in Section 3 of Andrej Karpathy’s “Let’s reproduce GPT-2 (124M)” | Sep 3
Matthew Gunton in Towards Data Science | Line-By-Line, Let’s Reproduce GPT-2: Section 2 — Hardware Optimization | This blog post will go line-by-line through the hardware optimizations in Section 2 of Andrej Karpathy’s “Let’s reproduce GPT-2 (124M)” | Jul 31
Matthew Gunton in Towards Data Science | Line By Line, Let’s Reproduce GPT-2: Section 1 | This blog post will go line-by-line through the code in Section 1 of Andrej Karpathy’s “Let’s reproduce GPT-2 (124M)” | Jul 23
Matthew Gunton in Towards Data Science | Exploring Medusa and Multi-Token Prediction | This blog post will go into detail on the “MEDUSA: Simple LLM Inference Acceleration Framework with Multiple Decoding Heads” paper | Jul 10
Matthew Gunton in Towards Data Science | Diving Deep into AutoGen and Agentic Frameworks | This blog post will go into the details of the “AutoGen: Enabling Next-Gen LLM Applications via Multi-Agent Conversation” paper | Jun 28
Matthew Gunton in Towards AI | Understanding Mamba and Selective State Space Models (SSMs) | This blog post will go into detail on the “Mamba: Linear-Time Sequence Modeling with Selective State Spaces” paper | Jun 24
Matthew Gunton in Towards Data Science | Understanding You Only Cache Once | This blog post will go into detail on the “You Only Cache Once: Decoder-Decoder Architectures for Language Models” paper and its findings | Jun 4
Matthew Gunton in Towards Data Science | Understanding Low Rank Adaptation (LoRA) in Fine Tuning LLMs | How LoRA works to fine-tune LLMs, following the methodology set out in the “LoRA: Low-Rank Adaptation of Large Language Models” paper | May 24
Matthew Gunton in Towards Data Science | Understanding Long RoPE in LLMs | This blog post will go into detail about the new Long RoPE methodology used to expand the context lengths LLMs can support without… | May 15