- "The Law of Scale Is Invalid: Comparing Brain vs. AI Optimization" by Don Lim, in Towards AI (Dec 31). The total number of parameters of GPT-4 is about 10 times larger than that of GPT-3. However, the actual parameters used for any given prompt by…
- "Boosting LLM Inference Speed: High Performance, Zero Compromise" by Chinmay Deshpande (Mar 24). Explore key techniques to boost LLM inference speed: parallelization, vectorization, loop tiling, operator fusion, and quantization.
- "Turbocharge Large Language Model Training with Parallelization" by Kartik Chaudhary, in Google Cloud - Community (Nov 28). Parallelization techniques for efficient distributed training of large deep learning models.
- "Speeding up picoGPT — Implementing Speculative Decoding" by Yash Jain (Nov 17). A few days ago, I came across Karpathy's minimal GPT implementation in NumPy (picoGPT). While exploring various optimization techniques…
- "LLM Optimization Techniques: Quantization, Distillation, Pruning, Parameter & Layer Reduction" by Don Lim, in Towards AI (Nov 26). What are the relationships between GPT-4o and GPT-4o mini, Gemini and Gemini Flash, and Claude Opus, Claude Sonnet, and Claude Haiku? The…
- "Breaking it Down: A Comprehensive Guide to Text Splitting for LLM Performance" by Yash Bhaskar (Nov 2). One of the most powerful strategies to elevate your language model performance is text splitting. It might sound basic, but trust me, it's…
- "Speculative Decoding in LLMs" by shashank Jain, in AI Mind (Oct 24). Before diving into speculative decoding, it's important to understand the concept of decoding in NLP. Decoding refers to the process by…
- "Optimizing Transformers: Do We Really Need All That Attention?" by Arash Nicoomanesh (Oct 23). "What Matters In Transformers? NOT ALL ATTENTION IS NEEDED" is an interesting paper that finds you can actually remove half of the…