- "FlexAttention: Bridging Performance and Flexibility in Transformer Attention Mechanisms" by Saad Asad (6d ago): Launched in early 2024, FlexAttention enables researchers to modify attention mechanisms without writing complex GPU kernels, while…
- "Training a Mini (114M Parameter) Llama 3 like Model from Scratch" by Venkat Ram Rao (Jul 21): aka "How to Train your LLM".
- "Advanced Attention Mechanisms — II" by Arion Das in Towards AI (Nov 13): Flash Attention. You can refer to its predecessors here: KV cache, sliding window attention, MHA, MQA, uptraining, & GQA. These methods…
- "Flash attention (Fast and Memory-Efficient Exact Attention with IO-Awareness): A deep dive" by Anish Dubey in Towards Data Science (May 29): Flash attention is a power-optimized transformer attention mechanism that provides a 15% efficiency gain.
- "The Math Behind FlashAttention — Breaking Down Matrix Operations" by Pham An Khang in Machine Learning Interview (Sep 22): In the world of modern deep learning, one of the most powerful advancements is the attention mechanism. Specifically, scaled dot-product…
- "Meet new encoder language models (LLM2Encoder)" by Knowledgator Engineering (Sep 10): The evolution of language models has followed two major paths: generative models (such as GPT, Llama and T5) and discriminative models…
- "FlashAttention — one, two, three!" by Najeeb Khan (Sep 2): An overview of efficient attention mechanisms powering LLMs.
- "Running a SOTA 7B Parameter Embedding Model on a Single GPU" by Szymon Palucha in Towards Data Science (Aug 9): Running Qwen2 on SageMaker.