Building a Coding agent to solve SWE-Bench - In our first attempt to solve SWE-bench problems, we ran into a lot of issues because the patches were being created before the actual… (Jan 17)
Introduction to SWE Bench & Patch Centric Approach - The Software Engineering (SWE) Bench was created to evaluate AI coding agents like Devin, which automate tasks such as bug fixes and code… (Jan 17)
Q-GaLore | Memory-efficient Pre-training and Fine-tuning - Training or fine-tuning Large Language Models (LLMs) demands high-end GPUs due to massive datasets, optimizer states, and billion… (Jul 20, 2024)
Coding Deepseek-V2 from Scratch in PyTorch - Implementation of Multi-head Latent Attention, Fine-Grained Expert Segmentation, and Shared Expert Isolation. (Jul 20, 2024)
MHA vs MQA vs GQA vs MLA - Comparison of Deepseek's new Multi-head Latent Attention with MHA, MQA, and GQA. (Jul 13, 2024)
Linear Rope vs NTK vs YaRN vs CoPE - Comparison of various positional embeddings. (Jul 13, 2024)
Align Phi3 with CPO-SimPO - Align your LLM with an approach that is more memory- and speed-efficient than DPO. (Jul 6, 2024)
Best LLM Inference Engine? TensorRT vs vLLM vs LMDeploy vs MLC-LLM - Benchmarking various LLM inference engines. (Jul 6, 2024)
MoE vs Dense vs Hybrid LLM Architectures - Training 600M MoE, Dense, and Hybrid LLM architectures. (Apr 29, 2024)
Schedule-Free Learning — A New Way to Train Models - Training three Llama models to compare a cosine-scheduled optimizer with a schedule-free optimizer. (Apr 18, 2024)