Zain ul Abideen, "Q-GaLore | Memory-efficient Pre-training and Fine-tuning" (Jul 20): Training or fine-tuning Large Language Models (LLMs) demands high-end GPUs due to massive datasets, optimizer states, and Billion…
Zain ul Abideen, "Coding Deepseek-V2 from Scratch in PyTorch" (Jul 20): Implementation of Multi-head Latent Attention, Fine-Grained Expert Segmentation, and Shared Expert Isolation.
Zain ul Abideen, "MHA vs MQA vs GQA vs MLA" (Jul 13): Comparison of Deepseek’s new Multi-head Latent Attention with MHA, MQA, and GQA.
Zain ul Abideen, "Linear Rope vs NTK vs YaRN vs CoPE" (Jul 13): Comparison of various positional embeddings.
Zain ul Abideen, "Align Phi3 with CPO-SimPO" (Jul 6): Align your LLM with an approach that is more memory- and speed-efficient than DPO.
Zain ul Abideen, "Best LLM Inference Engine? TensorRT vs vLLM vs LMDeploy vs MLC-LLM" (Jul 6): Benchmarking various LLM inference engines.
Zain ul Abideen, "MoE vs Dense vs Hybrid LLM Architectures" (Apr 29): Training 600M-parameter MoE, Dense, and Hybrid LLM architectures.
Zain ul Abideen, "Schedule-Free Learning — A New Way to Train Models" (Apr 18): Training three Llama models to compare a cosine-scheduled optimizer with the schedule-free optimizer.
Zain ul Abideen, "Llama-Bitnet | Training a 1.58 bit LLM" (Apr 4): What is a 1-bit LLM, and how do you train the 70M Llama-Bitnet?
Zain ul Abideen, "ORPO Outperforms SFT+DPO | Train Phi-2 with ORPO" (Mar 22): Train Phi-2 with ORPO using LazyOrpo.