Articles by Zain ul Abideen:

- MHA vs MQA vs GQA vs MLA: Comparison of DeepSeek's new Multi-head Latent Attention (MLA) with MHA, MQA, and GQA. (1d ago)
- Linear RoPE vs NTK vs YaRN vs CoPE: Comparison of various positional embeddings. (1d ago)
- Align Phi3 with CPO-SimPO: Align your LLM with an approach that is more memory- and speed-efficient than DPO. (Jul 6)
- Best LLM Inference Engine? TensorRT vs vLLM vs LMDeploy vs MLC-LLM: Benchmarking various LLM inference engines. (Jul 6)
- MoE vs Dense vs Hybrid LLM Architectures: Training 600M-parameter MoE, dense, and hybrid LLM architectures. (Apr 29)
- Schedule-Free Learning, A New Way to Train Models: Training three Llama models to compare a cosine-scheduled optimizer against a schedule-free one. (Apr 18)
- Llama-Bitnet | Training a 1.58-bit LLM: What is a 1-bit LLM, and how to train a 70M Llama-Bitnet? (Apr 4)
- ORPO Outperforms SFT+DPO | Train Phi-2 with ORPO: Train Phi-2 with ORPO using LazyOrpo. (Mar 22)
- Multi-GPU Training of 70B LLM with DeepSpeed and FSDP+QLoRA: Train 70–120B LLMs on 4xA100s and 2xRTX 3090s (consumer-grade GPUs). (Mar 14)
- Weekly AI News | The Latest AI Updates | 3 Mar – 10 Mar: A quick dive into recent generative-AI research, AI in business, and this week's new AI tools. (Mar 11)