Pinned · Benjamin Marie in Towards Data Science
Mistral 7B: Recipes for Fine-tuning and Quantization on Your Computer
Cheap supervised fine-tuning with an impressive LLM · 9 min read · Oct 26, 2023
Pinned · Benjamin Marie in Towards Data Science
Run Mixtral-8x7B on Consumer Hardware with Expert Offloading
Finding the right trade-off between memory usage and inference speed · 8 min read · Jan 11, 2024
Benjamin Marie
Nemotron-4 340B: A Huge LLM by NVIDIA
And probably still very good after low-bit quantization · 2 min read · 14 hours ago
Benjamin Marie
Samba: Better than Transformer and with Unlimited Context?
After Jamba and Zamba, we now have Samba! · 2 min read · 5 days ago
Benjamin Marie
Google’s RecurrentGemma 9B: Larger and Faster than Gemma 7B
Based on the Griffin architecture · 2 min read · 6 days ago
Benjamin Marie
More Robust Preference Optimization for LLMs
With self-improving iterations · 2 min read · Jun 12, 2024
Benjamin Marie in Towards Data Science
Fine-Tune Tiny Adapters for Llama 3 with VeRA
LoRA but 100x smaller · 6 min read · Jun 11, 2024
Benjamin Marie
Zamba: A New LLM Architecture with State Space Model Layers Sharing Self-Attention
Faster but as good as the transformer · 2 min read · Jun 4, 2024
Benjamin Marie
With SimPO You Don’t Need a Reference Model to Align Your LLM
A method simpler than DPO for preference optimization · 2 min read · Jun 2, 2024
Benjamin Marie
Memory-Efficient Inference: Smaller KV Cache with Cross-Layer Attention
Sharing KV activations across layers · 2 min read · May 28, 2024