Pinned · Benjamin Marie in Towards Data Science: "Mistral 7B: Recipes for Fine-tuning and Quantization on Your Computer" · Cheap supervised fine-tuning with an impressive LLM · 9 min read · Oct 26, 2023
Pinned · Benjamin Marie in Towards Data Science: "Run Mixtral-8x7B on Consumer Hardware with Expert Offloading" · Finding the right trade-off between memory usage and inference speed · 8 min read · Jan 11, 2024
Benjamin Marie: "Zamba: A New LLM Architecture with State Space Model Layers Sharing Self-Attention" · Faster but as good as the transformer · 2 min read · 7 hours ago
Benjamin Marie: "With SimPO You Don't Need a Reference Model to Align Your LLM" · A method simpler than DPO for preference optimization · 2 min read · 2 days ago
Benjamin Marie: "Memory-Efficient Inference: Smaller KV Cache with Cross-Layer Attention" · Sharing KV activations across layers · 2 min read · May 28, 2024
Benjamin Marie in Towards Data Science: "Quantize Llama 3 8B with Bitsandbytes to Preserve Its Accuracy" · Llama 2 vs. Llama 3 vs. Mistral 7B, quantized with GPTQ and Bitsandbytes · 6 min read · May 27, 2024
Benjamin Marie: "Yi-1.5: Better than Llama 3 8B?" · 500B tokens can make a big difference · 2 min read · May 26, 2024
Benjamin Marie: "Piccolo2: Multitask Hybrid Training for Text Embeddings" · Exploiting datasets from different types of tasks for training better text embeddings · 2 min read · May 24, 2024
Benjamin Marie: "SUPRA: Turn a Transformer Model into an RNN Model" · But it's not cheap · 2 min read · May 20, 2024
Benjamin Marie: "Sparse Llama: 70% Smaller, 3x Faster, and Full Accuracy" · Pruning and short pre-training · 2 min read · May 17, 2024