Pinned · Benjamin Marie in Towards Data Science · Mistral 7B: Recipes for Fine-tuning and Quantization on Your Computer · Cheap supervised fine-tuning with an impressive LLM · Oct 26, 2023
Pinned · Benjamin Marie in Towards Data Science · Run Mixtral-8x7B on Consumer Hardware with Expert Offloading · Finding the right trade-off between memory usage and inference speed · Jan 11
Benjamin Marie · Nemotron-4 340B: A Huge LLM by NVIDIA · And probably still very good after low-bit quantization · 7h ago
Benjamin Marie · Samba: Better than Transformer and with Unlimited Context? · After Jamba and Zamba, we now have Samba! · 5d ago
Benjamin Marie · Google's RecurrentGemma 9B: Larger and Faster than Gemma 7B · Based on the Griffin architecture · 6d ago
Benjamin Marie in Towards Data Science · Fine-Tune Tiny Adapters for Llama 3 with VeRA · LoRA, but 100x smaller · Jun 11
Benjamin Marie · Zamba: A New LLM Architecture with State Space Model Layers Sharing Self-Attention · Faster, but as good as the transformer · Jun 4
Benjamin Marie · With SimPO You Don't Need a Reference Model to Align Your LLM · A Method Simpler than DPO for Preference Optimization · Jun 2
Benjamin Marie · Memory-Efficient Inference: Smaller KV Cache with Cross-Layer Attention · Sharing KV activations across layers · May 28