Pinned · Benjamin Marie in Towards Data Science · "Mistral 7B: Recipes for Fine-tuning and Quantization on Your Computer" · Cheap supervised fine-tuning with an impressive LLM · Oct 26, 2023
Pinned · Benjamin Marie in Towards Data Science · "Run Mixtral-8x7B on Consumer Hardware with Expert Offloading" · Finding the right trade-off between memory usage and inference speed · Jan 11
Benjamin Marie in Towards Data Science · "Multi-GPU Fine-tuning for Llama 3.1 70B with FSDP and QLoRA" · What you can do with only 2x24 GB GPUs and a lot of CPU RAM · 1d ago
Benjamin Marie · "ThinK: KV Cache Pruning for Memory Efficient Inference" · A promising approach if combined with KV cache quantization · 1d ago
Benjamin Marie in Towards Data Science · "Serve Multiple LoRA Adapters with vLLM" · Without any increase in latency · 5d ago
Benjamin Marie · "More Evidence that Ternary LLMs Are Good Enough" · -1, 0, and 1 are all you need to make good LLMs · Jul 25
Benjamin Marie in Towards Data Science · "Function Calling: Fine-Tuning Llama 3 on xLAM" · Fast and memory-efficient thanks to QLoRA · Jul 23
Benjamin Marie · "Q-GaLore: Train LLMs from Scratch with a 16 GB GPU" · GaLore but with quantization · Jul 21
Benjamin Marie · "Data Contamination for LLM Code Benchmarking, Can We Avoid It?" · Probably not, but we can try. · Jul 19
Benjamin Marie · "Fine-tune Gemma 2 on Your Computer with LoRA and QLoRA" · Using Hugging Face libraries and Unsloth · Jul 16