- (Pinned) Benjamin Marie in Towards Data Science — Mistral 7B: Recipes for Fine-tuning and Quantization on Your Computer — Cheap supervised fine-tuning with an impressive LLM (Oct 26, 2023)
- (Pinned) Benjamin Marie in Towards Data Science — Run Mixtral-8x7B on Consumer Hardware with Expert Offloading — Finding the right trade-off between memory usage and inference speed (Jan 11)
- Benjamin Marie — Local-Gemma: Memory-efficient Inference with Gemma 2 — A very simple framework (3d ago)
- Benjamin Marie in Towards Data Science — AutoRound: Accurate Low-bit Quantization for LLMs — Between quantization-aware training and post-training quantization (Jun 29)
- Benjamin Marie — BinaryMoS: Better Binary LLMs with Mixture of Scales — Token-adaptive binarization for LLMs (Jun 25)
- Benjamin Marie — Turbo Sparse: LLMs with Minimal Activated Parameters — Increasing the average sparsity of the FFN to 90% (Jun 24)
- Benjamin Marie in Stackademic — Faster LLMs without MatMul Operations — A new efficient alternative to transformers (Jun 24)
- Benjamin Marie — Nemotron-4 340B: A Huge LLM by NVIDIA — And probably still very good after low-bit quantization (Jun 23)