Pinned · Benjamin Marie in Towards Data Science: "Mistral 7B: Recipes for Fine-tuning and Quantization on Your Computer" · Cheap supervised fine-tuning with an impressive LLM · 9 min read · Oct 26, 2023
Pinned · Benjamin Marie in Towards Data Science: "Run Mixtral-8x7B on Consumer Hardware with Expert Offloading" · Finding the right trade-off between memory usage and inference speed · 8 min read · Jan 11, 2024
Benjamin Marie: "Zamba: A New LLM Architecture with State Space Model Layers Sharing Self-Attention" · Faster but as good as the transformer · 2 min read · 7 hours ago
Benjamin Marie: "With SimPO You Don't Need a Reference Model to Align Your LLM" · A method simpler than DPO for preference optimization · 2 min read · 2 days ago
Benjamin Marie: "Memory-Efficient Inference: Smaller KV Cache with Cross-Layer Attention" · Sharing KV activations across layers · 2 min read · May 28, 2024
Benjamin Marie in Towards Data Science: "Quantize Llama 3 8B with Bitsandbytes to Preserve Its Accuracy" · Llama 2 vs. Llama 3 vs. Mistral 7B, quantized with GPTQ and Bitsandbytes · 6 min read · May 27, 2024
Benjamin Marie: "Yi-1.5: Better than Llama 3 8B?" · 500B tokens can make a big difference · 2 min read · May 26, 2024
Benjamin Marie: "Piccolo2: Multitask Hybrid Training for Text Embeddings" · Exploiting datasets from different types of tasks for training better text embeddings · 2 min read · May 24, 2024
Benjamin Marie: "SUPRA: Turn a Transformer Model into an RNN Model" · But it's not cheap · 2 min read · May 20, 2024
Benjamin Marie: "Sparse Llama: 70% Smaller, 3x Faster, and Full Accuracy" · Pruning and short pre-training · 2 min read · May 17, 2024