Pinned · Benjamin Marie in Towards Data Science · Run Mixtral-8x7B on Consumer Hardware with Expert Offloading · Finding the right trade-off between memory usage and inference speed · Jan 11
Benjamin Marie in Stackademic · FLUTE: Faster QLoRA Fine-tuning with NF4 Models · Finally, NF4 models have a reasonable latency · 2d ago
Benjamin Marie · AdEMAMix: Achieve the Same Results as with AdamW Using Only Half as Many Training Tokens · With two momentum terms · 2d ago
Benjamin Marie in Towards Data Science · GGUF Quantization with Imatrix and K-Quantization to Run LLMs on Your CPU · Fast and accurate GGUF models for your CPU · Sep 13
Benjamin Marie · Better Prioritize LLM Tasks for Higher System Throughput · How to replace the naive “first-come-first-serve” rule · Sep 5
Benjamin Marie in Stackademic · Enhanced SSM Training Through Initialization with a Pre-trained Transformer · The Mamba in the Llama · Sep 4
Benjamin Marie · Zamba2-1.2B: A Smaller Hybrid SSM/Transformer · Very fast and memory-efficient inference · Sep 3
Benjamin Marie in Stackademic · Jamba 1.5: Two New Hybrid Transformers/SSM of 52B and 398B Parameters · Huge but very efficient, especially for long-context processing · Aug 29