"The AQLM Quantization Algorithm, Explained" by Pierre Lienhart in Towards Data Science (Mar 13). In this blog post, we cover the AQLM quantization algorithm which sets a new state-of-the-art for compressing LLMs down to 2 bits!
"LLM Inference Series: 5. Dissecting model performance" by Pierre Lienhart (Feb 25). In this post, we look deeper into the different types of bottleneck that affect model latency and explain what arithmetic intensity is.
"LLM Inference Series: 4. KV caching, a deeper look" by Pierre Lienhart (Jan 15). In this post, we will look at how big the KV cache, a common optimization for LLM inference, can grow and at common mitigation strategies.
"LLM Inference Series: 3. KV caching unveiled" by Pierre Lienhart (Dec 22, 2023). In this post, we introduce the KV caching optimization for LLM inference, where it comes from, and what it changes.
"LLM Inference Series: 2. The two-phase process behind LLMs’ responses" by Pierre Lienhart (Dec 22, 2023). After a quick reminder on the Transformer architecture, this post covers the algorithm of text generation using Transformer decoder models.
"LLM Inference Series: 1. Introduction" by Pierre Lienhart (Dec 22, 2023). In this post, I introduce the outline of this deep dive series about the specifics and challenges of hosting LLMs for inference.