Andrei Apostolin (MantisNLP) — series overview:

- "Knowledge Distillation — Techniques for Efficient Inference of LLMs (IV/IV)" (Nov 8, 2023): Welcome to the fourth and final part of our blog series on techniques for efficient inference of LLMs. In this segment, we explore…
- "FlashAttention — Techniques for Efficient Inference of LLMs (III/IV)" (Nov 1, 2023): Last time in this series we discussed pruning (removing useless weights in a network) and paged attention (optimizing memory access)…
- "Techniques for Efficient Inference of LLMs (II/IV)" (Oct 18, 2023): Last time we talked about quantization, a compression technique used to reduce the bitwidth of neural networks by representing the weights…
- "Techniques for Efficient Inference of LLMs (I/IV)" (Oct 4, 2023): Introduction