- 1kg, "Quantization: What Is LLM Quantization?": Quantization is a pivotal technique in artificial intelligence that reduces the computational and memory demands of large language models… (20h ago)
- Ingrid Stevens, "Quantization of LLMs with llama.cpp": Understanding and Implementing n-bit Quantization Techniques for Efficient Inference in LLMs. (Mar 15)
- Arun Nanda in Towards Data Science, "Reducing the Size of AI Models": This introductory article gives an overview of different approaches to reduce model size. It introduces quantization as the most… (Sep 7)
- Max Ng, "Small Model, Big Performance: Matching LLMs with Models 3x the Transformer Layers": TL;DR: By looping through transformer layers multiple times with residual connections, our approach enables smaller language models (LLMs)… (Nov 3)
- Pierre Lienhart in Towards Data Science, "The AQLM Quantization Algorithm, Explained": In this blog post, we cover the AQLM quantization algorithm, which sets a new state of the art for compressing LLMs down to 2 bits. (Mar 13)
- Arun Nanda in Towards AI, "Understanding 1.58-bit Large Language Models": Quantizing LLMs to ternary, using 1.58 bits instead of binary, achieves performance gains, possibly exceeding full-precision 32-bit… (Sep 7)
- Maaz Bin Khalid, "Quantization: An Easy Introduction": In this article, we'll dive into absmax quantization applied to the GPT-2 model, one of the simplest quantization methods to understand. As… (Nov 3)
- Dhruv Matani in Towards Data Science, "Tensor Quantization: The Untold Story": A close look at the implementation details of quantization in machine learning frameworks. (Sep 8, 2023)
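Several of the pieces above mention absmax quantization. As a rough orientation before reading them, the idea can be sketched in a few lines of NumPy: pick a scale so the largest-magnitude value in the tensor maps to 127, round everything to int8, and divide by the scale to recover approximate floats. This is an illustrative sketch of the general technique, not code from any of the listed articles.

```python
import numpy as np

def absmax_quantize(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Quantize a float tensor to int8 with absmax scaling.

    The scale maps the largest-magnitude entry of x to 127,
    so every value fits in the int8 range [-127, 127].
    """
    scale = 127.0 / np.max(np.abs(x))
    q = np.round(x * scale).astype(np.int8)
    return q, scale

def absmax_dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float values from the int8 tensor."""
    return q.astype(np.float32) / scale

# Example: the largest-magnitude value (3.4) maps to exactly 127;
# every other value incurs at most half a quantization step of error.
x = np.array([0.5, -1.2, 3.4, -0.01], dtype=np.float32)
q, scale = absmax_quantize(x)
x_hat = absmax_dequantize(q, scale)
```

One outlier with a huge magnitude inflates the scale and crushes every other value toward zero, which is why the more elaborate schemes covered in these articles (n-bit llama.cpp formats, AQLM, ternary weights) exist.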