- 1kg, "Quantization: What Is LLM Quantization?": Quantization is a pivotal technique in artificial intelligence that reduces the computational and memory demands of large language models… (20h ago)
- Ingrid Stevens, "Quantization of LLMs with llama.cpp": Understanding and Implementing n-bit Quantization Techniques for Efficient Inference in LLMs. (Mar 15)
- Arun Nanda in Towards Data Science, "Reducing the Size of AI Models": This introductory article gives an overview of different approaches to reduce model size. It introduces quantization as the most… (Sep 7)
- Max Ng, "Small Model, Big Performance: Matching LLMs with Models 3x the Transformer Layers": TL;DR: By looping through transformer layers multiple times with residual connections, our approach enables smaller language models (LLMs)… (Nov 3)
- Pierre Lienhart in Towards Data Science, "The AQLM Quantization Algorithm, Explained": In this blog post, we cover the AQLM quantization algorithm, which sets a new state of the art for compressing LLMs down to 2 bits. (Mar 13)
- Arun Nanda in Towards AI, "Understanding 1.58-bit Large Language Models": Quantizing LLMs to ternary, using 1.58 bits instead of binary, achieves performance gains, possibly exceeding full-precision 32-bit… (Sep 7)
- Maaz Bin Khalid, "Quantization: An Easy Introduction": In this article, we'll dive into absmax quantization applied to the GPT-2 model, one of the simplest quantization methods to understand. As… (Nov 3)
- Dhruv Matani in Towards Data Science, "Tensor Quantization: The Untold Story": A close look at the implementation details of quantization in machine learning frameworks. (Sep 8, 2023)
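Several of the pieces above mention absmax quantization. As a rough orientation before reading them, the idea can be sketched in a few lines of NumPy: pick a scale so the largest-magnitude value in the tensor maps to 127, round everything to int8, and divide by the scale to recover approximate floats. This is an illustrative sketch of the general technique, not code from any of the listed articles.

```python
import numpy as np

def absmax_quantize(x: np.ndarray) -> tuple[np.ndarray, float]:
    """Quantize a float tensor to int8 with absmax scaling.

    The scale maps the largest-magnitude entry of x to 127,
    so every value fits in the int8 range [-127, 127].
    """
    scale = 127.0 / np.max(np.abs(x))
    q = np.round(x * scale).astype(np.int8)
    return q, scale

def absmax_dequantize(q: np.ndarray, scale: float) -> np.ndarray:
    """Recover approximate float values from the int8 tensor."""
    return q.astype(np.float32) / scale

# Example: the largest-magnitude value (3.4) maps to exactly 127;
# every other value incurs at most half a quantization step of error.
x = np.array([0.5, -1.2, 3.4, -0.01], dtype=np.float32)
q, scale = absmax_quantize(x)
x_hat = absmax_dequantize(q, scale)
```

One outlier with a huge magnitude inflates the scale and crushes every other value toward zero, which is why the more elaborate schemes covered in these articles (n-bit llama.cpp formats, AQLM, ternary weights) exist.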