- "The Law of Scale Is Invalid: Comparing Brain vs. AI Optimization" by Don Lim, in Towards AI (Dec 31). The total number of parameters of GPT-4 is about 10 times larger than that of GPT-3. However, the actual parameters used for any given prompt by…
- "Boosting LLM Inference Speed: High Performance, Zero Compromise" by Chinmay Deshpande (Mar 24). Explore key techniques to boost LLM inference speed: parallelization, vectorization, loop tiling, operator fusion, and quantization.
- "Turbocharge Large Language Model Training with Parallelization" by Kartik Chaudhary, in Google Cloud - Community (Nov 28). Parallelization techniques for efficient distributed training of large deep learning models.
- "Speeding up picoGPT — Implementing Speculative Decoding" by Yash Jain (Nov 17). A few days ago, I came across Karpathy's minimal GPT implementation in NumPy (picoGPT). While exploring various optimization techniques…
- "LLM Optimization Techniques: Quantization, Distillation, Pruning, Parameter & Layer Reduction" by Don Lim, in Towards AI (Nov 26). What are the relationships between GPT-4o and GPT-4o mini, Gemini and Gemini Flash, and Claude Opus, Claude Sonnet, and Claude Haiku? The…
- "Breaking it Down: A Comprehensive Guide to Text Splitting for LLM Performance" by Yash Bhaskar (Nov 2). One of the most powerful strategies to elevate your language model performance is text splitting. It might sound basic, but trust me, it's…
- "Speculative Decoding in LLMs" by shashank Jain, in AI Mind (Oct 24). Before diving into speculative decoding, it's important to understand the concept of decoding in NLP. Decoding refers to the process by…
- "Optimizing Transformers: Do We Really Need All That Attention?" by Arash Nicoomanesh (Oct 23). "What Matters In Transformers? NOT ALL ATTENTION IS NEEDED" is an interesting paper that finds you can actually remove half of the…