Bhabani N
Accelerating Transformer Inference with Grouped Query Attention (GQA)
The inference speed of the base BERT model is 0.081608 seconds, whereas the GQA BERT model achieves an inference speed of 0.022804 seconds…
Aug 4
Bhabani N
Enhancing LLM Pre-training: URL-Level Data De-Duplication — Part I
In the pre-training phase of large language models, ensuring data quality is paramount, with de-duplication being a critical aspect of…
Aug 5