Bhabani N
Accelerating Transformer Inference with Grouped Query Attention (GQA)
The inference speed of the base BERT model is 0.081608 seconds, whereas the GQA BERT model achieves an inference speed of 0.022804 seconds…
Aug 4
Bhabani N
Enhancing LLM Pre-training: URL-Level Data De-Duplication — Part I
In the pre-training phase of large language models, ensuring data quality is paramount, with de-duplication being a critical aspect of…
Aug 5