Semin Cheon in SqueezeBits Team Blog

- How much can we save through compression? — Estimating the cost savings from model compression. (Jun 26)
- ‘Breaking Down’ tokenizers in LLMs — An introduction to tokenizers and their implications in language models. (May 10)
- SLEB: Streamlining LLMs through Redundancy Verification and Elimination of Transformer Blocks — Accelerating LLM inference by pruning redundant transformer blocks. (May 7)
- Accuracy Degradation in AI Compression: Myth or Truth? — Clarifying the misunderstandings in AI model compression. (Apr 23)
- Are you getting everything out of your GPUs? — At the 2024 GTC event, Nvidia CEO Jensen Huang got on the stage to deliver his keynote speech, in which he divulged the newest GPU… (Apr 2)
- 4 Types of AI Compression Methods You Should Know — AI model compression for acceleration is essential. The question is HOW? Here are 4 key methodologies. (Mar 21)
- Things to check if your business utilizes AI — Do I need to COMPRESS my AI model? The short answer is “Yes” — and here’s why. (Apr 19)