Huiqiang Jiang in LlamaIndex Blog
LongLLMLingua: Bye-bye to Middle Loss and Save on Your RAG Costs via Prompt Compression
In RAG, after the retrieval phase, it's necessary to perform Re-ranking + Fine-Grained Prompt Compression + Subsequence Recovery to…
Nov 6, 2023

Huiqiang Jiang
How to Optimize TTFT of 8B LLMs with 1M Tokens to 20s
If you aim to bring the TTFT (time-to-first-token) of an 8B model on a 1-million-token input down to 20 seconds, you might consider the following…
Jul 21