Published in TDS Archive | Boosting LLM Inference Speed Using Speculative Decoding | A practical guide on using cutting-edge optimization techniques to speed up inference | Aug 27, 2024
Published in TDS Archive | Improving RAG Performance Using Rerankers | A tutorial on using rerankers to improve your RAG pipeline | Jun 25, 2024
What I Learned As A Forward Deployed Engineer Working At An AI Startup | In January 2024, I started working full-time as a forward deployed engineer at a company called Baseten. Baseten enables customers to… | Jun 2, 2024
Published in TDS Archive | Deploying LLMs Into Production Using TensorRT LLM | A guide on accelerating inference performance | Feb 22, 2024
Published in Level Up Coding | Deploying Codellama As A REST API Service | Introduction | Nov 1, 2023
Published in Level Up Coding | Creating AI Generated QR Codes Using Stable Diffusion And ControlNet | Generate awesome looking QR codes using AI and Python | Oct 6, 2023
Published in TDS Archive | Increase Llama 2's Latency and Throughput Performance by Up to 4X | Real-world benchmarks for Llama-2 13B | Aug 9, 2023
Published in The Generator | The Witcher’s Scripting Sorcery: Empowering The TV Adaptation With Large Language Models | Recreating A TV Show Script For The Witcher Based On The Books | Jul 31, 2023
Published in TDS Archive | Deploying Falcon-7B Into Production | Running Falcon-7B in the cloud as a microservice | Jul 7, 2023
Published in Level Up Coding | Supercharging ChatGPT: Elevate Conversations with Custom Functions via Function Calling | A quick tutorial on using function calling with ChatGPT | Jul 4, 2023