Maxime Labonne in Towards Data Science: "Create Mixtures of Experts with MergeKit." Combine multiple models into a single MoE. (Mar 27)
Benjamin Marie: "Google's Gemma: Fine-tuning, Quantization, and Inference on Your Computer." More training tokens and a huge vocabulary. (Feb 26)
Yeyu Huang in Level Up Coding: "The 2-bit Quantization is Insane! See How to Run Mixtral-8x7B on Free-tier Colab." A quick tutorial for AQLM 2-bit quantization and its implementation. (Feb 20)
Yash Bhaskar in Cubed: "Groq Inference Engine — 18x Faster Than GPUs." Let's delve into Groq's technology, its implications for various industries, and the transformative potential it holds for the future of AI. (Feb 22)
Benjamin Marie in Towards Data Science: "Run Llama 2 70B on Your GPU with ExLlamaV2." Finding the optimal mixed-precision quantization for your hardware. (Sep 29, 2023)
Matthew Harris in Towards Data Science: "Some Thoughts on Operationalizing LLM Applications." A few personal lessons learned from developing LLM applications. (Jan 27)
Nikita Kiselov in Towards Data Science: "Building, Evaluating and Tracking a Local Advanced RAG System | Mistral 7b + LlamaIndex + W&B." Learn how to build, evaluate, and track an advanced RAG system using a local Mistral-7b, LlamaIndex, and W&B; a step-by-step guide with code. (Jan 19)
Dennis Bakhuis in Towards Data Science: "The Most Simple Way to Set Up ChatGPT Locally." The secret to running LLMs on consumer hardware! (Jan 17)
Vishal Rajput in AIGuys: "Mamba: Can it replace Transformers?" Solving the quadratic scaling problem of self-attention. (Jan 8)
Wenqi Glantz in Towards Data Science: "Democratizing LLMs: 4-bit Quantization for Optimal LLM Inference." A deep dive into model quantization with GGUF and llama.cpp, and model evaluation with LlamaIndex. (Jan 15)
Gao Dalie (高達烈) in Artificial Intelligence in Plain English: "CrewAI + Solar/Hermes + Langchain + Ollama = Super AI Agent." As technology booms, AI agents are becoming game changers, quickly becoming partners in problem-solving, creativity, and innovation, and… (Jan 14)
Marcello Politi in Towards Data Science: "Deploy Tiny-Llama on AWS EC2." Learn how to deploy a real ML application using AWS and FastAPI. (Jan 12)
Benjamin Marie in Towards Data Science: "Run Mixtral-8x7B on Consumer Hardware with Expert Offloading." Finding the right trade-off between memory usage and inference speed. (Jan 11)
Iulia Brezeanu in Towards Data Science: "How to Cut RAG Costs by 80% Using Prompt Compression." Accelerating inference with prompt compression. (Jan 4)
Mandar Karhade, MD, PhD in Towards AI: "Run Mixtral 8x7b on Google Colab Free." A clever trick allows offloading some layers. (Dec 31, 2023)
Alon Agmon in Towards Data Science: "Streamlining Serverless ML Inference: Unleashing Candle Framework's Power in Rust." Building a lean and robust model-serving layer for vector embedding and search with Hugging Face's new Candle framework. (Dec 21, 2023)
Martin Thissen: "Mixtral 8x7B on Your Local Computer | Free GPT-4 Alternative." In this article I will point out the key features of the Mixtral 8x7B model and show you how you can run the Mixtral 8x7B model on your… (Dec 17, 2023)
Aaron 0928: "Hugging Face has written a new ML framework in Rust, now open-sourced!" Recently, Hugging Face open-sourced a heavyweight ML framework, Candle, which is a departure from the usual Python approach to machine… (Aug 14, 2023)