The Cheapest Way to Deploy Finetuned/Base LLMs

Qendel AI
5 min read · Oct 12, 2023

What use is the finest model if it isn’t served efficiently?

Zero!

A model that can’t be served efficiently is useless. Here are some counter-intuitive truths I’ve learned over the years:

  • A small model might outshine a larger model.
  • A 50% accurate model could be better than a 100% accurate model.
  • A simple model might just be more beneficial than a more complex model.
  • And the list goes on…

Any of these can come out ahead if it serves efficiently and meets the needs of the use case at hand.

This article demonstrates how to efficiently deploy a finetuned Mistral 7B Instruct model, though the process can be adapted to almost any LLM. Specifically, it explains how to:

  • Merge model adapters with its base model.
  • Push the merged model and tokenizer to HuggingFace (HF).
  • Add deployment files to your HF repository for serving.
  • Deploy the model to the HF Inference Endpoint.
  • Query the deployed model.

Ready to start? Let’s dive deep into the deployment process.
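As a preview, the first two steps (merging the adapter into the base model and pushing the result to HF) can be sketched as below, assuming the model was finetuned with the PEFT library using LoRA adapters. The repository ids are placeholders, not the article’s actual repos:

```python
def merge_adapter_and_push(base_id: str, adapter_id: str, target_id: str) -> None:
    """Merge a LoRA adapter into its base model and push both model and tokenizer to the Hub."""
    # Imports are local so the sketch reads standalone even without the libraries installed.
    from transformers import AutoModelForCausalLM, AutoTokenizer
    from peft import PeftModel

    base = AutoModelForCausalLM.from_pretrained(base_id, torch_dtype="auto")
    model = PeftModel.from_pretrained(base, adapter_id)
    merged = model.merge_and_unload()  # fold the adapter weights into the base weights

    tokenizer = AutoTokenizer.from_pretrained(base_id)
    merged.push_to_hub(target_id)
    tokenizer.push_to_hub(target_id)

# Example call (placeholder repo ids — downloads the full 7B model and
# requires an HF token with write access, so it is left commented out):
# merge_adapter_and_push(
#     "mistralai/Mistral-7B-Instruct-v0.1",
#     "your-username/mistral-7b-adapter",
#     "your-username/mistral-7b-merged",
# )
```

`merge_and_unload()` returns a plain `transformers` model with the LoRA deltas baked in, which is what a serving stack expects: one set of weights, no adapter machinery at inference time.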

Deploying Finetuned Mistral 7B Instruct Model
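Before walking through the steps, here is a sketch of the final one: querying the deployed endpoint over HTTP. The payload shape assumes the standard text-generation format accepted by HF Inference Endpoints (`inputs` plus a `parameters` object); the URL, token, and `max_new_tokens` value are placeholders:

```python
import json
import urllib.request

def build_query(endpoint_url: str, hf_token: str, prompt: str) -> urllib.request.Request:
    """Build an authenticated POST request for a text-generation Inference Endpoint."""
    # Assumed payload format for HF text-generation endpoints.
    payload = {"inputs": prompt, "parameters": {"max_new_tokens": 200}}
    return urllib.request.Request(
        endpoint_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {hf_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

# Sending the request needs a live endpoint, so it is left commented out:
# with urllib.request.urlopen(build_query(url, token, "[INST] Hello [/INST]")) as resp:
#     print(json.load(resp))
```

Note the `[INST] ... [/INST]` wrapper in the commented example: Mistral 7B Instruct expects its instruction template around the prompt, so the deployed model should be queried with the same format used during finetuning.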
