Hosting your own LLMs? 10 things to consider.

Biju Krishnan @ DataSiens
3 min read · Jul 25, 2023

In this blog series I plan to cover all that is needed to host your own LLMs.

LLMs in an enterprise


Large language models (LLMs) like GPT-3 provide powerful AI capabilities, but deploying your own custom LLM service requires thorough planning. Here are 10 key aspects to think through:

1. Fine Tuning

A large language model (LLM) will usually need to be fine-tuned on your own dataset to adapt it to your use case. Fine-tuning will most likely require specialised compute infrastructure based on GPUs, TPUs, and the like.
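
As a rough sketch of what that looks like in practice, here is a minimal LoRA fine-tuning loop using Hugging Face Transformers and PEFT. The base model, file name, and hyperparameters are illustrative assumptions, not recommendations:

```python
# Minimal fine-tuning sketch with Hugging Face Transformers + PEFT (LoRA).
# Assumes a GPU and a JSONL file of {"prompt": ..., "answer": ...} records;
# model name, file name, and hyperparameters are placeholders.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          Trainer, TrainingArguments)

base_model = "tiiuae/falcon-7b"  # any causal LM you are licensed to tune
tokenizer = AutoTokenizer.from_pretrained(base_model)
tokenizer.pad_token = tokenizer.eos_token
model = AutoModelForCausalLM.from_pretrained(base_model, device_map="auto")

# LoRA keeps the trainable parameter count (and GPU memory) small.
model = get_peft_model(model, LoraConfig(r=16, lora_alpha=32,
                                         task_type="CAUSAL_LM"))

def tokenize(example):
    text = f"Question: {example['prompt']}\nAnswer: {example['answer']}"
    tokens = tokenizer(text, truncation=True, max_length=512,
                       padding="max_length")
    # Causal LM objective: labels are the input sequence itself.
    # (A real pipeline would also mask padding tokens in the labels.)
    tokens["labels"] = tokens["input_ids"].copy()
    return tokens

dataset = load_dataset("json", data_files="train.jsonl")["train"].map(tokenize)

Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=1,
                           per_device_train_batch_size=2),
    train_dataset=dataset,
).train()
```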

2. Data Engineering

Data needs to be in the format your fine-tuning tools require. For example, causal language modelling typically expects two columns, such as prompt and answer (or question and answer). The raw data is often in JSONL format; some tools can ingest JSONL directly, while others need it converted to CSV.

If the conversation takes the form of a thread, each line of the conversation will also need a parent conversation ID linked to it.
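
To make that conversion step concrete, here is a minimal sketch using pandas; the file names and field names are illustrative assumptions:

```python
# Sketch: convert raw JSONL conversations into the two-column CSV many
# fine-tuning tools expect. File and field names are placeholders.
import pandas as pd

records = pd.read_json("raw_conversations.jsonl", lines=True)

# Rename source fields to the prompt/answer columns the tool needs.
df = records.rename(columns={"question": "prompt", "reply": "answer"})

# For threaded conversations, carry the parent conversation ID so the
# dialogue history for each turn can be reconstructed downstream.
columns = ["prompt", "answer"]
if "parent_id" in df.columns:
    columns.append("parent_id")

df[columns].to_csv("train.csv", index=False)
```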

Here are some nice reads to help you understand the nitty-gritty of data engineering for LLMs.

  • Folks at TrueFoundry have nicely outlined how they preprocessed their Confluence data for fine-tuning their chatbot.
  • Here are some details from the clever people at Flowrite, specifically on dataset engineering for LLMs.
  • H2O WizardLM is an open-source implementation of WizardLM that helps convert documents into Q&A pairs for LLM fine-tuning.

3. Experiment Tracking

Use ML tools like Weights & Biases, Neptune, or WhyLabs to meticulously track fine-tuning experiments, compare variations, and evaluate results. Continual experimentation leads to iterative improvements.
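
For example, a minimal Weights & Biases logging loop might look like the sketch below; the project name, config values, and stubbed training functions are placeholders:

```python
# Sketch: logging a fine-tuning run to Weights & Biases.
# wandb.init / wandb.log are the real API calls; everything else
# below is an illustrative stand-in for your actual training code.
import random
import wandb

def train_one_epoch():
    return random.uniform(0.5, 1.5)  # stand-in for your real training loop

def evaluate():
    return random.uniform(0.5, 1.5)  # stand-in for your real eval loop

run = wandb.init(project="llm-finetuning",
                 config={"base_model": "falcon-7b", "lr": 2e-4, "lora_r": 16})

for epoch in range(3):
    wandb.log({"epoch": epoch,
               "train_loss": train_one_epoch(),
               "eval_loss": evaluate()})

run.finish()
```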

4. Model Deployment

Once trained, serve your model for predictions using Docker, Kubernetes, AWS SageMaker, Hugging Face Inference Endpoints, or other platforms built for scale and low latency.
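
As one illustration, a minimal prediction endpoint with FastAPI and a Transformers pipeline could look like this sketch; the model name and route are assumptions, and in production you would containerise it with Docker:

```python
# Sketch: a minimal text-generation endpoint with FastAPI.
# Model name and route are placeholders; swap in your fine-tuned model.
from fastapi import FastAPI
from pydantic import BaseModel
from transformers import pipeline

app = FastAPI()
generator = pipeline("text-generation", model="gpt2")

class Prompt(BaseModel):
    text: str
    max_new_tokens: int = 64

@app.post("/generate")
def generate(prompt: Prompt):
    output = generator(prompt.text, max_new_tokens=prompt.max_new_tokens)
    return {"completion": output[0]["generated_text"]}
```

Run it locally with `uvicorn main:app` and POST your prompt to `/generate`.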

5. Orchestration

Connect your LLM to internal systems such as databases using orchestrators like LangChain, Microsoft's Semantic Kernel, or Langflow. Proper orchestration weaves your model into business workflows.
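
For a flavour of what orchestration looks like, here is a sketch using LangChain's mid-2023 API; the prompt, the order-status lookup, and the choice of OpenAI as the LLM are all illustrative assumptions:

```python
# Sketch: wiring an LLM into a workflow with LangChain (API as of mid-2023).
# Requires OPENAI_API_KEY in the environment; the lookup function is a
# stand-in for a real query against your internal systems.
from langchain.chains import LLMChain
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate

def lookup_order_status(order_id: str) -> str:
    return "shipped on 2023-07-20"  # stand-in for a real database query

prompt = PromptTemplate(
    input_variables=["order_id", "status"],
    template="Write a short customer update for order {order_id}: "
             "status is {status}.",
)
chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt)

order_id = "A-1042"
print(chain.run(order_id=order_id, status=lookup_order_status(order_id)))
```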

6. Monitoring & Guardrails

Carefully monitor all model outputs and have rigorous guardrails in place to catch inappropriate or dangerous responses. Look into tools and guidance such as NVIDIA NeMo Guardrails, Microsoft's Responsible AI guidance, and Hugging Face's Inference API.
Watch this talk from Microsoft Build to understand how Microsoft is supporting its customers in serving generative AI responsibly.
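
As a deliberately simplified illustration of the guardrail pattern, here is a sketch that screens model output before it reaches the user; the blocked patterns and fallback message are placeholders, and a real deployment would use a dedicated framework such as NeMo Guardrails rather than a keyword list:

```python
# Sketch: a very simplified post-generation guardrail. The control-flow
# pattern is the point: screen every model output before returning it.
import re

BLOCKED_PATTERNS = [r"\b(password|api[_ ]?key)\b", r"\bssn\b"]  # illustrative

def guarded_reply(model_output: str) -> str:
    for pattern in BLOCKED_PATTERNS:
        if re.search(pattern, model_output, flags=re.IGNORECASE):
            return "Sorry, I can't share that."  # fallback response
    return model_output

print(guarded_reply("Your api key is abc123"))  # -> "Sorry, I can't share that."
```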

7. Vector Databases

Store key data in vector databases like Weaviate, Pinecone, or Redis to quickly find relevant information to combine with model predictions.

You may not need to fine-tune models frequently if you create embeddings of your knowledge base and feed the search results to the LLM as context.
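
Here is a small sketch of that retrieval pattern using sentence-transformers and plain NumPy; a vector database like Weaviate or Pinecone plays the same role at scale, and the documents and model name here are illustrative:

```python
# Sketch: embed a tiny knowledge base, retrieve the best-matching document
# for a query, and build a context-grounded prompt for the LLM.
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")
docs = [
    "Our refund policy allows returns within 30 days.",
    "Support is available 9am-5pm CET on weekdays.",
]
doc_vectors = model.encode(docs, normalize_embeddings=True)

query = "When can customers return a product?"
query_vector = model.encode([query], normalize_embeddings=True)[0]

# Cosine similarity reduces to a dot product on normalized vectors.
best = docs[int(np.argmax(doc_vectors @ query_vector))]
prompt = f"Answer using this context:\n{best}\n\nQuestion: {query}"
print(prompt)
```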

8. Security

Conduct extensive security reviews covering data protection, access controls, infrastructure vulnerabilities, and attack surface reduction. Follow guidance from OWASP.

9. Compliance

Build thorough compliance checks and controls covering fairness, transparency, and responsible AI principles. Review the FTC's questionnaire to OpenAI as a starting point.

10. FinOps

Closely manage costs in the cloud with FinOps tracking, monitoring, and optimization. LLM infrastructure can get expensive without careful cost governance.
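
A rough back-of-envelope cost model helps make these trade-offs visible; all the rates below are hypothetical placeholders, to be replaced with your cloud provider's actual pricing and your observed traffic:

```python
# Sketch: back-of-envelope monthly cost estimate for a self-hosted LLM.
# Every rate here is a hypothetical placeholder.
GPU_HOURLY_RATE = 4.00   # USD per GPU-hour (placeholder)
GPUS_PER_REPLICA = 1
REPLICAS = 2             # for availability / throughput
HOURS_PER_MONTH = 730

serving_cost = GPU_HOURLY_RATE * GPUS_PER_REPLICA * REPLICAS * HOURS_PER_MONTH
finetune_cost = GPU_HOURLY_RATE * 8 * 12  # e.g. one 12-hour run on 8 GPUs

print(f"Serving:     ${serving_cost:,.0f}/month")
print(f"Fine-tuning: ${finetune_cost:,.0f}/run")
```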

Right now, integrating all of these tools and implementing guardrails is quite a task. However, hyperscalers are accelerating their efforts to provide an integrated suite of services that lets you launch an LLM service with your own data. It may still be worth building an experimentation stack the hard way, to understand the nitty-gritty of hosting LLMs within your own VPC or data centre.

Follow me for detailed write-ups on each of these aspects, to be released weekly.


Biju Krishnan @ DataSiens

I have over 20 years of experience helping enterprises manage data, more than half of it spent building scalable platforms for analytics, AI, and ML.