LLMs in production — the missing discussion

Sowmya V.B.
3 min read · Apr 17, 2023
source: Pixabay.com

Last week, I attended a virtual conference organized by the MLOps Community on using Large Language Models (LLMs) in production, and read a comprehensive write-up by Chip Huyen on the same topic. This blog post contains some thoughts on aspects I missed seeing in both.

Chip Huyen highlighted three main challenges of productionizing LLM-based applications, listed below, and also discussed ways to address them:

  • Prompt evaluation, versioning, and optimization, and the relative stability of prompts
  • Cost and latency issues
  • Choosing between prompting, fine-tuning, and other methods

In the conference, the talks focused on three primary aspects:

  • Typical production pipelines and the issues you face (e.g., using prompting for a version 1, moving to fine-tuning for v2, and exploring other options as the product matures and data collection strategies are in place)
  • Tools to support LLMs in production (e.g., LangChain)
  • The importance of data quality (talks from Snorkel AI and Galileo)
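The prompting-first progression described in the talks can be sketched as a small abstraction: a prompt template plus a pluggable model call, so a v1 built on prompting can later be backed by a fine-tuned model without changing the rest of the pipeline. This is a hypothetical sketch, not code from the conference; `build_prompt`, `call_model`, and the ticket-classification task are all illustrative assumptions.

```python
# Hypothetical "version 1" prompting pipeline: prompt construction is
# separated from the model call, so fine-tuning can be swapped in later.

def build_prompt(ticket_text: str) -> str:
    # Illustrative classification prompt for a support-ticket task.
    return (
        "Classify the support ticket as 'billing', 'bug', or 'other'.\n"
        f"Ticket: {ticket_text}\nLabel:"
    )

def call_model(prompt: str) -> str:
    # Stub standing in for an LLM API call (or, in v2, a fine-tuned model).
    return "billing" if "invoice" in prompt else "other"

def classify(ticket_text: str) -> str:
    # The pipeline interface stays fixed across v1 (prompting) and v2.
    return call_model(build_prompt(ticket_text)).strip()

print(classify("I was charged twice on my invoice."))  # billing
```

Keeping the interface stable is what makes the v1-to-v2 transition cheap: only `call_model` changes when you move from prompting to a fine-tuned model.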

Yet, for LLMs or any other NLP approach in production, I think two other aspects, a clear evaluation strategy beyond cost and latency tradeoffs and a clear data privacy policy, are as important as…
