Fine Tuning or Prompt Engineering for Enterprise AI?

Jagane Sundar
InfinStor
Aug 1, 2023

Introduction: ChatGPT knows nothing about your Enterprise’s internal data

Foundational large language models such as OpenAI's ChatGPT and Meta's Llama 2 do not have any information specific to your own Enterprise. For LLM-based Enterprise chat applications to be useful, they have to be knowledgeable about your Enterprise. There are two ways to teach an LLM the specifics of your Enterprise:

  1. Fine-tune the LLM in advance using your data
  2. Use prompt engineering to provide context to the LLM at the time of chat

Fine-tuning a large language model using Enterprise data results in a high-quality chat experience for the user. Additionally, there is no limit on the amount of Enterprise data that can be used to fine-tune the LLM. However, fine-tuning an LLM turns out to be a time-consuming and costly activity.

Prompt engineering is a newer way of extracting useful chat responses from large, general-purpose foundational LLMs. Recent LLMs, such as the Meta Llama 2 series, are much more capable and can produce insightful chat responses that are informed by your own Enterprise's data. There are two important requirements for this: very large model sizes and good prompt engineering. Most advantageously, prompt engineering does not incur the compute cost or human effort of fine-tuning an LLM. However, the main limitation of prompt engineering is that the context window is usually limited to around 4K tokens.
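To make that 4K-token limitation concrete, here is a minimal sketch, assuming the tiktoken tokenizer library, of how a chat application might trim retrieved Enterprise context so that the assembled prompt stays within the model's context window. The budget values and the helper name are illustrative.

import tiktoken

MAX_CONTEXT_TOKENS = 4096                      # typical 2023-era context window
enc = tiktoken.get_encoding("cl100k_base")

def build_prompt(question, retrieved_chunks, reserve_for_answer=512):
    # Prepend retrieved context to the question until the token budget is used up
    budget = MAX_CONTEXT_TOKENS - reserve_for_answer
    used = len(enc.encode(question))
    context = []
    for chunk in retrieved_chunks:
        n = len(enc.encode(chunk))
        if used + n > budget:
            break                              # stop before overflowing the context window
        context.append(chunk)
        used += n
    return "\n".join(context + [question])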

Data Types in Enterprise AI Applications

That brings us to the essential question posed by this article: should you fine-tune an LLM using your own Enterprise's data, or should you use intelligent prompt engineering built on the results of a vector-database similarity search? The answer depends on the type of data that your AI application uses.

For the purpose of this article, I am going to classify Enterprise AI Applications into two buckets:

  1. Static Data AI: AI applications that use slow-moving data
  2. Constantly Updated Data AI: AI applications that use data that is constantly updated and whose incremental data must be processed

Examples of slow moving data AI applications:

  • Knowledgebase chat on product documentation, e.g. router, switch, security gateway documentation in a networking device company
  • Knowledgebase chat on human disease information
  • Insurance risk estimator based on zip code

Examples of constantly updated data AI applications:

  • Chat based troubleshooting assistant for networking issues that the customer is currently experiencing — the data is device logs and events
  • Disease diagnosis assistant for a physician who is in front of the patient
  • AI Chat based helper application for a Customer Support agent at a SaaS software company, for use while helping customers on the phone

Fine Tuning for Static Data, Prompt Engineering for Constantly Updated Data

Here is the answer to the question posed by this article.

Fine-tuned models are suitable for use with static or slow-moving data, where the data changes infrequently, perhaps once a month. Fine-tuning a chosen LLM can be accomplished in a matter of days, and the model can be put into production after suitable validation. It is useful to employ a tool such as MLflow to manage model versions, as sketched below.
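As a minimal sketch of that MLflow suggestion, each monthly fine-tuning run can be logged as an MLflow run, with the provider-side model identifier recorded as a tag. The experiment name, tag values, and metric are illustrative.

import mlflow

mlflow.set_experiment("enterprise-chat-llm-finetuning")

with mlflow.start_run(run_name="monthly-finetune-2023-08"):
    mlflow.log_param("base_model", "davinci")
    mlflow.log_param("training_corpus", "product-docs-2023-08")
    # Identifier of the hosted fine-tuned model, recorded for traceability
    mlflow.set_tag("fine_tuned_model_id", "davinci:ft-yourorg-2023-08-01")
    # Score from whatever validation you run before promoting the model
    mlflow.log_metric("validation_score", 0.91)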

When processing constantly updated data, examples of which include CloudWatch logs and Salesforce database transactions, a stock LLM with relevant prompt engineering works remarkably well.

The logic is very simple: fine-tuning a model may take hours, if not days. Creating a chat application that knows about data generated in your Enterprise in the last fifteen minutes is impossible if you plan to fine-tune an LLM for this purpose.

As for the cost of fine-tuning a large LLM, here's a quick back-of-the-napkin calculation. Let's say you have 1GB of documents that you want to use in your fine-tuning effort. Note that 1GB is roughly 167 books of 1000 pages each, or about 167 million words. Words are converted to tokens at roughly the rate of 4 tokens for every 3 words, so 167 million words require about 222 million tokens. According to OpenAI's pricing page, as of July 2023, fine-tuning their best model, the Davinci model, costs $0.03 per 1K tokens. Hence 222 million tokens cost about $6,660. In summary, fine-tuning the OpenAI Davinci model using 1GB of data costs roughly US$6,660 at July 2023 prices.
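That arithmetic can be checked in a few lines of Python, using the same rules of thumb as above; the result differs from the figure in the text only by rounding.

words = 167_000_000            # roughly 1 GB of documents, ~167 books of 1000 pages
tokens = words * 4 / 3         # rule of thumb: 4 tokens for every 3 words -> ~222 million tokens
cost = tokens / 1_000 * 0.03   # Davinci fine-tuning at $0.03 per 1K tokens (July 2023)
print(f"~{tokens / 1e6:.0f} million tokens -> ~${cost:,.0f}")   # ~223 million tokens -> ~$6,680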

Let’s consider the high level workflow for each of these two methods.

In the case of static or slow-moving data, OpenAI's Davinci model is fine-tuned using the data. The fine-tuned model is then available for inference as part of OpenAI's SaaS offering, and the web-based chat application uses that OpenAI-hosted, fine-tuned model.
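Here is a minimal sketch of that workflow using the OpenAI Python SDK as it existed in mid-2023 (the legacy fine-tunes endpoint). The training file name, the resulting model identifier, and the sample prompt are illustrative.

import openai

# 1. Upload the prepared JSONL training data (prompt/completion pairs)
training_file = openai.File.create(
    file=open("enterprise_docs_prepared.jsonl", "rb"),
    purpose="fine-tune",
)

# 2. Start a fine-tuning job against the Davinci base model
job = openai.FineTune.create(training_file=training_file.id, model="davinci")

# 3. When the job completes, query the hosted fine-tuned model from the chat application
response = openai.Completion.create(
    model="davinci:ft-yourorg-2023-08-01",   # returned as fine_tuned_model on the finished job
    prompt="How do I configure VLAN trunking on the security gateway?",
    max_tokens=256,
)
print(response.choices[0].text)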

Hence, for slow-moving data, fine-tuning a model (perhaps once a month) and then building a chat application on top of that model makes good sense.

In the case of constantly updated data, the cost and time required to fine-tune OpenAI's Davinci model are prohibitive. Hence, the AI application architecture consists of periodic processing that reads the new data, generates embeddings for it, and stores those embeddings in a vector database. The chat interface then runs a similarity search against the vector database to find relevant data and builds prompts for the LLM from the search results. The prompt, together with the user's chat statement, produces a high-quality chat experience.
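The following is a minimal sketch of that architecture, again with the 2023-era OpenAI SDK and with a FAISS index standing in for the vector database. The log lines, the chunking, and the prompt template are illustrative; a production system would persist the index and run the ingestion step periodically.

import numpy as np
import faiss
import openai

EMBED_MODEL = "text-embedding-ada-002"

def embed(texts):
    # Turn a list of text chunks into a float32 matrix of embedding vectors
    resp = openai.Embedding.create(model=EMBED_MODEL, input=texts)
    return np.array([d["embedding"] for d in resp["data"]], dtype="float32")

# Periodic processing: embed newly arrived data and add it to the vector index
new_chunks = [
    "2023-08-01T10:02:13 ERROR payment-service: timeout calling card processor",
    "2023-08-01T10:02:15 WARN checkout: retrying transaction txn-4821",
]
vectors = embed(new_chunks)
index = faiss.IndexFlatL2(vectors.shape[1])
index.add(vectors)

# Chat time: find similar data and build the prompt from the search results
question = "Why are checkouts failing right now?"
_, hits = index.search(embed([question]), k=2)
context = "\n".join(new_chunks[i] for i in hits[0])
prompt = f"My logs contain the following lines:\n{context}\nQuestion: {question}"

answer = openai.ChatCompletion.create(
    model="gpt-3.5-turbo",
    messages=[{"role": "user", "content": prompt}],
)
print(answer.choices[0].message["content"])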

Constantly Updated Data AI Chat Example

Here's an example of the above architecture. At InfinStor, we offer an Enterprise MLflow SaaS product on AWS Marketplace. When customers encountered a problem, we used to have to crawl through large volumes of CloudWatch logs to identify that specific customer's issue. We developed InfinLogs, our CloudWatch log analysis software, to address this use case; it is designed in the manner described above. In the following example, a user's 'Create Experiment' action is failing. Our support staff first chooses a reasonable 24-hour period of interest and enters the following chat query:

What is the problem?

Our system then runs a similarity search between the user's chat text and our vector database contents, and adds the following to ChatGPT's prompt:

My log has the following lines ‘Cognito Username: Joe.User. program
exception: Required tag projectid not present’ and ‘Create Experiment
failed’. Who is the user and what is the problem?

ChatGPT responds with the following:

Based on the log lines you provided, the user mentioned in the log is
“Joe.User.” The log entry states: “Cognito Username: Joe.User. program
exception: Required tag projectid not present.” This indicates that an
exception occurred in the program for the user “Joe.User,” and the issue
seems to be related to the absence of the “projectid” tag, which is required
for the operation.

The second log line, "Create Experiment failed," does not provide any
specific user information, but it could be related to the previous log entry
mentioning the failure due to the missing "projectid" tag.

The problem seems to be that the "projectid" tag is missing, and this is
causing the "Create Experiment" operation to fail for the user "Joe.User."
To resolve the issue, you should ensure that the "projectid" tag is provided
when attempting to create an experiment for this user.

The response from ChatGPT is remarkably insightful, and we accomplished this without any fine-tuning. To round out the response, we add direct links to the relevant CloudWatch log lines. As you can imagine, this is an invaluable tool for our support staff.

Originally published at http://docs.google.com.
