Should you Prompt, RAG, Tune, or Train? A Guide to Choose the Right Generative AI Approach

Vikesh Pandey
8 min read · Aug 14, 2023


Generative AI is moving at a fast pace, with organizations experimenting with the technology to solve their business problems. There are a lot of popular approaches out there, but when it comes to choosing the right one for implementing Generative AI solutions, there is a lack of clear guidance. The most commonly discussed approaches are:

  1. Prompt Engineering
  2. Retrieval Augmented Generation (RAG)
  3. Fine-tuning
  4. Training your own Foundation Model (FM) from scratch

This blog is an attempt to provide guidance on how to choose the right Generative AI approach for your use-case based on some common quantifiable metrics.

The option to “use the model as-is” is not included here, because there is hardly any business use-case where a base FM can be used effectively on its own. Base FMs are fine for general search, but for anything serious you will be looking at one of the options above.

How is the comparison performed?

The analysis is performed on the metrics mentioned below:

  1. Accuracy (How accurate can the responses be?)
  2. Implementation complexity (How complex is the implementation?)
  3. Effort (How much effort is required to implement?)
  4. Total Cost of Ownership (TCO) (What is the total cost of owning the solution?)
  5. Ease of updates and changes (How loosely coupled is the architecture? How easy is it to replace or upgrade components?)

Assumption: each approach is rated on these metrics relative to the others; the ratings apply to this comparison only and are not universally applicable. E.g., if prompt engineering is rated low on a particular metric, it means it performs worse than the other options here, not that it performs poorly on that metric in general.

With that said, let us start with the comparison.

Accuracy

Let’s first address the most discussed point: which approach provides the most accurate responses?

  • Prompt Engineering is all about providing as much context as possible, along with a few examples (few-shot learning), so that the FM learns your use-case better (see the prompt sketch after this list). Though the results might look impressive in isolation, it produces the least accurate results of the approaches discussed here.
  • RAG produces high-quality results, thanks to augmenting the prompt with use-case-specific context coming directly from vectorized information stores. Compared to prompt engineering, it produces vastly improved results with a much lower chance of hallucination.
  • Fine-tuning produces highly accurate results, with output quality comparable to RAG. Since we are updating the model weights on domain-specific data, the model produces more contextual responses. Compared to RAG, the quality might be marginally better depending on the use-case, so it is worth evaluating whether that difference actually justifies a trade-off analysis between the two. Generally, there are reasons beyond accuracy to choose fine-tuning: the frequency of changes in the data, or controlling the model artifact in your own environment for regulatory, compliance, and reproducibility purposes.
  • Training from scratch produces the highest-quality results of all. Since the model is trained on use-case-specific data from scratch, the chances of hallucination are close to none, and the accuracy of the output is the highest in this comparison.
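
To make the few-shot idea concrete, here is a minimal sketch of a prompt template: task instructions, a few solved examples, then the new input. The task, labels, and examples below are hypothetical placeholders; the pattern is the part that matters.

```python
# A minimal few-shot prompt: instructions + a few solved examples + new input.
# The classification task and examples are purely illustrative.
FEW_SHOT_TEMPLATE = """You are a support-ticket classifier. Label each ticket
as 'billing', 'technical', or 'other'.

Ticket: I was charged twice this month.
Label: billing

Ticket: The app crashes when I upload a file.
Label: technical

Ticket: {ticket}
Label:"""

def build_prompt(ticket: str) -> str:
    """Inject the new input into the few-shot template."""
    return FEW_SHOT_TEMPLATE.format(ticket=ticket)

print(build_prompt("How do I change my invoice address?"))
```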

Implementation complexity

Let’s see how easy or difficult these approaches are to implement.

  • Prompt engineering has quite low implementation complexity, since it requires little to no programming. Good English (or other human) language skills and domain expertise are needed to draft a good prompt using in-context learning and few-shot approaches.
  • RAG has higher complexity than prompt engineering, because you need coding and architecture skills to implement the solution (see the minimal sketch after this list). Depending on the tools chosen for the RAG architecture, the complexity can be even higher.
  • Fine-tuning has higher complexity than both prompt engineering and RAG, since the model’s weights/parameters are changed via tuning scripts, which requires data science and ML expertise.
  • Training from scratch has the highest implementation complexity, since it requires vast amounts of data curation and processing, plus training a fairly large FM, which requires deep data science and ML expertise.
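
As a rough illustration of the moving parts in a RAG pipeline, here is a minimal retrieve-then-augment loop. The `embed()` function is a hypothetical stub standing in for whatever embedding model you pick, and a real system would use a proper vector store rather than an in-memory list; the documents are made up for the example.

```python
import numpy as np

def embed(text: str) -> np.ndarray:
    """Hypothetical stub; in practice, call an embedding model or API here."""
    rng = np.random.default_rng(abs(hash(text)) % (2**32))
    return rng.standard_normal(384)

# "Vector store": pre-embedded domain documents kept in memory for the sketch.
documents = [
    "Refunds are processed within 5 business days.",
    "Premium plans include priority support.",
]
doc_vectors = np.stack([embed(d) for d in documents])

def retrieve(query: str, k: int = 1) -> list[str]:
    """Rank documents by cosine similarity to the query embedding."""
    q = embed(query)
    sims = doc_vectors @ q / (np.linalg.norm(doc_vectors, axis=1) * np.linalg.norm(q))
    return [documents[i] for i in np.argsort(sims)[::-1][:k]]

def build_rag_prompt(query: str) -> str:
    """Augment the prompt with retrieved context before calling the FM."""
    context = "\n".join(retrieve(query))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}\nAnswer:"

print(build_rag_prompt("How long do refunds take?"))
```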

Effort

Let’s understand how much effort is needed for each of the solutions.

Note that implementation complexity and effort are not always directly proportional to each other.

  • Prompt engineering requires a lot of iterative effort to get right. FMs are very sensitive to the wording of the prompt, and changing a single word, or even a verb, can sometimes give a completely different response. It takes quite a few iterations to get it right for the respective FM.
  • RAG requires a moderate level of effort, a bit higher than prompt engineering, due to the tasks involved in creating embeddings and setting up vector stores.
  • Fine-tuning is a higher-effort exercise than both prompt engineering and RAG. Though fine-tuning can be done with very little data (in some cases around 30 examples or fewer), setting up the fine-tuning run and getting the tunable parameter values right takes time (see the condensed sketch after this list).
  • Training from scratch is the highest-effort approach of them all. It requires a huge amount of iterative development to get an optimal model with the right technical and business outcomes. The process starts with gathering and curating data, designing the model architecture, and experimenting with different modelling approaches to get to the optimal model for the use-case. This process can be very long (weeks to months).
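
For a feel of what “setting up fine-tuning” involves, here is a heavily condensed sketch using the Hugging Face transformers and datasets libraries. The model name, training data, and hyperparameters are illustrative placeholders; real runs add evaluation, checkpointing, and often parameter-efficient methods such as LoRA.

```python
from datasets import Dataset
from transformers import (AutoModelForCausalLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

model_name = "gpt2"  # placeholder; swap in the FM you are tuning
tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = tokenizer.eos_token  # gpt2 has no pad token by default
model = AutoModelForCausalLM.from_pretrained(model_name)

# Tiny illustrative domain corpus; real fine-tuning needs curated data.
texts = ["Q: What is an FM? A: A foundation model.",
         "Q: What is RAG? A: Retrieval Augmented Generation."]
dataset = Dataset.from_dict({"text": texts}).map(
    lambda ex: tokenizer(ex["text"], truncation=True, max_length=128),
    remove_columns=["text"],
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="ft-out", num_train_epochs=1,
                           per_device_train_batch_size=2, learning_rate=2e-5),
    train_dataset=dataset,
    # mlm=False -> causal LM objective; collator derives labels from input_ids
    data_collator=DataCollatorForLanguageModeling(tokenizer, mlm=False),
)
trainer.train()
```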

Total Cost of Ownership (TCO)

Next is the comparison around TCO.

Note that we are not talking only about the bill for services/components, but the cost of completely owning the solution. That includes skilled-engineer time spent building and maintaining the solution, plus the cost of other tasks: maintaining the infrastructure yourself, downtime to perform patches and updates, setting up support channels, hiring, up-skilling, and other miscellaneous costs. (A toy back-of-envelope comparison follows the list below.)

  • Prompt Engineering TCO can be quite low, since all you have to maintain is the prompt templates, keeping them up to date whenever the FM version changes or a completely new FM is adopted. Other than that, there are the usual costs of hosting the FM or consuming it via a serverless API.
  • RAG TCO can be a bit higher than prompt engineering due to the multiple components involved in the architecture. It depends on the embedding model, vector store, and FM used. Compared to prompt engineering, it is higher simply because you are paying for three different components instead of just one (the FM).
  • Fine-tuning TCO is higher than RAG and prompt engineering, because you are tuning a model, which requires powerful compute, deep ML skills, and an understanding of the model architecture. The cost of maintaining such a solution is especially high, since tuning is needed every time the base model version is updated or a new batch of data arrives carrying the latest information about the use-case.
  • Training from scratch has the highest TCO, as the team has to own end-to-end data processing and ML training, tuning, and deployment. It requires a group of highly skilled ML practitioners, and the cost of maintaining the solution is very high due to the frequent re-training cycles needed to keep the model up to date with new information about the use-case.
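
To make the TCO framing concrete, here is a toy back-of-envelope comparison. Every figure below is hypothetical and purely illustrative; the point is that engineer time and maintenance, not just the service bill, dominate the total.

```python
# All figures are hypothetical monthly costs (USD), for illustration only.
monthly_costs = {
    "prompt_engineering": {"fm_api": 500, "eng_time": 1_000, "maintenance": 200},
    "rag":                {"fm_api": 500, "embeddings": 150, "vector_store": 300,
                           "eng_time": 3_000, "maintenance": 800},
    "fine_tuning":        {"hosting": 1_200, "tuning_compute": 2_000,
                           "eng_time": 6_000, "maintenance": 2_000},
    "from_scratch":       {"training_compute": 50_000, "hosting": 3_000,
                           "eng_time": 40_000, "maintenance": 10_000},
}

for approach, items in monthly_costs.items():
    print(f"{approach:>20}: ${sum(items.values()):>8,}/month")
```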

Flexibility to changes

Let’s look at the options when it comes to ease of updates and changes.

  • Prompt Engineering offers a very high degree of flexibility, since you just need to change the prompt templates when the FM or the use-case changes.
  • RAG offers the highest degree of flexibility when it comes to changes in the architecture. You can change the embedding model, vector store, and LLM independently, with minimal to moderate impact on the other components (a sketch of this loose coupling follows the list). It is also flexible enough to add new components into the pipeline, such as complex authorization, without impacting the rest.
  • Fine-tuning has quite low flexibility to changes, since any change in the data or inputs requires another fine-tuning cycle, which can be quite complex and time-consuming. Adapting the same fine-tuned model to a different use-case also takes a lot of effort, because the same model weights/parameters may perform poorly on domains other than the one it was tuned for.
  • Training from scratch has the least flexibility to changes. Since the model is built from scratch, performing updates triggers another re-training cycle. Arguably, you could fine-tune the model instead of re-training from scratch, but accuracy will vary.
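
The loose coupling that gives RAG its flexibility can be expressed as narrow interfaces: as long as each component honors its contract, the embedder, vector store, or FM can be swapped independently. A minimal sketch, with hypothetical interface names:

```python
from typing import Protocol

class Embedder(Protocol):
    def embed(self, text: str) -> list[float]: ...

class VectorStore(Protocol):
    def search(self, vector: list[float], k: int) -> list[str]: ...

class FoundationModel(Protocol):
    def generate(self, prompt: str) -> str: ...

def answer(query: str, emb: Embedder, store: VectorStore, fm: FoundationModel) -> str:
    """Each dependency can be replaced independently behind its interface."""
    context = "\n".join(store.search(emb.embed(query), k=3))
    return fm.generate(f"Context:\n{context}\n\nQuestion: {query}")
```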

As evident from the comparisons above, there is no clear winner; it depends on which metrics matter most to your organization when designing Generative AI based solutions.

Summarizing the above, this is how the high-level guidance looks (a toy decision helper follows the list):

  • Use Prompt Engineering when you want a higher degree of flexibility in terms of changing the FM and prompt templates, and your use-case does not involve a lot of domain context.
  • Use RAG when you want the highest degree of flexibility in terms of changing different components (data sources, embeddings, FM, vector engine) while keeping the quality of outputs high.
  • Use Fine-tuning when you want greater control over the model artifact and its version management. It is also useful when the domain-specific terminology is tightly tied to the data (think legal, biology, etc.).
  • Train from scratch when none of the above works for you and you have the ability to build an FM with trillions of well-curated tokenized data points, sophisticated hardware infrastructure, and a team of highly skilled ML experts. You should also have a quite ambitious budget and timeline for such an initiative.
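
As a compact restatement of this guidance, here is a toy decision helper. It simply encodes the rules of thumb above and is not a substitute for a proper trade-off analysis; the flags and their priorities are a simplification.

```python
def suggest_approach(domain_context: bool, data_changes_often: bool,
                     need_model_control: bool, huge_budget_and_data: bool) -> str:
    """Toy encoding of the article's rules of thumb, checked in rough priority order."""
    if huge_budget_and_data:
        return "train from scratch"
    if need_model_control and not data_changes_often:
        return "fine-tuning"
    if domain_context:
        return "RAG"
    return "prompt engineering"

print(suggest_approach(domain_context=True, data_changes_often=True,
                       need_model_control=False, huge_budget_and_data=False))
# -> RAG
```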

To summarize, choosing the right Generative AI approach requires deep thinking and an evaluation of your negotiable and non-negotiable metrics. There is no single right or wrong answer; it all “depends” :).

If you liked what you read, feel free to give a clap and share it within your network. As a technology enthusiast, I love writing on the latest and greatest in AI/ML. Check out my other articles on Medium. You can also follow me on LinkedIn.


Vikesh Pandey

Sr. ML Specialist Solutions Architect@AWS. Opinions are my own.