AI Development Tradeoffs using Prompting, Fine-Tuning, and Search Engine Embeddings

Carlos E. Perez
Published in Intuition Machine · 6 min read · Aug 15, 2023


Midjourney generated — the prompt is the same as the title.

Why is prompting superior to fine-tuning in AI development?

Prompting enables much faster iteration cycles and lower costs compared to fine-tuning, for a few key reasons:

— No training required: Prompts can be directly tested on production models, with no training time. Fine-tuning requires repeatedly training models, which can take hours or days even with lots of compute resources.

— Rapid prototyping: Many prompt variations can be tried out quickly to see what works best. Fine-tuning requires waiting for each model to finish training before evaluating.

— Low data requirements: Prompts need just a few examples, while fine-tuning needs thousands of training data points. Acquiring training data itself can be costly and slow.

— Easy sharing: Prompts can be shared instantly with others. Sharing fine-tuned models requires exporting model files and configurations.

— Modification flexibility: Prompts can be dynamically changed at request time. Fine-tuned models are static after training.

— No hosting overhead: Prompts use general API access so no hosting or engineering overhead. Fine-tuned models need specialized hosting infrastructure.

— Lower compute costs: No training compute required for prompting. Fine-tuning uses extensive compute resources during training.

— Reduced licensing costs: Prompts use the standard licensed model pricing. Fine-tuning may require additional licensing fees.

In summary, prompting eliminates virtually all the overhead, delays, and costs involved in data collection, model training, infrastructure, and licensing. It provides an agile method for prototyping and iteration. These advantages make prompting superior for initial exploration.
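To make the "low data requirements" and "rapid prototyping" points concrete, here is a minimal sketch of few-shot prompting as plain text assembly. It is not tied to any particular provider's API; the function name and the sentiment examples are illustrative only:

```python
def build_few_shot_prompt(instruction: str,
                          examples: list[tuple[str, str]],
                          query: str) -> str:
    """Assemble an instruction, a handful of worked examples,
    and a new query into a single prompt string."""
    parts = [instruction, ""]
    for inp, out in examples:
        parts.append(f"Input: {inp}")
        parts.append(f"Output: {out}")
        parts.append("")
    parts.append(f"Input: {query}")
    parts.append("Output:")
    return "\n".join(parts)

prompt = build_few_shot_prompt(
    "Classify the sentiment of each review as positive or negative.",
    [("Great battery life!", "positive"),
     ("Stopped working after a week.", "negative")],
    "The screen is gorgeous.",
)
print(prompt)
```

Note that iterating here means editing a string and resending it, which is exactly why the feedback loop is minutes instead of the hours or days a training run takes.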

When is fine-tuning necessary in AI development?

Here are some situations where fine-tuning large language models becomes necessary:

— When high accuracy on complex niche tasks is required

— Fine-tuning allows models to specialize deeply for a task through extensive training data. Prompting has limited ability to impart complex behaviors.

— When the desired model output requires very rigid and reliable structure

— Prompting can struggle to enforce structural constraints as strictly as fine-tuning can through training.

— When there is a large corpus of high-quality in-domain training data available

— With sufficient data, fine-tuning can enrich models with nuanced knowledge better than prompts.

— When the training data covers many variations, edge cases, and noise types

— Fine-tuning handles diversity well, while prompts cover limited cases.

— When low latency and cost per query during serving are critical

— Fine-tuned models are optimized for fast low-cost serving without prompts.

— When problems require modeling complex relationships and reasoning chains

— Fine-tuning helps stabilize reasoning over long text generations better.

— When custom models are the only option due to licensing restrictions

— Many licensed models only permit fine-tuning for customization.

— When specializing from a very large model to reduce hosting costs

— Fine-tuning enables distilling smaller custom models from large foundations.

— When issues of hallucination, fact consistency, or stability are present

— Fine-tuning can help avoid these failure modes in some cases.

In summary, for production use cases with rigorous accuracy, latency, stability, and cost requirements on complex structured tasks, fine-tuning becomes a necessity over prompting. The benefits outweigh the costs when these factors are critical.
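Fine-tuning starts with assembling supervised examples. As a sketch, assuming the common prompt/completion JSONL convention (field names vary across providers, so treat these keys as illustrative), a training file can be serialized like this:

```python
import io
import json

# Illustrative supervised examples; real fine-tuning
# typically needs thousands of these.
records = [
    {"prompt": "Translate to French: Hello", "completion": "Bonjour"},
    {"prompt": "Translate to French: Thank you", "completion": "Merci"},
]

# JSONL: one JSON object per line, the de facto format
# for fine-tuning datasets.
buf = io.StringIO()
for rec in records:
    buf.write(json.dumps(rec) + "\n")
jsonl = buf.getvalue()
print(jsonl)
```

The per-example cost of curating such a file, multiplied by the thousands of rows needed, is the data-acquisition overhead the previous section contrasts against prompting.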

What can fine-tuning achieve that can’t be done via prompting?

Here are some key benefits of fine-tuning that are hard to replicate with just prompting:

— Learning complex structure: Fine-tuning allows models to learn nuanced structural patterns, beyond what can be conveyed through examples in a prompt. This was demonstrated on tasks like ViGGO and SQL generation, where fine-tuned models significantly outperformed prompted models.

— Incorporating more data: Prompts are limited by context length, while fine-tuning leverages the full dataset to update model weights. This allows incorporating orders of magnitude more examples.

— Background knowledge enrichment: By fine-tuning on in-domain datasets, models can rapidly acquire relevant background knowledge that would be hard to convey through a prompt.

— Cost and latency benefits: Fine-tuned models no longer require lengthy prompts as input, reducing token costs. Latency also improves by avoiding wasted tokens.

— Model compression: Fine-tuning allows creating smaller specialized models distilled from a large model, reducing hosting costs. Prompting always requires the large model.

— Parsing constraints: Fine-tuning can teach models to generate output in a very rigid constrained structure that may be difficult to consistently enforce through prompting.

— Stability: For some tasks, fine-tuned models can be more stable than prompted models by reducing hallucinations and unfamiliar concept references.

— Licensing: Some licensed models only permit fine-tuning but not prompting, so fine-tuning unlocks additional large pretrained models.

In summary, fine-tuning allows models to learn complex behaviors, ingest more data, compress knowledge, and generate highly structured outputs in a stable way. These benefits are hard to attain through prompt engineering alone.
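The "parsing constraints" point is easy to test for in practice. A minimal sketch, assuming the model is asked to emit a JSON object with a fixed key set, is a strict validator that accepts or rejects each generation; the fraction of generations that pass is one way to measure how reliably a prompted versus fine-tuned model holds the structure:

```python
import json

def is_valid_record(text: str, required_keys: set[str]) -> bool:
    """Accept only output that parses as a JSON object
    with exactly the required keys."""
    try:
        obj = json.loads(text)
    except json.JSONDecodeError:
        return False
    return isinstance(obj, dict) and set(obj) == required_keys

# A well-formed generation passes; free-form text does not.
ok = is_valid_record('{"name": "widget", "price": 9.99}', {"name", "price"})
bad = is_valid_record('name: widget, price: 9.99', {"name", "price"})
```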

What are the advantages of search engines with vector embeddings versus prompting and fine-tuning?

Here are some potential benefits of using vector search engines over prompting and fine-tuning:

— Simplicity: Search engines just need indexed data. No need to train or carefully craft prompts.

— Flexibility: Search can handle broader queries and data types like images and video beyond just text. Prompting and fine-tuning are focused on text.

— Interpretability: Search results are accompanied by relevance scores indicating how well they match the query. ML model outputs don't have similar innate scores.

— Speed: Indexed search is extremely fast, providing sub-second results. Prompting and fine-tuning have slower inference times.

— Freshness: Indexes can be updated rapidly as new data comes in. Models need periodic retraining.

— Cost: Search engines have relatively low hosting costs and predictable pricing based on usage. ML model hosting and usage costs can be harder to optimize.

— Quality: Modern search relevance is highly optimized and robust after decades of work. NLP model quality can be less predictable.

— Customization: Search allows flexibly adding different data sources. Models are more rigidly tied to training data.

— Stability: Search engines have well-established update procedures to avoid instabilities. ML models can be fragile with data shifts.

The downside is that search engines have less ability to generalize or be creative compared to large language models. But their simplicity, robustness and cost profile make them a compelling choice for many applications, especially those needing flexible access to growing data.
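At its core, a vector search engine ranks indexed embeddings by similarity to a query embedding. Here is a minimal sketch using cosine similarity over a toy in-memory index; the three-dimensional vectors and document names are made up for illustration (real embeddings have hundreds or thousands of dimensions and use approximate nearest-neighbor indexes for speed):

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity: dot product normalized by vector lengths."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Toy index mapping document ids to (made-up) embedding vectors.
index = {
    "doc1": [0.9, 0.1, 0.0],
    "doc2": [0.1, 0.8, 0.1],
    "doc3": [0.0, 0.2, 0.9],
}

def search(query_vec: list[float], k: int = 2) -> list[tuple[str, float]]:
    """Return the top-k documents ranked by similarity to the query."""
    scored = [(doc, cosine(query_vec, vec)) for doc, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

results = search([0.85, 0.15, 0.0])
```

The relevance scores returned alongside each hit are exactly the "interpretability" advantage noted above, and updating the index is just inserting a new vector, which is why freshness is cheap.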

Advanced prompting, fine-tuning LLMs, and search engines each have complementary strengths that can be combined to build more powerful and robust AI products:

— Use search engines for fact lookup, fresh content, and interpretable results. Their speed and flexibility offset limitations of LLMs.

— Employ prompting for quickly exploring how LLMs can address product needs. Prompts provide rapid iteration without training.

— Fine-tune LLMs when large labeled datasets are available to specialize models for the product’s niche. This boosts quality.

— Chain search results into prompts to provide LLMs useful context and facts for generation and reasoning.

— Apply prompts to guide LLMs to extract key phrases and named entities from content to enrich search indexes.

— Use search metrics like clicks, dwell times, and highlights as training signals for fine-tuning to user needs.

— Leverage search autocomplete to guide prompt engineering towards high value product use cases.

— Build hybrid systems that return both search results and relevant LLM generations to users.

— Use search relevance signals to select best LLM outputs when multiple are generated.

By combining strengths in this way, products can gain accuracy, speed, scalability, and interpretability. The key is utilizing each AI technique where it shines while mitigating limitations with the others. This allows delivering the benefits of search, prompting, and fine-tuning in a complementary system greater than the sum of its parts.
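The "chain search results into prompts" pattern above can be sketched in a few lines. This is a plain-string assembly, not any particular framework's API; the function name and snippet numbering scheme are illustrative:

```python
def build_grounded_prompt(question: str, snippets: list[str]) -> str:
    """Prepend retrieved search snippets as numbered context so the
    LLM can answer from supplied facts rather than memory alone."""
    context = "\n".join(f"[{i + 1}] {s}" for i, s in enumerate(snippets))
    return (
        "Answer the question using only the numbered context snippets below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )

grounded = build_grounded_prompt(
    "When was the product launched?",
    ["The product launched in spring 2021.",
     "It received a major update in 2022."],
)
print(grounded)
```

The search engine supplies fresh, interpretable facts; the LLM supplies generation and reasoning over them, which is the division of labor this section argues for.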

Read more about prompting here: