Kelvin Lu
5 min read · May 28, 2023

FrugalGPT: A Low-Cost, High-Performance Building Block for Sophisticated LLM Applications

Photo by Andrew Wulf on Unsplash

Three weeks ago, researchers at Stanford University published a paper on the development of FrugalGPT, a low-cost, high-performance LLM solution. The paper, titled “FrugalGPT: How to Use Large Language Models While Reducing Cost and Improving Performance,” argues that FrugalGPT can match, and in some cases exceed, the performance of the best individual LLM at a small fraction of the budget. The paper has been well received by the research community and has already been cited by a number of other papers.

The development of FrugalGPT is an important step forward in the field of LLM research. It demonstrates that it is possible to achieve high performance while also reducing costs, striking a balance between two seemingly contradictory goals.

More importantly, the research on FrugalGPT inspires more sophisticated LLM application designs. Most current research treats LLMs as fascinating gadgets in their own right, but LLMs have several drawbacks when used in enterprise applications: they are slow, costly, and prone to hallucination. The FrugalGPT researchers took a different approach and looked at LLMs from a higher, systems-level perspective. They showed that, with sound engineering, LLMs can be made much more suitable for enterprise applications. Of course, the authors focused on cost reduction, which is just one aspect of enterprise design.

The Authors’ Approach

The researchers noticed that applying commercial large language models (LLMs) directly in enterprise applications can be very costly. For a mid-sized application, the invoices paid to LLM providers alone can easily reach tens of thousands of dollars per month. The authors compared a number of available LLMs, considering their task-specific performance and pricing policies, and came up with a combination of techniques to get the best possible performance within a controlled budget, including:

  • Prompt adaptation: FrugalGPT uses a technique called prompt adaptation to find the most efficient prompts for a given task, because the cost of an LLM invocation grows linearly with the size of the prompt.
Two types of prompt adaptations

The authors suggested two types of prompt adaptation to reduce the cost of using commercial LLMs in enterprise applications. The first, called prompt selection, reduces the number of examples used for few-shot in-context learning; however, the paper does not specify how the examples are selected. The second, called query concatenation, combines multiple queries into a single LLM API invocation. This saves money because the shared context in the prompt is sent only once, even though it is used by multiple queries (see the sketch below).
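A minimal Python sketch of query concatenation, purely for illustration: `call_llm` is a hypothetical placeholder for whatever completion endpoint is being used, and the prompt layout and answer-splitting logic are my assumptions, not the paper's implementation.

```python
def call_llm(prompt: str) -> str:
    """Hypothetical placeholder for a paid completion API call."""
    raise NotImplementedError

def concatenated_query(context: str, questions: list[str]) -> list[str]:
    # Build one prompt: the shared context is sent once, followed by all questions.
    numbered = "\n".join(f"Q{i + 1}: {q}" for i, q in enumerate(questions))
    prompt = (
        f"{context}\n\n"
        "Answer every question on its own line, prefixed with its number.\n"
        f"{numbered}\n"
    )
    raw = call_llm(prompt)  # one API call instead of one call per question
    # Split the single completion back into per-question answers.
    return [line.partition(":")[2].strip() for line in raw.splitlines() if line.strip()]
```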

  • LLM approximation: FrugalGPT can also approximate complex LLMs with simpler, more cost-effective models. This is done by training a smaller model to mimic the behaviour of a larger model. An even simpler approach is to cache query and response pairs for reuse (a minimal caching sketch follows this list).
LLM Approximation
  • LLM cascade: FrugalGPT can also use a technique called LLM cascade to reduce costs. In an LLM cascade, a query is passed through a chain of LLMs, stopping when an acceptable response is achieved. This can save money, as expensive LLMs are only used when necessary.
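Before going deeper into the cascade, here is a minimal sketch of the caching idea mentioned under LLM approximation. It is not the authors' implementation: the in-memory dictionary stands in for what would normally be a persistent store, and `call_llm` is whatever provider call you want to avoid repeating.

```python
import hashlib
from typing import Callable

_cache: dict[str, str] = {}

def cached_completion(prompt: str, call_llm: Callable[[str], str]) -> str:
    # Hash the prompt so arbitrarily long prompts make compact cache keys.
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    if key not in _cache:
        _cache[key] = call_llm(prompt)  # pay for the provider call only on a cache miss
    return _cache[key]
```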

The key components of the LLM cascade are a generation scoring function, which produces a reliability score for a query and the answer returned by an LLM API, and an LLM router, which selects the list of LLM APIs to include in the cascade.

The generation scoring function can be obtained by training a simple regression model that learns, from the query and a generated answer, whether the generation is correct.
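As a simplified stand-in for such a scorer (the paper trains a dedicated model; the scikit-learn pipeline below only illustrates the idea), one could fit a classifier on labelled (query, answer) pairs and use its predicted probability as the reliability score:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

def train_scorer(queries, answers, labels):
    # labels: 1 if the generated answer was judged acceptable, 0 otherwise.
    texts = [f"{q} [SEP] {a}" for q, a in zip(queries, answers)]
    scorer = make_pipeline(TfidfVectorizer(), LogisticRegression(max_iter=1000))
    scorer.fit(texts, labels)
    return scorer

def reliability_score(scorer, query: str, answer: str) -> float:
    # Probability that this (query, answer) pair is an acceptable generation.
    return float(scorer.predict_proba([f"{query} [SEP] {answer}"])[0, 1])
```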

The LLM router works through the selected list of LLM APIs: it invokes each API in turn to obtain an answer, uses the scoring function to compute a reliability score, returns the generation if the score is higher than a threshold, and otherwise queries the next service.
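The routing loop itself can be sketched in a few lines. This is a schematic of the behaviour described above, not the authors' code: each provider is assumed to be wrapped in a callable and paired with its own score threshold, and the list is assumed to be ordered from cheapest to most expensive.

```python
from typing import Callable, Sequence

def cascade(
    query: str,
    providers: Sequence[tuple[Callable[[str], str], float]],  # (api_call, threshold), cheapest first
    score: Callable[[str, str], float],                       # reliability scoring function
) -> str:
    answer = ""
    for call_api, threshold in providers:
        answer = call_api(query)
        if score(query, answer) >= threshold:
            return answer  # good enough: stop before invoking pricier models
    return answer          # otherwise fall back to the last (strongest) model's answer
```

Ordering the providers from cheapest to most expensive is what keeps the expensive APIs idle for the majority of queries that the cheaper ones can already answer acceptably.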

Comparison Results

The authors collected 12 LLM APIs from five mainstream providers, namely OpenAI [Ope], AI21 [AI2], CoHere [CoH], Textsynth [Tex], and ForeFrontAI [FFA], each with its own pricing scheme.

The authors also collected three datasets for the evaluation.

Eventually, the authors set up a collection of tasks and ran them against each individual LLM API as well as against FrugalGPT.

As it turned out, the analysis of LLM performance on a designed test set yielded several insights:

Firstly, while some models have better overall performance than others, the ranking is task-specific. This suggests that it is important to analyze LLM performance on a task-specific test set before deploying an LLM in production.

Secondly, as promised, FrugalGPT was able to match the performance of the top-performing single API at a small fraction of the cost. This makes FrugalGPT a promising option for businesses that are looking to reduce the cost of using LLMs.

Finally, it is impressive that FrugalGPT outperformed the best single model by clear margins on both the COQA and HEADLINES tasks. This suggests that FrugalGPT has the potential to be a valuable tool for a wide range of applications.

Conclusion

Large language models (LLMs) are evolving rapidly. New models are announced every few weeks, and the performance continues to improve. New open source LLM development frameworks are also emerging. I am excited to see how the LLM ecosystem continues to evolve in the years to come. I believe that LLMs have the potential to make a significant impact on the world, and I am eager to see how they are used to solve some of the most pressing challenges of our time.

However, there are still some inherent issues with LLMs that have not been resolved. These issues may persist for some time, until new groundbreaking technologies replace LLMs. One of the most harmful issues is hallucination. Before we can confidently deploy LLMs in enterprise production environments, we need to make sure that they will only generate text that is accurate and truthful.

The aforementioned paper focuses on cost control, which is another important concern for LLM applications. However, it also sheds light on the importance of solving LLM problems with thoughtful engineering strategies, rather than simply focusing on improving the models themselves.