Shifting Sands: OpenAI’s Fluctuating Model Performance and the Impact on Developers

TitanML
3 min read · Jul 24, 2023

Introduction

Building LLM applications can be challenging at the best of times; add fluctuating model performance, and it can feel as impossible as crossing a desert in a sandstorm. Recently, OpenAI, the provider of the two most widely used Large Language Models (LLMs), has faced criticism for fluctuations in model performance, causing major concern among developers. A research paper from Stanford and UC Berkeley shed light on these fluctuations, confirming what many in the tech community had suspected. On top of this, OpenAI also suddenly discontinued support for certain models without prior notice, further adding to developers' challenges. So, is now the time for developers to consider self-hosted LLMs?

Model Performance Fluctuations

OpenAI's model performance has shown considerable fluctuations, raising questions among developers. The Stanford/Berkeley research paper highlighted a dramatic accuracy drop on a mathematical task (identifying prime numbers), from 97.6% in March 2023 to 2.4% in June 2023. While the cause of these changes remains uncertain, it is essential for OpenAI to address them transparently, a route the company has not historically chosen. Developers rely on stable and consistent AI models for their applications, and clear communication is vital to maintaining trust in the platform.
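Drift of this kind is straightforward to catch if you run a fixed evaluation set against the model on a schedule and track the score over time. A minimal sketch of such a check for the prime-number task; the `ask_model` callable is a hypothetical stand-in for whatever API client you use:

```python
def is_prime(n: int) -> bool:
    # Ground truth: trial division, perfectly adequate for small n
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True

def drift_score(ask_model, numbers) -> float:
    """Fraction of primality questions the model answers correctly."""
    correct = 0
    for n in numbers:
        answer = ask_model(f"Is {n} a prime number? Answer yes or no.")
        predicted = answer.strip().lower().startswith("yes")
        if predicted == is_prime(n):
            correct += 1
    return correct / len(numbers)

# Example with a stub standing in for a real LLM call:
stub = lambda prompt: "yes"  # a model that always answers "yes"
print(drift_score(stub, [2, 3, 4, 9, 11, 15]))  # 3 of 6 are prime -> 0.5
```

Logging this score after every model update (or simply every week) gives you an early-warning signal long before users notice the regression.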

Impact on Developers

The impact of these performance fluctuations extends beyond inconvenience; it affects developers' ability to build robust applications. Developers have had to rework prompts from scratch for each new model, and the same prompt can produce very different outputs on even slightly different model versions. This lack of confidence in model reliability creates a challenging development environment and can degrade end-user experiences.

At the same time, OpenAI discontinued support for certain models, forcing migration to newer versions and presenting additional challenges for developers, who have had to invest significant time and effort in rewriting prompts and resolving compatibility issues. This unexpected shift disrupted development timelines and put extra pressure on engineering resources.
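One partial mitigation while staying on the API is to avoid floating model aliases and resolve them to dated snapshots in a single place, so that upgrades happen on your schedule rather than the provider's. A small sketch; the snapshot names below are illustrative and will eventually be retired themselves:

```python
# Map floating aliases to pinned, dated snapshots so that a model
# upgrade is an explicit code change, not a silent server-side swap.
# Snapshot names are illustrative; check the provider's current model list.
PINNED = {
    "gpt-4": "gpt-4-0613",
    "gpt-3.5-turbo": "gpt-3.5-turbo-0613",
}

def resolve_model(name: str) -> str:
    """Return the pinned snapshot for a floating alias, or the name unchanged."""
    return PINNED.get(name, name)

# Every API call then goes through the resolver:
print(resolve_model("gpt-4"))       # gpt-4-0613
print(resolve_model("gpt-4-0314"))  # already a snapshot, passes through
```

Pairing this with a small regression suite of known prompt/answer pairs, run whenever the pin changes, turns a forced migration into a controlled one.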

Embracing the Power of Self-Hosted LLMs

In response to these challenges, self-hosted Large Language Models (LLMs) are gaining popularity, especially in industries handling sensitive data. Fine-tuning models on domain-specific data lets businesses improve accuracy and keep control over their AI applications. The emergence of improved open-source tooling, as well as companies like ours at TitanML, has made self-hosted LLMs more accessible, empowering developers to take charge of their own solutions. Additionally, the recent release of Meta's Llama 2, which promises performance comparable to API-based models like GPT-3.5, especially when fine-tuned, adds to the appeal of self-hosted solutions.

While OpenAI’s API may offer a convenient starting point, it is essential to consider the long-term costs and risks. Relying on a stable and reliable model is crucial for sustainable application performance. Self-hosted LLMs may require an initial investment, but they offer the much-needed control and stability required for long-term success.

Conclusion

The fluctuations in OpenAI’s model performance and the lack of prior notice for discontinuations have sparked important discussions within the developer community. OpenAI must address these concerns transparently to maintain developers’ trust and confidence in their services. As businesses and developers adapt to the shifting sands of the AI landscape, it is vital to consider long-term needs and resources when choosing between OpenAI’s APIs and self-hosted LLMs. By prioritising stability and control, developers can build reliable applications that thrive in this ever-evolving artificial intelligence terrain.

If you want to learn more about how you can reap the rewards of fine-tuned open-source LLMs, please visit: titanml.co

About TitanML

TitanML enables machine learning teams to effortlessly and efficiently deploy large language models (LLMs). Their flagship product, the Takeoff Inference Server, is already supercharging deployments for a number of ML teams.

Founded by Dr. James Dborin, Dr. Fergus Finn and Meryem Arik, and backed by key industry partners including AWS and Intel, TitanML is a team of dedicated deep learning engineers on a mission to supercharge the adoption of enterprise AI.

Join the discord here

Join the platform beta here
