Adaptive AI: Creating Business Value in a Changing World

De Lange Matthias
Superlinear

--

The current pace of developments in Artificial Intelligence (AI) and foundation models has attained unparalleled heights, sparking unequivocal excitement from the public and growing demand from industry to embed AI in business processes. While change is the one constant in our world, ironically, the design of these very AI systems makes them set in stone. This can have drastic consequences for a business that relies on such static AI.

A timely example is OpenAI’s Large Language Model GPT-4. While learning from new data and adding new features requested by its users, GPT-4’s performance has dropped over time. A key factor for these models will therefore be the ability to take on new capabilities while preserving the ones they already have. In other words: Adaptive AI, our topic of today.

Are you interested in what Adaptive AI is and how to leverage it for your business? We’ll take a look at ‘what’ data to use, and ‘when’ to adapt. And with this knowledge in our backpack, we can use Continual Learning to create your own personalized Large Language Model, tailored to your needs. Let’s get into it!

The need for AI to keep learning amid data drift, new tasks, and extra data in a changing world.

An evolving world trumps a static AI

Changes in the world can occur abruptly; take, for example, the many companies suddenly impacted by COVID or by expanded regulations such as GDPR. But changes also occur continuously and more subtly within society. This is evident in increasing digitalization, resulting, for example, in a yearly decline in postal letters to process, while the number of packages skyrockets with the rise of online shopping.

All these changes in our ever-evolving world are reflected in the data we use to train an AI system. This means that if you set up your AI and operationalize it just once, it’s only a matter of time until it makes faltering predictions about an outdated world that no longer matches the present day. This can result in significant opportunity costs for your business.

To emphasize the importance of dealing with change, let’s consider IBM’s Watson Health, which started with grand ambitions to revolutionize cancer care, but ultimately failed to provide useful or safe recommendations in many cases. The poor performance was compounded by rapidly evolving medical practices and guidelines that outpaced the training and data collection for the AI. The project caused substantial financial losses of over 4 billion dollars and reputational damage for IBM.

In short, identifying changes in your data and quickly adapting to them with ‘Adaptive’ AI is a valuable asset for attaining a competitive advantage. It can increase cost-efficiency in processes via accurate predictions and provide valid insights throughout a company’s entire value chain.

Adapting AI to a world of change

In the following, I’ll show how to put the ‘adaptive’ in your AI by answering ‘what’ data to consider for your use case, and determining exactly ‘when’ your AI should be updated.

‘What’ data to use

There are two scenarios to consider when incorporating new data: Quick Adaptation and Continual Learning.

  • Quick Adaptation involves rapidly adjusting to a new situation by discarding outdated data and knowledge in our AI. This not only requires updating the AI in a timely manner, but also adapting with just a few data samples. One way to achieve this is by relying on a publicly available pre-trained model or by using efficient few-shot approaches designed for this purpose.
  • Continual Learning, on the other hand, focuses on incorporating data from the new situation while maintaining the knowledge our AI learned from past data. Unlike humans, AI tends to forget previously learned information when learning new things. This is why completely retraining AI models is the common, albeit expensive, standard.
    Continual Learning methods provide an efficient alternative by learning directly from new data at a fraction of the cost of complete retraining. These methods also prevent the AI from forgetting what it previously learned, using mechanisms such as freezing important model weights or rehearsing from a small buffer of old data points. To address fluctuations over time, as seen with GPT-4, Continual Learning methods will be the key ingredient for LLMs that uphold and even improve their performance over time.
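The rehearsal mechanism mentioned above can be sketched in a few lines. Below is a minimal, illustrative buffer (class and variable names are my own) that keeps an unbiased sample of the data stream via reservoir sampling, so old examples can be mixed into each new training batch:

```python
import random

class RehearsalBuffer:
    """Small memory of past examples, filled with reservoir sampling
    so it remains an unbiased sample of everything seen so far."""

    def __init__(self, capacity, seed=0):
        self.capacity = capacity
        self.buffer = []
        self.seen = 0
        self.rng = random.Random(seed)

    def add(self, example):
        self.seen += 1
        if len(self.buffer) < self.capacity:
            self.buffer.append(example)
        else:
            # Replace a random slot with probability capacity / seen.
            j = self.rng.randrange(self.seen)
            if j < self.capacity:
                self.buffer[j] = example

    def sample(self, k):
        # Old examples to rehearse alongside each new training batch.
        return self.rng.sample(self.buffer, min(k, len(self.buffer)))

buffer = RehearsalBuffer(capacity=100)
for x in range(10_000):          # stream of "old task" data points
    buffer.add(x)

replay = buffer.sample(16)       # rehearse 16 old points per new batch
```

In a real training loop, these replayed examples would be concatenated with the new batch before each gradient step, which is what keeps the old knowledge from being overwritten.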

‘When’ to adapt

Now that we understand the two scenarios of Quick Adaptation and Continual Learning, let’s focus on ‘when’ to actually update our AI with the new data.

For Quick Adaptation, we use statistical tests to detect changes in our data that the AI is not prepared for. When triggered, these tests prompt a manual investigation to identify the source of the change, possibly leading to retraining the AI. Another approach is to periodically retrain the model to keep it up to date. Brussels Airport Company illustrates this by recalculating their forecasts each month with the previous month’s data.

Quick Adaptation updates the AI for the latest trend.

For Continual Learning, ‘when’ to retrain depends on having sufficient new data to upgrade your AI. This data could be gathered through user feedback or an acquired labeling budget, but could also result from the desire to expand the AI’s scope. For example, if our product has a classifier to differentiate cars from bicycles but now needs to include hoverboards, Continual Learning can efficiently enhance the AI’s capabilities. We can add new features and meet changing demands without repeatedly starting from scratch.
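To make the hoverboard example concrete, here is one simple (among several possible) way to expand a linear classifier’s scope: add a fresh output row for the new class and train only that row, leaving the original class weights frozen. All names and dimensions below are illustrative:

```python
import numpy as np

rng = np.random.default_rng(0)
n_features = 8

# Existing classifier: one weight row per known class.
W_old = rng.normal(size=(2, n_features))       # rows: car, bicycle

# Expand the head with a freshly initialised row for the new class.
W = np.vstack([W_old, np.zeros((1, n_features))])  # + hoverboard row

# Train ONLY the new row; frozen old rows keep the old knowledge intact.
X_new = rng.normal(size=(32, n_features))      # hoverboard examples
for _ in range(100):
    scores = X_new @ W[2]                      # new-class logits
    grad = -(1 - 1 / (1 + np.exp(-scores)))    # sigmoid loss: push logits up
    W[2] -= 0.1 * (grad[:, None] * X_new).mean(axis=0)

# Old rows are bit-identical; only the hoverboard row has changed.
```

In deep networks the same idea applies to the final classification layer, usually combined with rehearsal or weight regularization for the shared layers underneath.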

Continual Learning allows flexible upgrades of your AI. It prevents forgetting while learning new stuff.

Building further upon our understanding of the ‘what’ and ‘when’ of Continual Learning, let’s now delve into the special use case of creating a personalized AI assistant for your company.

Use Case: A personalized AI assistant

With Continual Learning, complex company jargon can be integrated in your AI assistant.

Models like ChatGPT and its open-source variants have made tremendous leaps in understanding our everyday human language. Such AI assistants open avenues of opportunity for many manual processes in companies to become more efficient. For example, employees could chat directly with an AI assistant to quickly retrieve information from a wide range of company documents. However, while these AI assistants are surprisingly good at what they have seen (common language), that is also their limitation: they haven’t necessarily seen the jargon used specifically in your company!

This is where Continual Learning comes into play: while we’d like to maintain the rich knowledge of these AI models about the world, we also want to tailor them to our specific needs. However, as we’ve seen, neural networks are prone to catastrophic forgetting when learning new stuff!

Will your AI assistant become completely useless as it forgets all its neat tricks such as summarizing texts, just because we teach it our company’s jargon?

Well, not necessarily…

Creating a jack of your trade

On the one hand, forgetting seems to surface as GPT-3.5 and GPT-4 have fluctuated drastically in performance over the course of two months, while OpenAI released updates to improve safety. Due to their undisclosed and proprietary nature, we can’t know for sure whether the forgetting is caused by tuning the initial models or by other factors, but we can at least agree that both the complexity of these models and the vastness of their data make drift in their performance inevitable.

On the other hand, Large Language Models contain abundant knowledge that isn’t necessarily interesting for your use case. From this point of view, forgetting isn’t catastrophic, as we can tune the model from being a jack of all trades into an expert in your company’s jargon. Simply finetuning the model on your data might suffice, without resorting to more complex Continual Learning strategies. Ideally, you’d create a comprehensive testing dataset that not only contains your company’s jargon, but also data for potentially relevant tasks such as summarizing text. You can check out LLMOps to find out more about such best practices.

Another major challenge remains, even with simple finetuning. Large models come with even larger infrastructure, so let’s see how we can handle this.

Lightweight personalization

Large Language Models typically range in the order of billions of parameters, and training them incurs vast computation costs running into millions of dollars. The good news is that we can fully reuse this investment via freely available open-source models. Starting from such a common language model, we can adapt it to our company’s jargon by providing as much text containing that jargon as possible. Nonetheless, even simply finetuning these models remains expensive because of their massive size.

Luckily, recent research provides a cost-efficient alternative for transferring new knowledge. Low-Rank Adaptation (LoRA) trains a tiny additional model instead of the entire language model, resulting in up to 10,000 times fewer trainable parameters for GPT-3! This tiny model can be unfolded and added to the original model to incorporate the jargon it has learned. Training can now happen on a single GPU in a matter of days rather than months! And best of all, we can use it to create our adaptive AI, as it is fully compatible with the majority of Continual Learning methods.
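The core trick behind LoRA fits in a few lines: keep the pre-trained weight matrix frozen and learn only a low-rank update. A rough sketch with illustrative dimensions (a real LLM layer is larger, and training happens per attention layer):

```python
import numpy as np

rng = np.random.default_rng(0)
d, k, r = 1024, 1024, 8             # layer size and LoRA rank (illustrative)

W = rng.normal(size=(d, k))         # frozen pre-trained weight matrix
A = rng.normal(size=(r, k)) * 0.01  # small trainable down-projection
B = np.zeros((d, r))                # trainable up-projection, starts at zero

# Only A and B are trained; afterwards the update is folded back in:
W_adapted = W + B @ A               # identical to W until B is trained

trainable = A.size + B.size         # 2 * 1024 * 8  =     16,384
full      = W.size                  # 1024 * 1024   =  1,048,576
print(f"trainable fraction: {trainable / full:.2%}")  # -> 1.56%
```

Because `B` starts at zero, the adapted model behaves exactly like the original at the start of finetuning, and the merged matrix adds no inference cost afterwards.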

Don’t forget Continual Learning

Now what if we find on our test data that the model forgets a crucial task, such as summarizing text, while finetuning with LoRA? Can we prevent a decay similar to OpenAI’s updates on the GPT models?

It turns out we can stop forgetting by rehearsing only a very small portion (<1%) of the tasks we want to preserve! This is a big win compared to the millions of data points that would be required to learn ‘summarization’ from scratch. ‘Rehearsing’ works for the AI much like it does for humans: revisiting things it learned before. This means we need the AI’s engineers to share details of what data the model has seen before. That’s a no-go for proprietary models such as GPT, but luckily open-source models such as Meta’s Llama share this in great detail. Fantastic, this adds the final ingredient to our recipe for a personalized AI assistant!
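In practice, this rehearsal step amounts to mixing a small fraction of old-task examples into the finetuning set. A minimal sketch (function name, fraction, and dummy data are all illustrative):

```python
import random

def build_finetune_set(new_data, old_task_data, rehearsal_frac=0.01, seed=0):
    """Mix a small fraction of old-task examples into the finetuning
    data so the model keeps rehearsing what it already knows."""
    rng = random.Random(seed)
    n_old = max(1, int(len(new_data) * rehearsal_frac))
    rehearsal = rng.sample(old_task_data, min(n_old, len(old_task_data)))
    mixed = new_data + rehearsal
    rng.shuffle(mixed)
    return mixed

# Dummy stand-ins for (prompt, answer) pairs.
jargon    = [("what is a QBR?", "quarterly business review")] * 5_000
summaries = [("summarize this text", "a short summary")] * 100_000

train = build_finetune_set(jargon, summaries)
# 5,000 jargon pairs + 50 rehearsal pairs at the default 1% fraction
```

The exact fraction worth rehearsing depends on your model and tasks; the point is that it stays tiny relative to the original training data.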

Before concluding, it should be noted that catastrophic forgetting is not limited to LLMs; Generative AI fights the same battles. In generative models such as DALL-E and Stable Diffusion, the goal is to generate images from a description such as “cute dog”. When finetuning the model to generate images of your own dog, research at Google shows that the model altogether forgets what a dog is. Their Continual Learning approach uses the original model (which still knows what a dog is) to teach the new model about your dog, hence avoiding the forgetting. Combining this approach with the memory efficiency of LoRA, you can now even tune a generative AI model on your own device.
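The idea of letting the frozen original model teach the new one can be sketched as a combined loss: one term fits your personal data, the other keeps the finetuned model’s outputs on generic prompts close to the original model’s. This is only a schematic of the concept (names and the MSE formulation are illustrative, not the paper’s exact loss):

```python
import numpy as np

def prior_preservation_loss(pred_personal, target_personal,
                            pred_generic, teacher_generic, lam=1.0):
    """Fit the new data (your dog) while staying close to the frozen
    original model's output on generic prompts ("a dog")."""
    new_task = np.mean((pred_personal - target_personal) ** 2)  # learn new
    prior    = np.mean((pred_generic - teacher_generic) ** 2)   # don't forget
    return new_task + lam * prior

loss = prior_preservation_loss(
    np.array([1.0, 1.0]), np.array([0.0, 0.0]),  # personal-data error: MSE 1
    np.array([2.0]), np.array([0.0]),            # drift from teacher: MSE 4
    lam=0.5,
)  # -> 1 + 0.5 * 4 = 3.0
```

The weight `lam` trades off personalization against preservation: set it to zero and you are back to plain finetuning, with all its forgetting.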

Conclusion

In this blogpost, I’ve guided you through the incredible potential of adaptive AI, poised to address the constant challenges presented by our rapidly transforming world. Adaptive AI serves two significant roles:

  1. It allows for rapid adjustment, keeping your business predictions aligned with real-time world scenarios, and
  2. It provides a cost-effective way to broaden the capabilities of existing models with new data or tasks. By capitalizing on the benefits of Continual Learning approaches, we can craft a personalized AI assistant at a fraction of traditional costs, while preventing the AI from forgetting.

As the saying goes,

“The only constant in life is change.”

So, why should our AI models be any different? With adaptive AI, let’s ensure they’re not just part of the change, but leading it!

Enjoyed the reading? Feel free to leave a comment, connect, or reach out for any questions, and let’s continue learning together.

--


De Lange Matthias
Superlinear

Passionate about communicating AI. Machine Learning Engineer @Radix. PhD in Adaptive AI @KULeuven.