Future Trajectories: Five Predictions for Large Foundation AI Models

Alexander Kremer
Picus Capital
Jun 18, 2023 · 9 min read

The introduction of OpenAI’s ChatGPT in late November 2022 catalyzed a wave of heightened interest and accelerated productivity in the field of Large Foundation Models (LFMs), including Large Language Models (LLMs). Broadly speaking, LFMs are the backbone of most Generative AI systems.

The spectrum of LFM users is diverse, giving multiple strata of society access to AI. At one end of the spectrum, consumers (2C users) can directly access chatbots like OpenAI’s ChatGPT, Google’s Bard, and Inflection AI’s Pi, which are products built on LFMs. At the other end, companies from all types of industries are improving their internal operations by accessing LFMs through the APIs of providers like OpenAI. Additionally, numerous consumer and enterprise software companies are embedding new AI features in their products and services, likewise leveraging LFMs via APIs.

In light of the rapid pace of innovation in the field of LFMs, it is challenging to track the changes in the landscape and comprehend their implications for business and individual users. This article therefore presents five predictions on the future trajectory of LFMs and the resulting impact on the development of AI applications along the typical lifecycle of AI system deployments: from (pre-trained) model selection through fine-tuning (or more extensive training) to operations (including serving). The hypotheses discussed here are derived from market observations, expert insights, and extensive discussions with pioneers in the field.

Prediction 1: Transformer Models will Continue to Amaze

Transformer models form the technological backbone of current state-of-the-art LLMs and several other LFMs. Despite their relatively recent inception just over six years ago, they have demonstrated immense potential across language and computer vision (CV) tasks. The Transformer architecture was introduced and popularized by the seminal 2017 paper “Attention Is All You Need”.
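For readers less familiar with the architecture, the sketch below shows the scaled dot-product attention mechanism at the heart of that paper. It is a bare-bones, illustrative implementation in PyTorch with random toy tensors, not production code and not the paper’s original implementation.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    """Core Transformer operation: softmax(Q K^T / sqrt(d_k)) V."""
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / d_k ** 0.5  # similarity between queries and keys
    weights = F.softmax(scores, dim=-1)            # attention weights sum to 1 per query
    return weights @ v                             # weighted mix of the value vectors

# Toy example: a batch of 2 sequences, 5 tokens each, embedding dimension 16.
q = k = v = torch.randn(2, 5, 16)
out = scaled_dot_product_attention(q, k, v)
print(out.shape)  # torch.Size([2, 5, 16])
```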

Post-publication, the scientific community devoted years to unraveling the immense potential of Transformer models, as reflected by the paper’s soaring citation count: from 1K+ citations in 2018 to 40K+ in 2022 alone.

Figure: Citations of the Attention Is All You Need paper have reached 40K in 2022 alone (Source: Picus Capital analysis based on Google Scholar data)

The scientific community has been driving forward work on Transformer-based LFMs for years, but currently much of the progress is being made in engineering and data science practice. A proxy for the popularity of Transformer LFMs is Google search data, which shows that interest in the concept “transformer machine learning” only picked up significantly from early 2022 (though still before the launch of ChatGPT). That interest reached a new peak just this year.

Figure: Google Trends data shows the increasing interest in Transformer models (Source: Picus Capital analysis based on Google Trends data)

These two figures suggest that despite the Transformer model’s inception in 2017, interest from the scientific community, practitioners and the general public only surged in recent years. Consequently, it is plausible to assume that the room for advancement is vast, with constant attempts to augment performance and discover new application areas. Hence, we anticipate that Transformer models will continue to astound us.

A recent example of how much is still possible with Transformer models is Meta’s I-JEPA, a vision Transformer with just 600M+ parameters (instead of tens of billions) that takes a different approach to learning. By design, I-JEPA compares abstract representations of images, rather than the pixels themselves, to solve CV tasks. As a result, it is computationally extremely efficient (and thus rather cheap to operate) and delivers outstanding results with an actual understanding of visual concepts.
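To make that distinction concrete, here is a minimal, hypothetical sketch (not Meta’s actual code) contrasting a pixel-space reconstruction loss with an I-JEPA-style loss computed between abstract embeddings. The encoder and predictor modules are placeholder stand-ins for real vision networks.

```python
import torch
import torch.nn.functional as F

# Placeholder modules standing in for a real vision encoder and predictor.
encoder = torch.nn.Conv2d(3, 64, kernel_size=16, stride=16)  # patchify + embed
predictor = torch.nn.Linear(64, 64)                          # predicts target embeddings

images = torch.randn(8, 3, 224, 224)               # a batch of images
masked = images * (torch.rand_like(images) > 0.5)  # crude masking, for illustration only

# Pixel-space objective (generative approaches): reconstruct raw pixels.
pixel_loss = F.mse_loss(masked, images)

# Representation-space objective (I-JEPA-style): predict the embeddings of the full
# view from the masked view; raw pixels never enter the loss.
with torch.no_grad():
    target_repr = encoder(images).flatten(2).mean(-1)  # (batch, 64) target embeddings
context_repr = encoder(masked).flatten(2).mean(-1)
repr_loss = F.mse_loss(predictor(context_repr), target_repr)

print(pixel_loss.item(), repr_loss.item())
```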

Prediction 2: Open-source Models to Coexist alongside Closed-source LFMs with Growing Share

The LFM market is presently characterized by the dominance of a few major players like Google, OpenAI, and Microsoft, as well as emerging entities like Anthropic and Cohere (LLMs) and Midjourney and Stability AI (CV). OpenAI, for instance, is reported to be aiming for 200M USD in revenue in 2023. These entities have developed proprietary, closed-source systems with limited transparency about model architectures, weights, and underlying data.

Simultaneously, a growing community of developers open-sourcing their LFMs is building customizable and flexible systems, albeit ones that are challenging to further train and operate. The release of Meta’s LLaMA in February 2023 was a game-changer in this regard.

Figure: Hugging Face has in recent years become the hub for open-source LFMs (Source: Hugging Face Open LLM Leaderboard)

Each model group, closed- and open-source, has its own advantages and disadvantages. Closed-source models (and their APIs) allow for immediate usage with limited set-up and configuration work. The task of AI alignment is largely taken care of by the provider. They also tend to be easily scalable and come with user support. However, they are often a black box to users and do not allow for much customization, and accessing them via APIs is not cheap. Enterprise users might also experience vendor lock-in and could be concerned about sensitive and valuable data leaving the organization.

Many of these issues can be addressed by open-source LFMs, though they require a whole different operational skill set from organizations aiming to use them. The open-source community has been extremely busy, especially since the launch of LLaMA, continuously developing and improving the open-source alternatives. Distribution via open-source communities is, moreover, a natural way of expanding reach.

Naturally, as open-source LFMs get better, the infrastructure for operating them has to keep up, further increasing the need for established players and start-ups to build MLOps services, as, for instance, our portfolio company TensorChord does.

In the future, we foresee a coexistence of closed- and open-source models, with a tilt towards the latter as the MLOps (and, more specifically, LLMOps) toolkit becomes more advanced. Closed-source models will be used by businesses and organizations that need a high degree of performance and reliability. Open-source LFMs, on the other hand, will be used by enterprises that have stronger internal operational capabilities to train/fine-tune and operate (incl. serving) these models, are cost-conscious, and want more flexibility and control. In many organizations, and even within individual products, both model groups will indeed coexist, depending on the specific task.

Prediction 3: There will be a Gradual Transition to Smaller, Domain-Specific Models

Figure: General-purpose LFMs may be considered hedgehogs and smaller models a group of foxes (Source: Hedgehog Digital)

As the name suggests, the current state-of-the-art LFMs are rather big, often with billions or even trillions of parameters. Indeed, the most remarkable feature of Transformer-based LFMs is how well they scale: as models grow in size and complexity, they keep surpassing expectations on reasoning and output quality. And we have come a long way from OpenAI’s GPT, released in 2018 with roughly 100M parameters, to GPT-4’s rumored 1.7T+ parameters. Inherently, general-purpose LFMs can perform a wide range of tasks. Yet the sheer scale of these models makes them difficult to handle and drives up IT infrastructure costs. While smaller, specialized, highly fine-tuned models offset some of these disadvantages, in practice choosing between a general-purpose LFM and a more specialized smaller model is often a difficult decision for businesses.

A possible way to ease this decision are methods generally referred to as model compression, such as pruning (which Google Research has explored extensively) and knowledge distillation (popularized for Transformers by Hugging Face with DistilBERT). These methods reduce the size of LFMs without significantly impacting their performance. So while the defining feature of LFMs is indeed their scale and their wide range of abilities, we may either use model compression to shrink LFMs, or start from an LFM and derive much smaller, highly fine-tuned models for specific tasks. Depending on how much progress can be made with such methods, we may well see a shift from large, general-purpose models to purpose-built smaller, domain-specific models for a wide range of use cases. Smaller models will be more cost-effective and accurate for their intended task. Even then, LFMs remain useful, as they can serve as teachers to train more specialized models.
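As an illustration of the distillation idea, below is a minimal sketch in PyTorch, not any vendor’s actual pipeline: a small “student” network is trained to match the softened output distribution of a larger “teacher”. All module sizes and the random training data are made up for the example.

```python
import torch
import torch.nn.functional as F

# Hypothetical stand-ins: in practice the teacher would be a large pre-trained LFM
# and the student a much smaller Transformer.
teacher = torch.nn.Sequential(torch.nn.Linear(128, 512), torch.nn.ReLU(), torch.nn.Linear(512, 100))
student = torch.nn.Sequential(torch.nn.Linear(128, 64), torch.nn.ReLU(), torch.nn.Linear(64, 100))
optimizer = torch.optim.AdamW(student.parameters(), lr=1e-3)
temperature = 2.0  # softens the teacher's distribution so the student sees more signal

for step in range(100):
    x = torch.randn(32, 128)  # placeholder input batch
    with torch.no_grad():
        teacher_logits = teacher(x)
    student_logits = student(x)
    # KL divergence between the softened teacher and student distributions
    loss = F.kl_div(
        F.log_softmax(student_logits / temperature, dim=-1),
        F.softmax(teacher_logits / temperature, dim=-1),
        reduction="batchmean",
    ) * temperature ** 2
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
```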

A more recent example of where the journey may lead is Microsoft’s work on Orca, a 13B-parameter model (instead of 100B+) that learns to imitate the reasoning process of LFMs, not just their style. It is, as such, a completely new, rather small model, yet one trained with the help of existing LFMs (such as GPT-4). Orca is on par performance-wise with several LFMs on multiple widely accepted benchmarks.

Prediction 4: Model Proliferation Complicates Selection for Businesses

The burgeoning number of LFMs, evidenced by the explosive growth of pre-trained models on Hugging Face’s platform from 50K to over 200K within a year, poses selection challenges for businesses. When choosing a model to build on top of and deploy in production, businesses need to consider factors such as the size of the model, the type of data it was trained on, and the accuracy and reliability of the results it produces. A wide range of academic examinations (e.g., SAT, GRE, BBH) have indeed been used to evaluate models. More AI-specific tests include the perplexity score, the BLEU score, and FastChat’s Chatbot Arena. There are also tests specifically checking for toxic language, such as ToxiGen.
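To illustrate one of these AI-specific metrics, the sketch below computes perplexity for a candidate open-source model with the Hugging Face transformers library. The model name and the one-line evaluation text are placeholders; a real evaluation would use a proper held-out corpus from the target domain.

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder model; any causal LM hosted on the Hugging Face Hub works similarly.
model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)
model.eval()

# Tiny placeholder evaluation text; a real benchmark would use a held-out domain corpus.
text = "Large foundation models are evaluated on perplexity, among other metrics."
inputs = tokenizer(text, return_tensors="pt")

with torch.no_grad():
    # Passing labels makes the model return the average cross-entropy loss per token.
    outputs = model(**inputs, labels=inputs["input_ids"])

perplexity = math.exp(outputs.loss.item())
print(f"Perplexity of {model_name} on the sample text: {perplexity:.1f}")
```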

Figure: FastChat’s Chatbot Arena is a widely-used benchmark to rank LLMs (Source: https://lmsys.org/blog/2023-05-03-arena/)

However, what is really lacking are context-specific tests, and their very nature forces companies to develop their own evaluations in many cases. There have been attempts at solving some of the above-mentioned problems, such as Anyscale’s Aviary, but more work is needed to satisfy the needs of enterprise users even for single models. Furthermore, the task becomes even more daunting when various models are deployed in parallel or in sequence within a workflow, which might create the need for an orchestration layer.
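To make the idea of an orchestration layer tangible, here is a deliberately simplified, hypothetical router in Python that dispatches each request to a different model depending on the task type. The model names and the placeholder call functions are invented for illustration and do not reference any real product’s API.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ModelEndpoint:
    name: str
    call: Callable[[str], str]  # prompt -> completion; wraps an API or a local model

# Hypothetical registry: a closed-source API model for hard reasoning,
# a small fine-tuned open-source model for routine classification.
def call_large_api_model(prompt: str) -> str:
    return f"[large-model answer to: {prompt!r}]"   # placeholder for an API call

def call_small_local_model(prompt: str) -> str:
    return f"[small-model answer to: {prompt!r}]"   # placeholder for local inference

REGISTRY = {
    "reasoning": ModelEndpoint("closed-source-llm", call_large_api_model),
    "classification": ModelEndpoint("small-finetuned-llm", call_small_local_model),
}

def route(task_type: str, prompt: str) -> str:
    """Pick the cheapest model that is good enough for the given task type."""
    endpoint = REGISTRY.get(task_type, REGISTRY["reasoning"])  # default to the strongest model
    return endpoint.call(prompt)

print(route("classification", "Is this support ticket about billing?"))
```

In practice, such a router would also need to handle fallbacks, cost tracking, and logging, which is exactly the kind of functionality emerging LLMOps tooling aims to provide.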

Yet the work does not stop there. AI models in production are ultimately no different from other software systems and thus require ongoing monitoring and evaluation. That is especially true because AI systems are updated and retrained continuously, which can change their behavior over time.

Prediction 5: Cost Reduction via Competition, Model Specialization and Advanced Training Methods

The fastest way for an enterprise to embed AI in the products and services it offers to consumers has now become the so-called plug-in, which companies such as OpenAI and Microsoft have developed. Via plug-ins, businesses can connect their service to these large companies’ platforms and chatbots in a plug-and-play manner. This will continue to be a viable path.

Besides plug-ins, APIs have been heavily used, especially for LFMs in the LLM domain, such as those developed and maintained by OpenAI, Anthropic, and Cohere. As the selection of LFMs grows, the space becomes more competitive. Additionally, developers of closed-source LFMs are expected to keep optimizing their models and IT infrastructure usage. As a result, the price of API calls should go down in the future. One lever these companies can use to lower costs is new special-purpose hardware (such as AWS Trainium).
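As a concrete illustration of the API route, the sketch below calls a hosted chat-completion endpoint over plain HTTP. The URL shown is OpenAI’s public chat completions endpoint as it existed in mid-2023; the model name, prompt, and parameters are merely examples, and the cost depends on the model and token usage.

```python
import os
import requests

# Assumes an API key is available in the environment; never hard-code credentials.
API_KEY = os.environ["OPENAI_API_KEY"]

response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={
        "Authorization": f"Bearer {API_KEY}",
        "Content-Type": "application/json",
    },
    json={
        "model": "gpt-3.5-turbo",  # example model; pricing varies by model
        "messages": [
            {"role": "user", "content": "Summarize our Q2 support tickets in three bullets."}
        ],
        "max_tokens": 200,
    },
    timeout=30,
)
response.raise_for_status()
print(response.json()["choices"][0]["message"]["content"])
```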

Figure: AWS Trainium is an AI accelerator that AWS purpose-built for deep learning training of 100B+ parameter models (Source: Data Center Dynamics)

Yet what about the open-source models? Naturally, as open-source pre-trained LFMs become better, companies will want to leverage proprietary data pools and train and run their own models themselves. The training part (i.e., fine-tuning) is further eased by new training methods such as QLoRA. MLOps and LLMOps companies, meanwhile, will develop tools that make picking the cheapest hardware setting to run models, and scaling them, an easier undertaking. Additionally, as these AI models become smaller and more specialized, the cost of training and deploying them will naturally decrease.
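For illustration, here is a rough sketch of how QLoRA-style 4-bit fine-tuning is typically wired up with the Hugging Face transformers and peft libraries. The base model name, LoRA hyperparameters, and target modules are placeholders, and exact arguments can differ across library versions; this is a setup sketch, not a complete training script.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

base_model = "huggyllama/llama-7b"  # placeholder; any causal LM on the Hub

# Load the frozen base model in 4-bit (the "Q" in QLoRA) to cut GPU memory needs.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
tokenizer = AutoTokenizer.from_pretrained(base_model)
model = AutoModelForCausalLM.from_pretrained(
    base_model, quantization_config=bnb_config, device_map="auto"
)
model = prepare_model_for_kbit_training(model)

# Attach small trainable low-rank adapters (the "LoRA" part); hyperparameters are illustrative.
lora_config = LoraConfig(
    r=16,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()  # only a tiny fraction of the weights are trained
```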

These five predictions offer a glimpse into the possible future of LFMs. With the current productivity and pace of technology evolution, we believe that the space will remain dynamic for the months ahead. Overall, the direction is clear: all the above developments will make world-class AI technology more affordable and speed up the roll-out for businesses and individuals alike.

Disclaimer: All opinions stated in this article reflect my personal views only.

Alexander Kremer
Picus Capital

Global investor based in China with a proven track record as a business leader; 10+ years of work experience in VC, Tech and Management Consulting