Navigating the Next Frontier

The evolving landscape of LLM(ops) and the winning strategies

Sebastian Jorna
7 min read · Nov 20, 2023

In my preceding blog posts [1], [2], [3], I've discussed how software is eating the world and how AI (LLMs) has the potential to be better at using software than most humans. To realize this future, a new breed of software infrastructure and developer tools, termed LLMops, is needed. Tidalflow has been at the forefront of this movement, bridging the gap between deterministic software and stochastic LLMs.

Tidalflow.ai

The key question in the LLMops landscape is where value accumulates (in the model, the tooling, or the application) and whether incumbents or startups will lead. A leaked internal Google document in mid-2023 starkly stated:

We have no moat, and neither does OpenAI

Current LLMs, despite their high training and operational costs, face rapid obsolescence as newer, more efficient models emerge. As a rule of thumb, no model in use today will still be in use a year from now. While these models cost millions to train and run, they have a very short shelf-life on their own; more on that later. Given the generalist knowledge these LLMs share, the performance differences between them, and hence the switching costs, are relatively low. Competition among LLMs will eventually hinge on cost and latency, with costs already plummeting: GPT-3's API price, for instance, has dropped 10x since March 2023.

Despite this ephemeral nature of the individual models, an ecosystem centered around LLMs can be highly valuable, provided a robust moat is built around it. The potential of these ecosystems in capturing user attention is evident from ChatGPT’s staggering 1.7 billion website visits in October 2023 alone, with an average session duration of 8 minutes. This translates to approximately 26,000 human years spent on ChatGPT in just that month! And the projections for November are even higher.
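For the curious, here is the quick back-of-envelope math behind that figure; the visit and session numbers are the estimates quoted above:

```python
# Back-of-envelope check of the "26,000 human years" figure.
# Inputs are the traffic estimates cited in the paragraph above.
visits = 1.7e9          # ChatGPT website visits, October 2023
minutes_per_visit = 8   # average session duration, in minutes

total_minutes = visits * minutes_per_visit
human_years = total_minutes / (60 * 24 * 365)
print(f"{human_years:,.0f} human years")  # ≈ 25,876, i.e. roughly 26,000
```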

What became clear at OpenAI's dev day on Nov 6th is that they plan to capture this ecosystem by vertically integrating and building the necessary LLMops tooling in-house. Microsoft shared similar advances at their Ignite conference a week later. I expect Google, Apple, Meta (open-source?), and the other LLM providers to follow suit.

OpenAI tackling the LLMops space

Recent developments made it clear that many of the required LLM "crutches" are being built by the tech incumbents to fortify their LLM-ecosystem moats. However, the Bitter Lesson from the past 70 years of machine-learning research suggests that many of these aids may become redundant as computing power grows. Let's look at how these models benefit from a multitude of exponential improvements. LLMs will get better and cheaper. Fast.

The first factor driving LLM improvement is the increase in absolute computing power, driven by Moore's law. For instance, NVIDIA's 2024 H200s run GPT-3 inference 18x faster than the prior-generation A100s the model was trained on.

More importantly, the market is about to be flooded with increased compute. GPT-4 was trained on 25k A100s in about 90 days. That's on the order of 3e25 FLOPs, which leaves a lot of room for improvement.

Estimates suggest that a strong team would need only about 7k H200s running for 90 days to train GPT-4 (compute-wise). NVIDIA, however, is forecast to sell over 3 million GPUs next year, about 3x its 2023 sales of roughly 1 million H100s. And that does not even include the ramp-ups from AMD, Google, and Microsoft.
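A quick sanity check on those numbers: the A100's peak throughput below is NVIDIA's published BF16 spec, while the utilization factor is my own assumption, chosen to show how the figure above can be reached:

```python
# Rough sanity check of the GPT-4 training-compute estimate.
# Peak throughput is NVIDIA's spec; the utilization is an assumption.
A100_PEAK_FLOPS = 312e12   # dense BF16 throughput per A100, FLOP/s
gpus, days = 25_000, 90
utilization = 0.50         # assumed model FLOPs utilization (MFU)

gpu_seconds = gpus * days * 86_400
total_flops = gpu_seconds * A100_PEAK_FLOPS * utilization
print(f"{total_flops:.1e} FLOPs")  # ≈ 3.0e+25, in line with the figure above
```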

“Microsoft is currently conducting the largest infrastructure buildout that humanity has ever seen. While that may seem like hyperbole, look at the annual spend of mega projects such as nationwide rail networks, dams, or even space programs such as the Apollo moon landings, and they all pale in comparison to the >$50 billion annual spend on datacenters Microsoft has penned in for 2024 and beyond. This infrastructure buildout is aimed squarely at accelerating the path to AGI and bringing the intelligence of generative AI to every facet of life from productivity applications to leisure.” Dylan Patel

Besides the sheer increase in computing power, another trend contributing to LLM advancement is the ability to extract more performance from the same computational resources and code. Modular has made notable strides in this area with Mojo, a Python superset that runs up to 35,000x faster than the original Python version on certain benchmarks, and is revolutionizing compiler and runtime technologies to create a highly efficient unified inference engine.

Mojo — Python superset running 35,000x quicker than the original
Modular revolutionizing compiler and runtime technologies to create the fastest unified inference engine

A third trend challenges the prevailing belief that larger models are inherently better. Recent advancements have enabled significant size reductions in top reasoning LLMs — by up to 97.5% — while maintaining performance levels. Much of this progress can be credited to insights from the Chinchilla paper, demonstrating that models trained with fewer parameters but more tokens can be more effective.
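To make the Chinchilla insight concrete, here is the widely cited rule of thumb from the paper: training compute is roughly C ≈ 6·N·D (N parameters, D tokens), and the compute-optimal recipe uses about 20 tokens per parameter. Applying it to the training budget mentioned earlier:

```python
# Chinchilla rule of thumb: compute-optimal training uses roughly
# 20 tokens per parameter, with training compute C ≈ 6 * N * D.
def chinchilla_optimal(compute_flops: float) -> tuple[float, float]:
    # Solve C = 6 * N * (20 * N) for N, then set D = 20 * N.
    n = (compute_flops / 120) ** 0.5
    return n, 20 * n

params, tokens = chinchilla_optimal(3e25)
print(f"params ≈ {params:.1e}, tokens ≈ {tokens:.1e}")
# params ≈ 5.0e+11, tokens ≈ 1.0e+13: far fewer parameters, far more data
```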

This brings me to the final point: the role of these tokens as training data. Data has been a major bottleneck, because it is not just the quantity but also the quality of the training data that affects overall model performance. As OpenAI co-founder Ilya Sutskever said in early November:

The most near-term limit to scaling is obviously data. Without going into the details, I'll just say that the data limit can be overcome.

We’re not yet fully leveraging the vast pool of multimodal data, and today’s robust models have the potential to autonomously generate considerable amounts of high-quality data for the next generation of models.

When we combine these factors — the exponential growth in computing power, the increased efficiency of existing infrastructure, the reduced computational needs of newer models, and the resolution of the data bottleneck — the landscape appears transformative. The current dependencies on LLMops “crutches” may soon become obsolete.

Plan for a world where the marginal cost of intelligence is close to zero. These models will not just be more capable, cheaper, and more steerable; they will run inference at human-level speed across different modalities.

LLMs, while representing snapshots in time of compressed information, don’t encapsulate everything. Aspects such as internal and proprietary knowledge bases, dynamic data like websites or current time-series data, specific computational tools (which could include other models), and third-party tools with unique business logic are expected to remain external to the main LLM framework for the foreseeable future.
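To illustrate that division of labour, here is a minimal sketch: the model stays generic, while proprietary knowledge, live data, and third-party tools are fetched at inference time and injected into the prompt. Every function name here is an illustrative stand-in, not a real API:

```python
# Sketch of keeping knowledge and tools external to the LLM.
# search_knowledge_base, get_current_price, and call_llm are all
# illustrative stubs, not a real library or vendor API.
from datetime import datetime, timezone

def search_knowledge_base(query: str) -> str:
    # Stand-in for retrieval over an internal, proprietary document store.
    return "Internal doc: the enterprise plan includes SSO and audit logs."

def get_current_price(ticker: str) -> str:
    # Stand-in for a live, third-party data source.
    return f"{ticker}: $142.17 (illustrative value)"

def call_llm(prompt: str) -> str:
    # Stand-in for any chat-completion endpoint.
    return "<model answer grounded in the context above>"

def answer(question: str) -> str:
    # Assemble external context, then let the generic model reason over it.
    context = "\n".join([
        f"Current time: {datetime.now(timezone.utc).isoformat()}",
        search_knowledge_base(question),
        get_current_price("NVDA"),
    ])
    return call_llm(f"Context:\n{context}\n\nQuestion: {question}")

print(answer("Does our enterprise plan include SSO?"))
```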

Rolling these trends forward, which problems are now within our reach that were previously unsolvable with the tools of the past? These advances effectively make stochastic computation practical and render the handling of unstructured data a "non-issue" (see the sketch below). They have also opened the door to high-quality computer-vision projects that were once prohibitive, requiring multi-million budgets.
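As one example of the unstructured-data point: a model can turn free-form text into a fixed schema on demand. A minimal sketch, where call_llm is an illustrative stub rather than a real API:

```python
# Why unstructured data becomes a "non-issue": ask a model to emit a
# fixed schema and parse the result. call_llm is an illustrative stub.
import json

def call_llm(prompt: str) -> str:
    # Stub: a real call would return the model's completion for `prompt`.
    return '{"name": "Ana", "company": "Acme", "intent": "demo request"}'

def extract_lead(email: str) -> dict:
    # Prompt the model for JSON with a known schema, then parse it.
    prompt = (
        'Return JSON with keys "name", "company", and "intent" '
        f"extracted from this email:\n\n{email}"
    )
    return json.loads(call_llm(prompt))

print(extract_lead("Hi, I'm Ana from Acme. Could we book a demo next week?"))
```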
Perhaps most significantly:

LLMs have resolved the paradoxical trade-off between scalability and customization

We are now entering an era of scalable hyper-personalization and freely accessible domain expertise across various modalities. This marks a significant leap forward in how we approach and implement technological solutions.

If you want to build the next Uber-like business for the AI age, seek opportunities in markets with large user bases, where individual preferences vary greatly but are currently served with one-size-fits-all solutions, except for a small premium niche. The key is to identify markets where the bulk of the cost involves customized knowledge work by professionals rather than specialized hardware, and to leverage AI to democratize and personalize these services at scale.

You can look at it as a steep downward shift in the supply curve of personalised solutions (a lower cost basis). On top of this, the new supply curve is far more elastic as a result of the scalable setup of these solutions. While the new market price is lower, the increase in demand more than makes up for it, significantly growing the entire market. Look for huge potential markets where the current supply of customised solutions is inelastic and starts at a high price point. The toy model below makes this concrete.
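Here is that argument with invented linear curves, purely for illustration: a steep, high-intercept supply curve for today's bespoke solutions versus a flat, low-intercept one for AI-enabled ones, against the same demand curve:

```python
# Toy illustration of the supply-shift argument; all numbers are made up.
# Demand: P = 100 - Q. "Before" supply is steep and starts high (inelastic,
# bespoke work); "after" supply is flat and starts low (elastic, AI-enabled).
def equilibrium(intercept: float, slope: float) -> tuple[float, float]:
    # Solve 100 - Q = intercept + slope * Q for the market equilibrium.
    q = (100 - intercept) / (1 + slope)
    return q, 100 - q  # (quantity, price)

for label, (intercept, slope) in {"before": (60, 2.0), "after": (10, 0.2)}.items():
    q, p = equilibrium(intercept, slope)
    print(f"{label}: Q={q:.0f}, P={p:.0f}, market size={q * p:,.0f}")
# before: Q=13, P=87, market size=1,156
# after:  Q=75, P=25, market size=1,875 -> lower price, much bigger market
```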

Reach out if you're as excited as we are about unleashing a tidalflow of democratisation. 🌊

Tidalflow — New horizons
