The Emerging Economy of LLMs

Part 1 / Gilad Barkan

Gilad Barkan
Wix Engineering
9 min read · Oct 8, 2024


We’re witnessing the emergence of a new worldwide economy: the LLM economy!


Introduction

Throughout the history of humankind, social and economic revolutions have been kickstarted by disruptive technologies. The invention of the plow and irrigation systems some 12,000 years ago drove the Agricultural Revolution. The printing press, invented by Johannes Gutenberg in the 15th century, fueled the Protestant Reformation and helped pull Europe out of the Middle Ages and into the Renaissance. The steam engine, perfected by James Watt in the 18th century, powered the Industrial Revolution. The internet has revolutionized communication, commerce, and access to information, turning the entire world into a small village, and smartphones have changed the way people interact with the world.

Now it’s the era of the AI revolution. In particular, Large Language Models (LLMs) represent a huge technological leap forward, with profound economic implications at both the macro and micro levels. From reshaping global markets to fostering new forms of currency, LLMs are sculpting a novel economic landscape.

The reason AI and LLMs are transforming industries and shaping new economic paradigms is simple and fundamental: this technology can automate both routine and complex tasks that once required human intelligence, enhance cognitive work and decision-making, and boost productivity across a wide range of sectors.

This leads to cost reductions, greater efficiency, the development of new products, and the ability to redirect human talent toward more creative and strategic endeavors. The technology disrupts every domain, from healthcare to finance to customer service, and the advent of LLMs is spawning entirely new markets. AI-driven services such as content generation and conversational assistants will soon become mainstream, opening up avenues for startups and established companies alike.

About this series

This is a two-post series. In this first post, we’ll uncover the macro-level implications of this new emerging economy, revealing the macroeconomic building blocks and forces rising as we speak. In the next post we’ll roll up our sleeves and dive deep into the technical machinery that drives LLMs and forms the currency of this new LLM economy: the tokens.

We believe that only after understanding the bits and bytes of this disruptive technology can one truly grasp the engine that drives this new global economy. We hope these two posts will give you both the macro-level picture and a deep understanding of the internal mechanics of LLMs, so you’ll be far more knowledgeable about the revolution happening right now.

Enjoy!

Why now? Why is language correlated with human intelligence?

AI wasn’t ignited in November 2022 with the appearance of ChatGPT; it started long before. I was already building machine learning classification models back in 1999. Formally, the field of Artificial Intelligence traces back to 1950 and Alan Turing, the father of theoretical computer science and the man who deciphered the Nazis’ Enigma code, which enabled the Allies to defeat Nazi Germany, shortening WWII and saving millions of lives.

Turing was the first to formalize a definition of machine intelligence, through the so-called Turing Test illustrated below, and that definition maps remarkably well onto today’s chatbot experience. As Figure 1 shows, a human evaluator holds natural-language conversations with both a machine and a human without knowing which is which; if the evaluator cannot reliably distinguish the machine from the human, the machine is considered to have passed the test. Amazingly, it took 72 years of slowly crawling AI research until ChatGPT delivered exactly that experience, arguably passing the Turing Test and igniting the current big bang of artificial intelligence.

Figure 1. The Turing Test (source: Wikipedia). A human evaluator converses in natural language with both a machine and a human without knowing which is which; if the evaluator cannot reliably tell them apart, the machine is considered to have passed the test.

The interesting question is: why is human intelligence associated primarily with language rather than, say, vision? A huge share of the brain’s processing is devoted to vision, and OpenAI launched its pioneering image generator DALL-E even before ChatGPT, yet it didn’t have anywhere near the same effect. Why is that?

The Evolution of Language

The answer is simple: the development of human language was the springboard for human dominance on Earth. More specifically, as Yuval Noah Harari argues in his bestseller “Sapiens: A Brief History of Humankind”, it was the development of gossip and the ability to chat about abstract concepts that set humans apart from other species. Complex communication like gossip can only be achieved with a proper, shared language.

Human language has evolved from archaic cave signs to the more efficient construct of alphabets which, together with grammar rules, create languages whose vocabularies span thousands of words. In the digital age, vocabularies have expanded even further (some may argue the opposite) with the use of emojis. Now, with the rise of GenAI, tokens have become the latest cornerstone in the evolution of language. These transformations underscore the remarkable journey of human language from primitive symbols to complex digital representations.

In the second post of this series we’ll dive deep into the guts of LLMs, and specifically into tokens. But before that, let’s get a better grasp of the new economic forces emerging in the LLM realm.

The LLM Economy Forces

The AI Giants at War

Karl Marx and Friedrich Engels argued, in their theory of dialectical materialism, that whoever controls the means of production rules all. The tech giants immediately understood that AI is the future means of production, and the war for control over those means has been declared.

The battle to dominate the LLM market is fierce. The major players include the pioneering OpenAI and the tech giants Google, Microsoft, and Meta, alongside newer entrants such as France’s Mistral, Israel’s AI21 Labs, Elon Musk’s xAI, and Anthropic. Each is vying to develop the most capable and efficient models, leading to a technological arms race. The LLM industry is growing exponentially, with billions of dollars in investment. For example, Anthropic has raised a total of $4.5B from 43 different investors, including key angels like Eric Schmidt as well as giants such as Amazon and Google.

The Scarce Resource: GPUs

Just as Bitcoin mining consumes enormous amounts of computational resources, training LLMs and generating tokens from them require huge amounts of compute. This has fueled an urgent search for additional energy sources, as evidenced by Microsoft’s recent investment in nuclear energy.
Furthermore, Graphics Processing Units (GPUs), the core hardware driving the deep neural networks behind LLMs, have become a scarce and expensive resource, exacerbating the competitive tension:

  • Manufacturing Constraints: GPU production capacity is limited, constrained by semiconductor shortages, complex manufacturing processes, and the architectural limits that have slowed Moore’s Law.
  • Strategic Reserves: The bigger companies secure large quantities of GPUs to ensure their AI operations run smoothly, creating a competitive bottleneck where smaller firms may struggle to access this critical resource.
  • Alternative Solutions: In response to the scarcity of Nvidia GPUs, companies are exploring ways to mitigate the issue, such as GPU sharing. Google developed its own AI chip, the Tensor Processing Unit (TPU), and newer companies like Groq and SambaNova have built their own optimized AI chips. These innovations aim to provide the necessary computational power while reducing dependence on traditional GPUs. Beyond hardware, alternative model architectures and inference techniques are emerging to overcome current bottlenecks and better exploit existing resources; good examples are speculative decoding for faster inference, structured state space models (SSMs) such as Mamba, and Mixture-of-Experts (MoE) architectures.

Tokens as the New Currency of the LLM Economy

Tokens are the new currency of the emerging AI-driven economy. Just as money facilitates transactions in traditional economies, tokens facilitate transactions in the LLM economy.
But what are these tokens?

What are Tokens?

While the next post will dive deep into the technical bits of tokens, here we’ll cover what they are and review their high-level aspects, mostly with regard to business and finance.
While characters and words are the fundamental units of human language, for LLMs the unit is something in between, called a token. Tokens are text “chunks” that represent commonly occurring sequences of characters in the model’s training data. A token can be a single character, a fraction of a word (a sub-word), or an entire word: for example ‘?’, ‘os’ + ‘car’ (the word ‘oscar’ split into two tokens), and ‘people’, respectively.
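To make this concrete, here is a minimal sketch using the open-source tiktoken library (one tokenizer among many; other providers use different vocabularies, so the exact split shown here is an assumption of this example, not a universal rule):

```python
# pip install tiktoken
import tiktoken

# Load a GPT-4-era tokenizer vocabulary (one of several in common use).
enc = tiktoken.get_encoding("cl100k_base")

text = "Tokens are the new currency of the LLM economy."
token_ids = enc.encode(text)                       # text -> integer token IDs
tokens = [enc.decode([tid]) for tid in token_ids]  # each ID back to its text chunk

print(len(text.split()), "words ->", len(token_ids), "tokens")
print(tokens)
```

A common rule of thumb is that one English token averages roughly three to four characters, or about three-quarters of a word, which is where estimates like “3,000 words ≈ 4,000 tokens” come from.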

Since text is the raw material of LLMs, tokens, as the smallest unit of text, serve both as the medium of exchange that facilitates transactions and as the unit of account, the standard numerical unit for measuring the value of LLM providers’ services.

LLM companies invest hundreds of millions of dollars in developing and training LLMs. The eventual materialization of these investments is tokens. How good they are, how fast they are generated, and how much it costs to generate them define the LLM triad of performance metrics: quality, latency, and cost.

Pricing Models

The cost of using these models is usually based on the number of tokens processed, a direct parallel to conventional monetary transactions. Different LLM providers adopt varied pricing strategies based on token usage. Most charge different rates for input (prompt) and output (completion) tokens, reflecting both the computational effort and energy required to produce them and the value they provide.

If you’re a blogger using LLMs to help write a 3,000-word post (~4,000 tokens), you’ll pay a few cents. But if you run agentic applications that make ~10 LLM calls per user interaction, or if you’re a company like Wix.com with 250M+ users that embeds AI throughout its products, including an AI Website Builder, you can imagine the monthly bill for AI services.
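As a back-of-the-envelope illustration, here is a small sketch that estimates costs from token counts. The per-million-token prices and traffic figures below are hypothetical placeholders, not any provider’s actual rates:

```python
# Rough cost estimator for token-based pricing.
# All prices and traffic figures below are hypothetical placeholders.

PRICE_PER_1M_INPUT = 5.00    # USD per 1M prompt tokens (assumed)
PRICE_PER_1M_OUTPUT = 15.00  # USD per 1M completion tokens (assumed)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Cost of one LLM call, charging input and output tokens at different rates."""
    return (input_tokens / 1e6) * PRICE_PER_1M_INPUT + \
           (output_tokens / 1e6) * PRICE_PER_1M_OUTPUT

# A blogger: a short prompt plus a ~4,000-token draft.
blog_post = request_cost(input_tokens=500, output_tokens=4_000)

# An agentic product: ~10 calls per interaction, 1M interactions per month.
agentic_monthly = 10 * 1_000_000 * request_cost(input_tokens=2_000, output_tokens=500)

print(f"Single blog post:      ${blog_post:.3f}")        # a few cents
print(f"Agentic app per month: ${agentic_monthly:,.0f}")  # adds up quickly
```

At these assumed rates the single post costs about six cents, while the agentic workload runs into six figures per month, which is exactly why per-token prices matter so much at scale.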

The competition among LLM providers is fierce. Add to this the growing footprint and performance of open-source and open-weight models like Llama 3.1, and you get a continuous decrease in price. For example, OpenAI reduced its GPT-4 pricing by ~80% over the last year and a half (see Figure 2 below). This ongoing trend, in turn, allows companies to build an ever more extensive portfolio of AI-based products.

There are two main resources we recommend for comparing vendors’ cost per token: Artificial Analysis (also recommended in Meta’s recent Llama 3.1 release) and HuggingFace’s, which is very convenient for side-by-side comparison.

Figure 2. GPT-4 price reductions over time.

Context Window

Recently, another metric joined the triad mentioned above: the context window, i.e. the length of the input prompt. Why is this so important?
- First, bragging rights: the bigger my context window, the stronger I look.
- Second, although still limited, a bigger context window potentially enables new functionality, like a RAG-less Q&A application that stuffs all the reference material directly into the prompt.
- Third, the emerging alternatives to transformers, like Mamba, are a strong signal of growing demand for longer context windows. These architectures tackle the transformers’ quadratic-complexity Achilles’ heel, which is what prevents them from supporting huge context windows (see the toy sketch after this list).
- Fourth, and most important, a bigger context window lets users pump more tokens into their prompts, which, as we saw above, translates directly into higher revenue for the providers.
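To see why that quadratic complexity matters, here is a toy sketch (illustrative counts only, not a real FLOP estimate) of how the work of self-attention grows with context length compared with a linear-time model such as an SSM:

```python
# Toy comparison: quadratic self-attention vs. a linear-time sequence model.
# The counts are illustrative orders of magnitude, not real FLOP numbers.

def attention_pairs(context_len: int) -> int:
    """Self-attention compares every token with every other token: ~n^2 work."""
    return context_len ** 2

def linear_steps(context_len: int) -> int:
    """State space models like Mamba process the sequence in ~linear time."""
    return context_len

for n in (4_000, 128_000, 1_000_000):
    print(f"context {n:>9,}: attention ~{attention_pairs(n):.1e} pairs, "
          f"linear model ~{linear_steps(n):.1e} steps")
```

Growing the context from 4K to 1M tokens multiplies the attention work by roughly 62,500x, while a linear-time model scales by only 250x, which is why long-context demand keeps pushing interest toward such architectures.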

Summary

The emergence of LLMs, a genuinely disruptive technology, has kickstarted a new social revolution in how humans and AI communicate and interact. As LLM capabilities grow, a whole economy will grow around them, accelerating the development of new markets and capabilities never seen before.

This first post in the series gave a high-level overview of the emerging economy around LLMs and its financial and social implications. In the next post we’ll roll up our sleeves and dive deep into the technical machinery that drives LLMs and forms the currency of this new LLM economy: the tokens.
