LLMs: From Information to Knowledge to Action

The marginal cost of intelligence is dropping to zero

Sebastian Jorna
9 min read · Jan 10, 2023

In 2017, the transformer architecture revolutionized the field of machine learning by introducing a way to compress vast amounts of information into knowledge at scale. Large language models (LLMs) like GPT-3 have since condensed much of the written internet into giant trees of knowledge, with each branch representing a different aspect of human knowledge. However, the question remains: how do we effectively turn this condensed knowledge into action?

Midjourney — Compressing all information into knowledge

Chapter One: LLMs as Knowledge Trees

A useful analogy for understanding the shift in machine learning models before and after the transformer architecture is the metamorphosis of broomsticks into knowledge trees. Prior to the transformer, most language models were comparable to broomsticks: narrow and specialized. These models were typically based on recurrent neural networks, which process their input one step at a time, folding everything seen so far into a single hidden state. This makes it difficult to capture contextual relationships that are dispersed across (diverse) data. For example, it would be difficult to follow a character through a book. Another large downside is computational efficiency: because data needs to flow through the model strictly step by step, you can't split the work up for parallel training. (Sequential = Bad)
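To make that bottleneck concrete, here is a toy recurrent step loop (random weights and toy dimensions, not any real model):

```python
import numpy as np

# A minimal sketch of why recurrence is hard to parallelize: each hidden
# state depends on the previous one, so tokens must be processed in order.
rng = np.random.default_rng(0)
Wx, Wh = rng.standard_normal((32, 64)), rng.standard_normal((64, 64))

def rnn(tokens):
    h = np.zeros(64)
    for x in tokens:                  # step t+1 cannot start until step t is done
        h = np.tanh(x @ Wx + h @ Wh)  # the entire past is squeezed into one vector
    return h                          # distant context fades a little with every step

summary = rnn(rng.standard_normal((100, 32)))  # 100 tokens, 32-dim embeddings
```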

Midjourney — Knowledge tree in the style of Tron

In contrast, transformer-based architectures are built around the idea of self-attention, which allows the model to weigh the importance of different parts of the input when making a prediction. For example, understanding how a few clues in the first Harry Potter book relate to some “plot twists” in the final arc. The other breakthrough is that this self-attention mechanism is parallelizable, meaning that it can be calculated for different parts of the input simultaneously. By breaking the computation up into small, independent chunks that can run in parallel, transformer-based architectures can make predictions much more efficiently. Additionally, this parallelism maps easily onto distributed computing resources, e.g. GPUs, to maximize available compute. (Parallel = Good)
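Here is a minimal sketch of scaled dot-product self-attention, the core transformer operation. Note that every token's context vector can be computed independently of the others, which is exactly what makes it parallelizable:

```python
import numpy as np

# A toy scaled dot-product self-attention; dimensions are arbitrary stand-ins.
def self_attention(X, Wq, Wk, Wv):
    Q, K, V = X @ Wq, X @ Wk, X @ Wv           # project tokens to queries/keys/values
    scores = Q @ K.T / np.sqrt(K.shape[-1])    # every token attends to every other token
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)  # row-wise softmax
    return weights @ V                         # context-weighted mix of values

rng = np.random.default_rng(0)
X = rng.standard_normal((10, 32))              # 10 tokens, 32-dim embeddings
Wq, Wk, Wv = (rng.standard_normal((32, 32)) for _ in range(3))
context = self_attention(X, Wq, Wk, Wv)        # all 10 rows computed at once
```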

This gave LLMs (transformer-based neural nets) the potential to condense massive amounts of information into large knowledge trees. And it didn’t take long after the 2017 paper that introduced transformers, Attention Is All You Need, before we saw all kinds of experiments: train a bigger tree and see what cool capabilities emerge from it.
Spoiler alert: we still haven’t reached the point where more scale (a larger tree) stops bringing better performance across all kinds of use cases. The race is on, both on the private side (Microsoft, Google, Meta, …) and the open-source side.

Increase in LLM sizes

My prediction is that this won’t be a winner-takes-all market, but rather an oligopoly: significant infrastructure is needed to train models of this scale, which limits the number of players. Furthermore, the model architecture itself is not sophisticated enough to yield a lasting competitive advantage. As more players, both from the private and open-source sectors, train on the same data, the trees are likely to converge to similar structures. Thank god for the open-source alternatives, which will become the lower-bound price setter and ultimately lead to commoditization. A world where:

The marginal cost of intelligence is dropping to zero

Chapter Two: Growing Branches

One major limitation of LLMs is that once the tree is trained, it just stands there; it does not evolve. One way you can increase the performance of the tree without additional training is by being deliberate in how you “prompt” your question (see below, going from GPT to GPT-prompted).

Quality ratings of model outputs on a 1–7 scale (y-axis), for various model sizes (x-axis), on prompts submitted to InstructGPT models on OpenAI’s API. InstructGPT outputs are given much higher scores by labellers than outputs from GPT-3 with a few-shot prompt and without, as well as models fine-tuned with supervised learning.
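As a toy illustration of prompting the frozen tree, here is the same model queried with and without an instructed prompt, using the completion API of that era (the model name, prompts, and an API key set in the environment are all assumptions for the sketch):

```python
import openai  # pre-1.0 client; assumes OPENAI_API_KEY is set in the environment

bare_prompt = "Explain the offside rule"
instructed_prompt = (
    "You are a football coach. Explain the offside rule to a ten-year-old "
    "in three short sentences, then give one concrete example."
)

# Same frozen model; the instructed prompt typically yields a far more usable answer.
for prompt in (bare_prompt, instructed_prompt):
    response = openai.Completion.create(
        model="text-davinci-003",  # an instruction-tuned GPT-3 model
        prompt=prompt,
        max_tokens=150,
    )
    print(response["choices"][0]["text"].strip(), "\n---")
```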

However, there are a number of reasons to go beyond prompting and actively evolve the knowledge tree. For one, our world does not stand still; we continuously add and create new information. Secondly, while the tree is likely trained on tons of publicly available information, many real-world use cases rely on proprietary databases. Luckily, we have found different ways to grow additional branches on the tree, officially called fine-tuning.

Supervised fine-tuning of LLMs refers to adjusting the pre-trained model’s parameters (its weights). This is done by training on a smaller, relevant set of labelled data where both the input and the output are provided. You then make the model predict the output from the input until it performs well on the task. This data usually involves hand-crafted examples and consequently doesn’t scale well.
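A minimal sketch of such a fine-tuning loop, using the Hugging Face transformers library with GPT-2 as a small stand-in model; the labelled pairs are hypothetical hand-crafted examples:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Small open model as a stand-in for a real LLM.
model = AutoModelForCausalLM.from_pretrained("gpt2")
tokenizer = AutoTokenizer.from_pretrained("gpt2")

pairs = [  # hypothetical hand-crafted (input, output) examples
    ("Q: What does our refund policy cover?\nA:", " Purchases within 30 days."),
    ("Q: Which tier includes SSO?\nA:", " The enterprise tier."),
]

optimizer = torch.optim.AdamW(model.parameters(), lr=1e-5)  # small LR: nudge, don't replant
model.train()
for epoch in range(3):
    for prompt, target in pairs:
        batch = tokenizer(prompt + target, return_tensors="pt")
        # Standard causal-LM objective: predict each next token of input + output.
        loss = model(**batch, labels=batch["input_ids"]).loss
        loss.backward()
        optimizer.step()
        optimizer.zero_grad()
```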

OpenAI came up with a clever way to lessen the difficulty of scaling the supervised fine-tuning approach, and this was key for ChatGPT’s performance. They did it by finding a way to incorporate reinforcement learning (RL). RL is the crucial component that allowed DeepMind to train AIs like AlphaGo, which reached grandmaster level in games in large part by playing against itself. The problem with RL is that you need an environment in which you can give the AI rewards to optimize against. Easy in a game, very difficult in the real world. OpenAI managed to bridge these worlds by leveraging the LLM’s capability to create content and humans’ strength at intuitively telling good quality from poor. For their InstructGPT, they collect a small set of supervised learning data and then ask the LLM to create several outputs per prompt. Humans manually rank those outputs from best to worst. OpenAI subsequently trains a “reward model” that mimics this rating process. This missing link removed a major bottleneck: now we have scalable, dynamic feedback loops to grow new branches and generally increase the quality of the knowledge tree’s output.
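A toy sketch of that reward-model step: random vectors stand in for embeddings of the ranked outputs, and a pairwise ranking loss pushes the preferred output's reward above the rejected one's (a common setup for this technique, not OpenAI's actual code):

```python
import torch
import torch.nn as nn

# Toy reward model: maps a 128-dim "output embedding" to a scalar reward.
reward_model = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))
optimizer = torch.optim.Adam(reward_model.parameters(), lr=1e-3)

for step in range(100):
    preferred = torch.randn(16, 128)  # stand-ins for human-preferred outputs
    rejected = torch.randn(16, 128)   # stand-ins for lower-ranked outputs
    margin = reward_model(preferred) - reward_model(rejected)
    loss = -nn.functional.logsigmoid(margin).mean()  # pairwise ranking loss
    loss.backward()
    optimizer.step()
    optimizer.zero_grad()
```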

In late October 2022, an interesting paper was published (In-context Reinforcement Learning with Algorithm Distillation) that showed a novel approach to distil RL capabilities into LLMs. You essentially take RL training histories and have the LLM learn a causal sequence model that predicts the RL agent’s actions, given its learning history as context.
In our knowledge tree analogy, this looks as follows: instead of growing new branches, this approach gives the tree capabilities similar to Harry Potter’s Whomping Willow. It can move its branches based on the context of the situation.
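A compressed sketch of the idea, assuming the learning histories have already been flattened and tokenized (all dimensions and data below are random stand-ins):

```python
import torch
import torch.nn as nn

# Toy stand-ins: 8 RL learning histories, each flattened into a token sequence
# of (observation, action, reward, observation, ...) mapped to integer IDs.
VOCAB, DIM, CTX = 256, 128, 512
histories = torch.randint(0, VOCAB, (8, CTX))

embed = nn.Embedding(VOCAB, DIM)
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=DIM, nhead=4, batch_first=True),
    num_layers=2,
)
head = nn.Linear(DIM, VOCAB)

# Causal mask: each position may only attend to earlier positions.
mask = nn.Transformer.generate_square_subsequent_mask(CTX)
logits = head(encoder(embed(histories), mask=mask))

# Next-token prediction over the history; predicting the next *action* token
# from everything before it is what distils the RL algorithm's behaviour.
loss = nn.functional.cross_entropy(
    logits[:, :-1].reshape(-1, VOCAB), histories[:, 1:].reshape(-1)
)
loss.backward()
```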

No prompting, no finetuning. A single transformer collects its own data and maximizes rewards on new tasks. Offline meta-RL.

JK, it’s more of a crawl today. Nothing to worry about, Hermione 😅

Chapter Three: Platonic Dialogue

We have now covered how to create the knowledge tree and grow domain-specific branches in a scalable way, as well as an initial approach that lets the branches move based on context.

So, how do we interact with this tree? Luckily, it is very easy for outsiders to interact with these knowledge trees. Keeping with the metaphor: if we’re all little birds, we just need to sit on the tree, and by speaking to it we can tap into its full knowledge. Furthermore, some companies are building birdhouses in the tree to give you an even better UI/UX when interacting with it. Those birdhouses could even give you access to proprietary, fine-tuned branches. Overall: question in, one-shot answer out.

While we often expect a one-shot response from the LLM, there is potential for a more iterative approach to reasoning. This is similar to Plato’s dialogues, which work through a question via succinct “internal” tangents of sub-questions and answers. Likewise, we break difficult math problems down into sub-questions or tasks to solve the overall puzzle.

Midjourney — Plato sitting on a branch of the knowledge tree

In October 2022, the Google Brain team introduced the ReAct framework, described in the paper ReAct: Synergizing Reasoning and Acting in Language Models. This framework of Reasoning and Acting allows for chained questions and answers: the LLM can break down complex problems into smaller subproblems, just as a human would. The resulting accuracy gains across benchmarks are quite profound.
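A hand-rolled sketch of such a reason-act-observe loop (the prompt format, model name, and toy search tool are illustrative stand-ins, not the paper's implementation):

```python
import openai  # pre-1.0 client; assumes OPENAI_API_KEY is set in the environment

TOOLS = {"search": lambda q: f"(top web result for {q!r})"}  # toy stand-in tool

def llm(prompt: str) -> str:
    # Stop at "Observation:" so the model pauses whenever it requests a tool.
    resp = openai.Completion.create(model="text-davinci-003", prompt=prompt,
                                    max_tokens=200, stop=["Observation:"])
    return resp["choices"][0]["text"]

def react(question: str, max_steps: int = 5) -> str:
    transcript = (
        "Answer the question by interleaving Thought, Action (search[query]) "
        f"and Observation lines. End with 'Final Answer: ...'.\nQuestion: {question}\n"
    )
    for _ in range(max_steps):
        step = llm(transcript)                 # the model reasons and/or picks an action
        transcript += step
        if "Final Answer:" in step:
            return step.split("Final Answer:")[-1].strip()
        if "Action:" in step:                  # e.g. "Action: search[All-In podcast news]"
            tool, _, arg = step.split("Action:")[-1].strip().partition("[")
            observation = TOOLS[tool.strip()](arg.rstrip("]"))
            transcript += f"\nObservation: {observation}\n"  # feed the result back in
    return "(no final answer within the step budget)"
```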

Using the ReAct framework to find a debate topic for Jason Calacanis and David Sacks for the current week’s All-In podcast. I only prompted the initial question, and the LLM broke it down into a chain of different thoughts, actions and observations.

Chapter Four: Reeling in the Software Cloud

If you zoom out from the knowledge tree, you start to notice that many of the birds are flying up into a massive software cloud that hovers above it. While we humans all have our own little knowledge trees in our heads, we have built an armada of software to supercharge them, all the way from calculators to Google and any other software imaginable. Over the last 10+ years we have seen a massive migration of these software tools from on-prem to the cloud.

Why do we care? Because LLMs, with the help of some agents, could shoot chains from their birdhouses into this cloud, allowing the knowledge tree to do what we do, but with vastly more background knowledge. This is where LLMs become really, really interesting! We give them access to the massive software toolset that we have been building over the last decades, as well as direct access to cloud-based databases.

Shoot for the stars, but aim at the clouds!

Putting it all together

LLMs have condensed most of the internet’s information into knowledge (trees). We can fine-tune those trees to grow and distil additional information into further knowledge branches. Those branches will potentially be able to move based on the context they are in (meta-RL).
We can sit and build birdhouses on these branches. Not just for one-shot answers, but to let the LLM have platonic dialogues with itself (the ReAct framework), breaking down questions and tasks into solvable sub-modules. Most importantly, as the LLM is chaining its thoughts, it will be able to reason about which third-party software in the cloud will help it solve a given sub-module. LangChain is a fantastic GitHub repository that is building a toolset to take us from the plain knowledge tree to one where we can chain thoughts and shoot hooks into the software clouds; see the sketch below.
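For instance, an early-2023 LangChain agent setup looks roughly like this (the library's API surface moves quickly, so treat imports and names as indicative; OpenAI and SerpAPI keys are assumed):

```python
# pip install langchain openai google-search-results
from langchain.agents import initialize_agent, load_tools
from langchain.llms import OpenAI

llm = OpenAI(temperature=0)                           # the knowledge tree
tools = load_tools(["serpapi", "llm-math"], llm=llm)  # hooks into the software cloud
agent = initialize_agent(
    tools, llm, agent="zero-shot-react-description", verbose=True
)

# The agent chains Thought/Action/Observation steps across the tools.
agent.run("Who won the 2022 World Cup, and what is the square root of the final's attendance?")
```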

What does the future look like?

As David Friedberg said:

Humans are transitioning from creators to narrators

Humans will increasingly outsource the actual content creation and execution. Agent-enabled LLMs will be able to take in high-level, human-prompted goals and tasks in natural language. They will then break them down into sub-tasks and, in symbiosis with other software tools, complete them.

What is still missing? A key catalyst for the widespread adoption of these technologies will likely be the development of an “autopilot for computers,” allowing for the automation of tasks performed through a mouse, keyboard, and pixels. The 2022 paper A data-driven approach for learning to control computers makes interesting strides in this direction. The powerful aspect of this route is that we don’t need to create a shadow software world for AIs, but rather have a single unified interface that both humans and AIs can access. The same reasoning shaped Tesla’s decision to build a humanoid robot rather than any other shape. Our world has been built for humans, both physical and digital. Let’s not reinvent the wheel, and let’s make sure we can still access, understand, and interact with whatever the AI can.

Bonus: interesting business opportunities

The core work of condensing information into knowledge will become commoditized as the war between private and open-source LLMs intensifies.

Significant and sustained value can likely be generated by horizontal mid-layer B2B applications: those that enable companies to condense their proprietary information. Another strong angle is applications that leverage agents to hook into the software clouds to perfect specific workstreams that involve longer and more complex sub-routines.
Combining both, I’m particularly interested in horizontal implementations that supercharge software users into power-users by giving them the gift of narration. This in particular could unlock the value trapped between the “theoretical value-add” of software solutions and the “actual value-add” experienced by the average user.

Cool projects:

Browse the web and perform certain tasks

Repo-level code assistant

Using NLP narrations to generate instant SQL code to search Twitter

Quick chatbot trained on help centre docs

Chatbot that provides Q&A-style answers to all your questions about specific GitHub repositories

A new way to interact with APIs

GPT-3 in Google Sheets!

Text to slides

YouTube video-to-text summarization
