The Future of LLM Agents: A Prediction

Spencerjamespark

After experimenting with LLM agent development, I want to share my thoughts on how this technology might evolve over the next few years. While LLMs are impressive in their current state, we are approaching some limitations, particularly in how capable these models can be at a given size. For those new to LLM agents in their current form, agents are software programs that use an LLM as the user interface to execute actions when prompted. In practice, an LLM can respond to input with structured output that the agent software interprets and uses to trigger internal scripts. These actions include reading a PDF, querying a database, adjusting the temperature of a building, changing the volume, interacting with an API, or other tasks.

In essence, the user experience would resemble an advanced voice assistant like Alexa but be much more dynamic. Commands would be expressed in natural language rather than restricted to a pre-programmed list of phrases.

Two Routes of User Input

A decision layer will determine how to handle user input. This “analyzer” would decide whether the user’s request involves a direct action that the LLM can’t handle alone (such as precise calculations, real-time data retrieval, or API interaction). If so, it routes the request to a fine-tuned model instead of the general model that produces a standard generative response. If the request requires an action, there are two methods of generating structured output.
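
To make the idea concrete, here is a minimal sketch of such a decision layer in Python. Everything in it is a hypothetical stand-in: a real analyzer would more likely be a small classifier or its own LLM call rather than keyword matching, and both model calls are stubbed out.

# Minimal sketch of a decision layer ("analyzer") that routes user input.
# Keyword matching is a crude stand-in for a real classifier or LLM call.
ACTION_KEYWORDS = {"turn", "set", "adjust", "query", "fetch", "increase", "decrease"}

def call_finetuned_model(user_input: str) -> str:
    # Hypothetical stub for the fine-tuned model that emits structured output.
    return '{"action": "adjust_volume", "method": "increase", "value": "3"}'

def call_general_model(user_input: str) -> str:
    # Hypothetical stub for the general model's conversational reply.
    return "Sure, happy to help with that!"

def route(user_input: str) -> str:
    # The "analyzer": decide whether this request needs a direct action.
    words = set(user_input.lower().split())
    if words & ACTION_KEYWORDS:
        return call_finetuned_model(user_input)
    return call_general_model(user_input)

print(route("Please turn the volume up 3"))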

Method 1: Prompt Engineering

Prompt engineering involves embedding specific instructions into your prompts: whenever a response is requested, instructions are appended for the LLM to follow. For example, a prompt could include instructions like “only respond with ‘yes’ or ‘no’” or “format your response as [x] for a checked box and [o] for an unchecked box.” This method essentially hard-codes how the LLM should respond directly in the software, pushing it to follow a specific output pattern.
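
A rough sketch of the pattern, with the model call stubbed out (send_to_llm is a hypothetical placeholder for whatever API or local model the agent uses):

# Sketch: embedding output-format instructions directly in the prompt.
INSTRUCTIONS = (
    "You are a home-automation agent. Respond ONLY with JSON of the form "
    '{"action": "<name>", "method": "<method>", "value": "<value>"}. '
    "Do not add any other text."
)

def build_prompt(user_input: str) -> str:
    # The instructions ride along with every request, consuming context window.
    return f"{INSTRUCTIONS}\n\nUser request: {user_input}"

def send_to_llm(prompt: str) -> str:
    # Hypothetical stub for the actual API or local model call.
    return '{"action": "adjust_volume", "method": "increase", "value": "3"}'

print(send_to_llm(build_prompt("Please turn the volume up 3")))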

The main advantage of prompt engineering is its flexibility, particularly for new or temporary actions or when using LLMs via an API. However, this approach has limitations. First, it consumes more of the context window, a finite resource in any LLM’s processing. This matters less for large models, but their cost and privacy problems will push people toward running agents locally, and the smaller, lighter models suited to local use have smaller context windows to work with. This approach is also more prone to hallucinations, since it relies on the model following the embedded instructions closely rather than producing the format it was trained to treat as the natural response.

Method 2: Fine-Tuning

Fine-tuning involves training a LoRA adapter on specific datasets to teach the model how to respond to certain inputs. LoRA (Low-Rank Adaptation) is a method for efficiently fine-tuning large language models: instead of updating all model parameters, LoRA trains a small set of low-rank matrices that are added to the original weights. This reduces computational costs and memory requirements while preserving the model’s performance.
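
For readers who want to see what this looks like in practice, here is a minimal sketch using Hugging Face’s transformers and peft libraries. The base model (gpt2) and the hyperparameters are illustrative assumptions, not recommendations.

# Sketch of attaching a LoRA adapter with Hugging Face transformers + peft.
# Base model and hyperparameters are illustrative only.
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

model = AutoModelForCausalLM.from_pretrained("gpt2")

config = LoraConfig(
    r=8,                         # rank of the low-rank update matrices
    lora_alpha=16,               # scaling applied to the update
    target_modules=["c_attn"],   # GPT-2's attention projection layer
    lora_dropout=0.05,
    task_type="CAUSAL_LM",
)

model = get_peft_model(model, config)
model.print_trainable_parameters()  # only the adapter weights will train

From here, the adapted model would be trained on an action-formatted dataset like the one below; only the small adapter matrices receive updates, which is what keeps local fine-tuning cheap.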

In this method, you create a dataset that pairs natural-language inputs with structured action outputs. For example:

{"user": "Please turn the volume up 3", "agent": {"action": "adjust_volume", "method": "increase", "value": "3"}}

In this example, the model learns how to structure its output in a way that software can interpret for specific tasks. Unlike prompt engineering, fine-tuning embeds this knowledge within the model itself, making it much more efficient at handling complex commands without consuming the context window.
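
On the software side, interpreting that output is straightforward: parse the JSON and dispatch to the matching internal script. A minimal sketch (the handler and its behavior are hypothetical):

import json

def adjust_volume(method: str, value: str) -> str:
    # Hypothetical internal script the agent can trigger.
    delta = int(value) if method == "increase" else -int(value)
    return f"Volume changed by {delta}."

HANDLERS = {"adjust_volume": adjust_volume}

def dispatch(llm_output: str) -> str:
    cmd = json.loads(llm_output)                 # parse the structured output
    handler = HANDLERS[cmd["action"]]            # look up the matching script
    return handler(cmd["method"], cmd["value"])

print(dispatch('{"action": "adjust_volume", "method": "increase", "value": "3"}'))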

Fine-tuned models also provide greater accuracy and are less prone to hallucinations. For smaller models, which can be run locally, fine-tuning is a lightweight, cost-effective way to handle more complex commands. However, fine-tuned models require regular updates, especially when new functions are needed. In practice, agents will likely use both methods: prompt engineering for testing or short-term actions and fine-tuning for stable, longer-term commands.

Integrating Prompt Engineering into User Interaction

Prompt engineering will also play a key role in how the output of action scripts is communicated back to users. After a function is executed — whether it’s successfully completed, returns an error, or gathers information — the data needs to be translated back into natural language. This will likely involve a second LLM call (if the user response isn’t hard-coded into the function) that uses prompt engineering to format the output so the user can easily understand it. For example:

  • First LLM call: Fine-tuned to identify which action to take and return structured output (e.g., {"action": "adjust_volume", "method": "increase", "value": "3"}).
  • Second LLM call: Prompt-engineered to explain the result to the user in natural language (e.g., “The volume has been increased by 3 levels.”).
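
Stitched together, the flow might look like this sketch, with both model calls and the action layer stubbed out as hypothetical placeholders:

# Sketch of the two-call flow; all three pieces are hypothetical stubs.
def first_call(user_input: str) -> str:
    # Fine-tuned model: natural language in, structured output out.
    return '{"action": "adjust_volume", "method": "increase", "value": "3"}'

def run_action(structured: str) -> str:
    # Stand-in for the dispatch layer that executes the script.
    return "volume: +3 (ok)"

def second_call(raw_result: str) -> str:
    # Prompt-engineered call: instructions embedded in the prompt tell the
    # model to translate the raw result into something readable.
    prompt = "Explain this function result to the user in one sentence: " + raw_result
    # (prompt would be sent to the LLM here)
    return "The volume has been increased by 3 levels."  # stubbed reply

print(second_call(run_action(first_call("Please turn the volume up 3"))))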

Limitations of LLMs

LLMs are powerful but come with some notable limitations. Running large LLMs is expensive in terms of both computational resources and energy consumption. They are also not the best solution for every task, particularly when precision is required. One widely discussed limitation of LLMs is their inability to count accurately. For example, when asked, “How many R’s are in the word ‘Raspberry’?” LLMs frequently respond with the incorrect answer of “Two” instead of “Three.”

This is because LLMs process information by tokenizing words into numerical sequences that are associated with meaning, not with the structure of the word itself. ‘Raspberry’, for instance, might be represented by a token ID like 1114589, which bears no relation to the actual letters in the word. LLMs are excellent at identifying patterns of meaning but not at handling concrete, detail-oriented tasks like counting or arithmetic.

As a result, building beneficial agents will require supplementing LLMs with external tools that can handle these tasks with precision. For example, external scripts or APIs that integrate seamlessly with the LLM’s outputs could handle simple counting or arithmetic.
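
Letter counting is a good example: it trips up LLMs but is trivial for a deterministic script the agent can call as a tool. A minimal sketch:

# Deterministic tool the agent can call instead of asking the LLM to count.
def count_letter(word: str, letter: str) -> int:
    return word.lower().count(letter.lower())

print(count_letter("Raspberry", "r"))  # prints 3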

Building the Future: Hybrid Agents

The future of agent development will likely involve hybrid approaches that combine LLMs for natural language understanding with more deterministic systems for precision tasks. We may see agents that learn new functions in the background, using rented cloud GPUs to fine-tune new models as needed. Alternatively, agents could be highly specialized, tailored experiences that, while not truly intelligent, create the end-user experience of a robot assistant. They could even be hardware devices operating like network-attached storage (NAS): living in a user’s home or office, running in the background, and interacting across multiple devices while keeping user data private.

Users will likely be able to write action scripts, attempt to have agents write them, or download functions from a central open-source hub to extend their functionality. This hub would allow users to rate and share functions, with agents gradually learning and adopting new scripts based on user feedback. Eventually, this could evolve into a system where agents autonomously select the best-rated functions for specific user requests. Possibilities that once felt like sci-fi are slowly becoming tangible in a way grounded in reality. It is exciting to see the direction agents will go in over the coming years.
