Breaking the LLM’s Token Limit: Introducing the Modular AI Systems Architecture

Amir Ghasemi
6 min read · Oct 31, 2023


Image generated with DALL·E 3, showing an outfit suggester robot

Imagine asking your smartphone: “What should I wear today?” and receiving a thoughtful suggestion based on the current weather and the time of day. This seemingly simple interaction actually hides a complex system of decision-making and data management behind the scenes.

Introduction

Current LLMs, especially conversational models like GPT-3.5 and GPT-4, can handle up to 32,000 tokens in a single conversation (Claude 2 raises this limit to 100,000 tokens). This limit can restrict the ability to process extensive data or manage multifaceted interactions within a single conversation.

In this article, we explore an innovative architecture (which I call the Modular AI Systems Architecture) that enables artificial intelligence (AI) systems to work around this limitation and manage multiple tasks, making such interactions not just possible but efficient and scalable.

What Is the Modular AI Systems Architecture?

The Modular AI Systems Architecture is an innovative approach to designing AI systems that focuses on breaking down complex tasks into specialized, independent units known as “Task-Modules” or simply “Modules.” Rather than having a single, generalized AI handle all tasks, this framework employs multiple specialized AIs, each tailored for a specific function or domain.

A central “Task Manager” orchestrates these Modules, interpreting user requests, delegating tasks to the appropriate Modules, and compiling their outputs into a cohesive response. This modular structure allows the system to handle more complex interactions efficiently, breaking free from traditional AI token limitations and providing richer, more context-aware responses to user queries.

Now let’s dive into the system itself:

Image: An Overview of the Modular AI System Architecture

The Concept: Task Manager, Modules and Module Managers

1. The Task Manager

The Task Manager is the conductor of our AI orchestra. It listens to the user’s requests, understands which instruments (modules) need to be played, and ensures that the symphony (final response) is harmonious.

Responsibilities:

  • Understand the user’s request.
  • Identify which tasks or modules need to be invoked.
  • Manage the flow of information between different modules.
  • Compile the final response to the user.
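These four responsibilities can be sketched in a few lines of Python. Everything here is illustrative, not part of the architecture itself: the keyword-based module selection and the single stubbed module are assumptions standing in for real request understanding and real Task-Modules.

```python
# A minimal, illustrative Task Manager: it selects hypothetical modules via
# simple keyword matching, runs them, and compiles their outputs.

class TaskManager:
    def __init__(self, modules):
        # modules: dict mapping a capability keyword to a callable module
        self.modules = modules

    def handle(self, request):
        # 1. Understand the request and identify which modules to invoke
        selected = [m for keyword, m in self.modules.items()
                    if keyword in request.lower()]
        # 2. Manage the flow of information: invoke each selected module
        results = [module(request) for module in selected]
        # 3. Compile the final response to the user
        return " ".join(results)

# A hypothetical module implemented as a plain callable
manager = TaskManager({
    "wear": lambda req: "It is 15°C and cloudy; a light jacket works well.",
})
print(manager.handle("What should I wear today?"))
```

In a real system, the keyword lookup would be replaced by an LLM call that classifies the request, but the manager's shape stays the same: select, delegate, compile.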

2. Task Modules (Modules)

Modules are specialized AI units, each designed to handle a specific type of task or domain, such as detecting dates, forecasting weather, or suggesting outfits.

Characteristics:

  • Specialized: Expert at a specific task.
  • Parametric: Can handle variations in requests through parameters.
  • Independent: Operates independently of other modules.

Image: Visual representation of Task Manager and Modules interacting
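The three characteristics map naturally onto a small module interface. The sketch below is an assumption about how a Date Detector module might look; the `execute(parameters)` shape is chosen for the example, not mandated by the architecture.

```python
# An illustrative module showing the three characteristics:
# specialized (it only resolves date expressions), parametric (behavior
# varies with the parameters passed in), and independent (it references
# no other module).

from dataclasses import dataclass
from datetime import date

@dataclass
class DateDetectorModule:
    """Specialized: resolves date expressions and nothing else."""

    def execute(self, parameters):
        # Parametric: the expression parameter drives the result
        expression = parameters.get("expression", "today")
        if expression == "today":
            return {"date": date.today().isoformat()}
        raise ValueError(f"Unsupported expression: {expression}")

# Independent: the module is usable entirely on its own
module = DateDetectorModule()
print(module.execute({"expression": "today"}))
```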

3. Module Managers

A unique feature of this architecture is its inherent scalability. If a particular module requires additional processing capacity due to token limitations, multiple instances of that module can be instantiated. However, to streamline communication and ensure efficient data handling, these multiple instances are managed by a dedicated Module Manager. The Task Manager communicates with this Module Manager, which in turn delegates tasks to and aggregates responses from its associated module instances.

This hierarchical approach ensures that the system remains organized, efficient, and scalable. It allows for the easy addition of processing power where needed without adding undue complexity to the Task Manager’s operations.
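A minimal sketch of that delegation and aggregation might look like the following. The chunking strategy, the per-instance token budget, and the stand-in summarizer module are all assumptions made for the example.

```python
# An illustrative Module Manager: it fans a large input out across several
# instances of the same module so no single instance exceeds its token
# budget, then aggregates the partial results for the Task Manager.

class ModuleManager:
    def __init__(self, module_factory, max_tokens_per_instance):
        self.module_factory = module_factory
        self.max_tokens = max_tokens_per_instance

    def execute(self, tokens):
        # Split the input into chunks each instance can handle
        chunks = [tokens[i:i + self.max_tokens]
                  for i in range(0, len(tokens), self.max_tokens)]
        # Delegate each chunk to a fresh module instance
        partials = [self.module_factory().execute(chunk) for chunk in chunks]
        # Aggregate the responses into one result
        return " ".join(partials)

class SummarizerModule:
    def execute(self, tokens):
        # Stand-in for a real LLM call: "summarize" by keeping the first token
        return tokens[0]

manager = ModuleManager(SummarizerModule, max_tokens_per_instance=2)
print(manager.execute(["alpha", "beta", "gamma", "delta", "epsilon"]))
# → "alpha gamma epsilon" (three instances, one per chunk)
```

From the Task Manager's point of view, a Module Manager looks exactly like a single module: it receives an input and returns one aggregated result.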

Image: Visual representation of Task Manager, Module Manager and Modules interacting


Flow of Interaction

1. User Input

The journey begins with a user asking a question or making a request.

2. Task Manager Takes Charge

The Task Manager analyzes the request, identifying the necessary tasks and orchestrating the interaction between different modules.

3. Engaging Modules

Each module performs its task and communicates its results back to the Task Manager.

4. Crafting the Final Response

The Task Manager compiles the results into a cohesive response, which is then presented to the user.

Image: Diagram showing the flow from User Input to Final Response

A Practical Example: The Outfit Suggester

Let’s explore the architecture with a practical example, returning to the opening scenario: an AI system that suggests what to wear based on the user’s query, “What should I wear today?”

1. Modules in Play:

  • Date Detector: Identifies the exact date and time.
  • Weather Forecaster: Provides the weather forecast for the detected time.
  • Outfit Suggester: Recommends an outfit based on the time and weather.

2. Journey of the Request:

  • Step 1: The Task Manager identifies that all three modules need to be engaged.
  • Step 2: The Date Detector determines “today” refers to the current date.
  • Step 3: The Weather Forecaster provides the weather outlook for the identified date.
  • Step 4: The Outfit Suggester recommends an outfit based on the provided weather and time data.

# Pseudocode for the Task Manager's decision-making process
user_input = "What should I wear today?"

def task_manager(input_text):
    # Analyze the input and identify the needed modules
    required_modules = identify_modules(input_text)

    # Execute the modules in order, passing the necessary parameters
    result = None
    for module in required_modules:
        parameters = extract_parameters(input_text)
        result = module.execute(parameters)
        input_text = manage_output(result)  # Update the input for the next module

    # Compile the final response from the last module's result
    return compile_response(result)

output = task_manager(user_input)

Image: Diagram showing the flow for the Outfit Suggester system

Advantages and Considerations

Like any technological advancement, modular AI systems come with their own set of benefits and challenges.

Advantages:

Scalability

One of the primary benefits of the Modular AI Systems Architecture is its ability to efficiently manage more extensive tasks. By breaking down a complex request into smaller, more manageable pieces, the system can process each piece independently. This decentralized approach allows the system to scale effortlessly, handling larger tasks by distributing them across various modules.

Modularity

The architecture’s design emphasizes modularity, which brings flexibility to the system. With this approach, individual modules (or Task-Modules) can be added, removed, or upgraded without causing disruptions or requiring significant changes to the entire system. This modular design ensures that the system remains adaptable to evolving requirements or emerging technologies.

Token Efficiency

Token limitations have been a constraint for large language models. However, with the Modular AI Systems Architecture, data or tokens can be distributed across multiple modules. This distribution means that each module only handles a fraction of the tokens, enabling the system to process larger sets of data without hitting individual module token limits.
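As a concrete illustration of this distribution, the snippet below splits a long text into batches that each stay under a per-module token budget. The 4-characters-per-token estimate is a rough assumption, not a real tokenizer.

```python
# Illustrative token distribution: split a long text so each module handles
# only a fraction of the tokens. The character-based estimate is a rough
# stand-in for a real tokenizer.

def estimate_tokens(text):
    return max(1, len(text) // 4)  # crude heuristic: ~4 characters per token

def distribute(text, limit_per_module):
    words, batches, current = text.split(), [], []
    for word in words:
        candidate = " ".join(current + [word])
        if estimate_tokens(candidate) > limit_per_module and current:
            # Current batch is full; start a new one for the next module
            batches.append(" ".join(current))
            current = [word]
        else:
            current.append(word)
    if current:
        batches.append(" ".join(current))
    return batches

print(distribute("one two three four five six", limit_per_module=3))
# → ['one two three', 'four five six']
```

Each batch can then be sent to a separate module instance (or routed through a Module Manager, as described earlier), so no single instance hits its context limit.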

Parallel Querying

A significant advantage of the architecture is the capability to query multiple modules concurrently. Instead of sequentially processing tasks, modules can operate in parallel, significantly speeding up response times. This parallelism ensures that the system can deliver faster outputs, especially beneficial when handling multifaceted requests that engage multiple modules.
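One way to sketch this parallelism, assuming the modules are independent and I/O-bound (as remote LLM calls typically are), is with a thread pool. The two stub modules and their simulated latencies are assumptions for the example.

```python
# Illustrative parallel querying: two independent modules are invoked
# concurrently via a thread pool instead of one after another. The sleep
# calls stand in for real module latency (e.g. a remote LLM call).

import time
from concurrent.futures import ThreadPoolExecutor

def weather_module(_query):
    time.sleep(0.1)  # simulated latency
    return "15°C, cloudy"

def calendar_module(_query):
    time.sleep(0.1)  # simulated latency
    return "2 meetings today"

with ThreadPoolExecutor() as pool:
    # map preserves the module order in the results
    results = list(pool.map(lambda m: m("query"), [weather_module, calendar_module]))
print(results)
# Total wall time is roughly one module's latency, not the sum of both.
```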

Considerations:

Complexity

While the modular approach offers numerous advantages, it also introduces layers of complexity. Ensuring smooth interactions between various modules can be challenging. It requires careful orchestration, especially when data needs to flow seamlessly between modules or when tasks are interdependent.

Consistency

With multiple modules processing different parts of a request, there’s a need to ensure that the final output delivered to the user is unified and coherent. Achieving consistency in the user experience becomes crucial. This means that while modules operate independently, their outputs must align in a way that feels seamless to the end user.

Efficiency

Distributing tasks across multiple modules can lead to resource challenges. It’s essential to balance resources effectively between modules, ensuring that no single module becomes a bottleneck. Efficient resource allocation and optimization become vital to ensure timely responses and maintain system performance.

Conclusion

The Modular AI Systems Architecture offers a promising way to build intricate AI systems that are modular, scalable, and capable of handling complex, multifaceted user requests. With thoughtful design and implementation, it could pave the way toward intelligent systems that manage a myriad of tasks, providing users with rich, integrated experiences across various domains and applications.


Amir Ghasemi

Senior Mobile Engineer at MarleySpoon | 📍 Berlin, Germany