The 7-Factor Enterprise AI App
This is a continuation of my previous article on Multi-Agent RAG Systems (MARS) for Enterprise AI apps.
The following is a proposed methodology called the Seven-Factor Enterprise AI (inspired by the original Twelve-Factor App article about cloud computing) that I posted on my GitHub repo. If you have comments and/or suggestions for modifications, please feel free to respond here or open a pull request on the repo.
0. Introduction
Since the advent of generative AI, there has been a surge of companies experimenting with and building applications with AI features. The requirements for large enterprises, however, are inherently different: they entail stricter rules around accuracy, explainability and similar concerns, delivered with speed and at scale.
The Seven-Factor App, inspired by the Twelve-Factor App document for cloud computing, is a methodology to build enterprise grade generative AI applications.
The Seven-Factor App is agnostic to the underlying choices of technologies and programming languages. It is meant to be treated as a guideline for best practices and can ultimately be configured to the individual requirements of different enterprises.
Background
The document is based on the experience of directly working with and helping several small and large companies build generative AI applications using SingleStore as the underlying data source. These companies include, but are not limited to, multi-billion-dollar tech and financial services companies. However, this repo is open for collaboration and contribution from other developers, engineers and architects who have built real-world, enterprise-grade generative AI applications and have notes to add. We will capture those notes and case studies in a separate case-studies chapter after Factor VII.
I. Modularity Over Monoliths
The Seven-Factor App is inherently modular and follows the best practices of microservices architecture. This enables iterative changes, independent deployment and scalability. Modularity can be achieved either by packaging coherent sets of functionality as microservices or by adopting an emerging architecture specifically for generative AI applications called Multi-Agent RAG Systems (MARS).
MARS enables building AI agents that are capable of both Reasoning and Action (ReAct agents) and designed to collaborate at scale with other agents.
A typical ReACT agent consists of three primary constructs:
1. Intelligence — Access to one or more LLMs.
2. Tools — Access to receptors and effectors through webhooks and APIs.
3. Knowledge — Access to both structured and unstructured data specific to the agent’s goals and objectives.
An agent also maintains state, or has both long-term and short-term “memory,” which can be encapsulated within knowledge.
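To make the three constructs concrete, here is a minimal, illustrative Python sketch. The names (`Agent`, `Tool`, `KnowledgeStore`) are hypothetical and not tied to any specific framework; a real ReAct loop would iterate over reason/act/observe steps rather than run a single pass.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

# Hypothetical building blocks for a ReAct-style agent; names are illustrative only.

@dataclass
class Tool:
    """A receptor/effector exposed to the agent via a webhook or API."""
    name: str
    description: str
    func: Callable[[str], str]

@dataclass
class KnowledgeStore:
    """Structured + unstructured data scoped to the agent's goals,
    doubling as long-term and short-term memory."""
    long_term: Dict[str, str] = field(default_factory=dict)
    short_term: List[str] = field(default_factory=list)

@dataclass
class Agent:
    llm: Callable[[str], str]      # Intelligence: access to one or more LLMs
    tools: Dict[str, Tool]         # Tools: receptors and effectors
    knowledge: KnowledgeStore      # Knowledge: data plus memory

    def step(self, task: str) -> str:
        # Reason: ask the LLM which tool (if any) to use for the task.
        plan = self.llm(f"Task: {task}\nAvailable tools: {list(self.tools)}")
        # Act: invoke the chosen tool if the plan names one, and remember the observation.
        for name, tool in self.tools.items():
            if name in plan:
                observation = tool.func(task)
                self.knowledge.short_term.append(observation)
                return observation
        return plan
```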
Collaboration
As of this writing, there are two primary ways for agents to collaborate.
- Supervisor and worker collaboration — A supervisor agent assigns tasks to other agents, collates the results, and then either returns the information and/or completes an action (see the sketch after this list). The results are non-deterministic and not always explainable, which is why the second method is more appropriate for enterprise use cases.
- Sequential workflows — A workflow or graph consists of nodes and edges. The nodes can be agents that are invoked in a defined sequence and executed according to the overall requirements of the graph or workflow. This method offers more control but requires more effort to build, maintain and iterate as requirements change.
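The sketch below illustrates the supervisor/worker pattern in a deliberately simplified form; the `decompose` and `route` functions stand in for logic that, in practice, is usually driven by an LLM, and all names are hypothetical.

```python
from typing import Callable, Dict, List

# Illustrative supervisor/worker collaboration; worker agents are modeled as
# simple callables that take a sub-task and return a result.
Worker = Callable[[str], str]

def supervise(task: str,
              decompose: Callable[[str], List[str]],
              workers: Dict[str, Worker],
              route: Callable[[str], str]) -> str:
    """Decompose a task, assign sub-tasks to workers, and collate the results."""
    results = []
    for sub_task in decompose(task):       # supervisor breaks the task down
        worker_name = route(sub_task)      # supervisor picks a worker (often LLM-driven)
        results.append(workers[worker_name](sub_task))
    return "\n".join(results)              # collate and return (or act on) the results
```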
II. Information and Context Curation
A typical enterprise consists of multiple databases, data sources and ETL pipelines. In a gen AI application, however, if the LLM does not have access to the most current information, it runs the risk of returning information that is inaccurate or executing an action that is no longer needed, for example creating a customer support ticket for an issue that was already resolved in a real-time conversation with a customer support representative.
A Seven-Factor App is designed to provide the most relevant and accurate information to LLMs and agents within a few milliseconds, in one round-trip query, by searching across ALL data in an enterprise. Due to this requirement, the Seven-Factor App recommends a single data assimilation layer that can be queried with both semantic and exact keyword matches and that returns data in a structured format. This data layer should have access to the enterprise's structured data (for example, Iceberg data) and unstructured data (for example, binary files and PDFs) across transactional databases, data warehouses, data lakes and data lakehouses.
The information layer thus should have the following components accessible for Retrieval Augmented Generation (RAG) retrievers and agents:
- Real-time Change Data Capture (CDC) in and out from different data sources.
- Ability to store vector data for unstructured data along with relational, hierarchical and structured data.
- Ability to define optimized structured queries that can be invoked as functions by agents, APIs and tools.
- Ability to run hybrid search, including vector functions along with keyword search and analytics (aggregate functions), across all data, ideally in a single round trip (see the sketch after this list).
- The information layer should have the capability to match existing corpus of enterprise data with fast emerging streaming real-time data to provide effective contextualization for LLMs.
- The information layer should be able to honor data access, governance and policy and audit requirements of the enterprise.
- For certain use cases, the information layer should be able to provide a catalog of all data available.
- The information layer should provide a way for developers and data engineers to discover data through published meta data, create prototypes of applications (for example Jupyter notebooks) and the ability to deploy data apps as microservices that are resilient to underlying schema changes (for example, through the use of materialized views).
- The information layer should provide branching as a feature to enable viewing and iterating over different versions of data states.
- The information layer should also allow persisting feedback on query responses from different agents and apps, which can then be enriched with Reinforcement Learning from Human Feedback (RLHF) and used to continuously fine-tune certain LLMs.
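As one concrete illustration of the hybrid-search requirement above, the sketch below issues a single query that combines a vector similarity function with a keyword match. The table name, column names and the `DOT_PRODUCT`/`MATCH ... AGAINST` calls are assumptions based on a SingleStore-style SQL dialect; adapt them to whatever your information layer actually provides.

```python
import json

def hybrid_search(conn, query_text: str, query_embedding: list, limit: int = 5):
    """Semantic + keyword retrieval in one round trip over one data layer.

    Assumes a DB-API connection and a SingleStore-style dialect; schema is hypothetical.
    """
    sql = """
        SELECT doc_id, title,
               DOT_PRODUCT(embedding, JSON_ARRAY_PACK(%s)) AS semantic_score,
               MATCH(body) AGAINST (%s) AS keyword_score
        FROM enterprise_documents
        ORDER BY semantic_score + keyword_score DESC
        LIMIT %s
    """
    with conn.cursor() as cur:
        cur.execute(sql, (json.dumps(query_embedding), query_text, limit))
        return cur.fetchall()
```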
III. Many LLMs, One Intelligence
Due to the first principle of modularity in the Seven-Factor App, an enterprise generative AI application should not rely on a single provider, model or version of a Large Language Model (LLM). There are several reasons for this, but the most common ones include:
- The fast pace of LLM model iteration.
- Pricing.
- Requirements for certain agents and information to reside locally in a VPC or in a specific geography.
- The requirement to fine-tune smaller models for highly specific tasks.
The ensemble of LLMs can be spread across multiple agents or, similar to the information layer, be discoverable, fine-tuned and then deployed using technologies such as AWS's Bedrock, Azure AI Studio, NVIDIA's Inference Microservices (NIM) platform or Google's Vertex AI.
A Seven-Factor App may include either a specialized agent, graph or an LLM router to dynamically route queries and requests to other agents or LLMs.
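A minimal sketch of an LLM router follows. The routing heuristic and model labels are purely illustrative assumptions; in practice the routing decision is often made by a small classifier or by an LLM itself.

```python
from typing import Callable, Dict

# Hypothetical router that chooses among several LLM endpoints per request.
def make_router(models: Dict[str, Callable[[str], str]]) -> Callable[[str], str]:
    def route(query: str) -> str:
        if "summarize" in query.lower():
            return models["small-fine-tuned"](query)  # cheap, task-specific model
        if "confidential" in query.lower():
            return models["local-vpc"](query)         # model hosted inside the VPC
        return models["general-purpose"](query)       # default large model
    return route

# Usage (the three callables would wrap actual model endpoints):
# router = make_router({"small-fine-tuned": f1, "local-vpc": f2, "general-purpose": f3})
```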
IV. Dynamic Tools, Upgradeable Skills
A tool is typically an API endpoint that can be invoked remotely (for example, REST) or through a function that has been implemented for a specific task or action, for example vectorizing a given input stream, scraping a website or retrieving a specific piece of information from a specific data set. All tools must follow the enterprise's security standards for authorization and authentication, with audit and rate-limiting capabilities.
A function used as a tool should ideally return information using the OpenAPI schema to ensure standardization of communication between and through different agents. Each tool should be independently upgradeable and deployable.
For large enterprises, tools should be discoverable through a catalog, and developers should be able to create new functionality or encapsulate existing functionality within the Tools OpenAPI schema and publish it to the catalog, similar to Docker containers.
A collection of specific tools, along with a specific prompt and fine-tuned domain-specific knowledge (data), can be encapsulated as a skill for an agent. For example, an agent that has access to a keyword research API, web scraping APIs, web analytics APIs and a fine-tuned LLM with a specific prompt could be referred to as an agent with the enterprise SEO skill.
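As a sketch of how a tool and a skill might be packaged, the code below describes a tool with a JSON-schema-style parameter block (in the spirit of OpenAPI) and bundles tools, a prompt and knowledge sources into a skill. All class names, fields and the SEO example are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class ToolSpec:
    """A tool described by a machine-readable schema, in the spirit of OpenAPI."""
    name: str
    description: str
    parameters: Dict            # JSON-schema-style parameter description
    func: Callable[..., dict]   # returns a structured (JSON-serializable) result

@dataclass
class Skill:
    """A collection of tools, a prompt and domain-specific knowledge for an agent."""
    name: str
    system_prompt: str
    tools: List[ToolSpec]
    knowledge_sources: List[str] = field(default_factory=list)

# Example: an "enterprise SEO" skill bundling a keyword research tool with a prompt.
seo_skill = Skill(
    name="enterprise-seo",
    system_prompt="You are an SEO analyst for the enterprise ...",
    tools=[
        ToolSpec("keyword_research", "Look up keyword volumes",
                 {"type": "object", "properties": {"keyword": {"type": "string"}}},
                 lambda keyword: {"keyword": keyword, "volume": 0}),
    ],
    knowledge_sources=["seo_corpus"],
)
```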
V. Collaboration and Orchestration at Scale
In a Seven-Factor App, there are primarily two big categories of collaboration with AI agents.
- Role-based collaboration — In this method, agents have different roles. A task is decomposed into smaller tasks and assigned based on roles. The output is then verified, critiqued and assimilated by other role-based agents and returned as an output. Role-based collaboration works well for asynchronous goals and objectives without real-time response requirements.
- Workflow-based orchestration — In this method, the sequence of tasks is defined as a workflow or graph consisting of nodes and edges. A node could be a task or represent an agent with the skills and knowledge to do a specific task. The workflow also has conditional branches and loops to ensure the sequence of tasks happens in a certain order based on enterprise rules, for example, creating embeddings only after the raw data has been summarized and categorized. A workflow-based orchestration can also invoke certain tasks in parallel for faster response times.
One of the key requirements of agent collaboration is the ability to observe and record the communication and actions between agents, which can then be enriched with human feedback for long-term optimization and also used to meet audit requirements.
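The sketch below combines the two ideas above: a workflow-style orchestration with a conditional branch (embed only after summarize and categorize) and a simple audit log of each node's input and output. The node functions and the log structure are assumptions; a real system would persist the log to durable, queryable storage.

```python
import time
from typing import Callable, List

audit_log: List[dict] = []   # in practice this would go to durable, queryable storage

def run_node(name: str, func: Callable[[str], str], payload: str) -> str:
    """Execute one workflow node and record its input/output for audit and feedback enrichment."""
    result = func(payload)
    audit_log.append({"node": name, "input": payload, "output": result, "ts": time.time()})
    return result

def ingest_workflow(raw_text: str, summarize, categorize, embed) -> str:
    """Create embeddings only after the raw data has been summarized and categorized."""
    summary = run_node("summarize", summarize, raw_text)
    category = run_node("categorize", categorize, summary)
    if category != "discard":                 # conditional branch per enterprise rules
        return run_node("embed", embed, summary)
    return "skipped"
```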
VI. RAG Stack for Speed and Accuracy
A Seven-Factor App requires a Retrieval Augmented Generation (RAG) tech stack that fulfills the following requirements:
Safety, Security and Privacy — Any query or request to an agent or a workflow should first be handled by guardrails that record, make a decision and act on whether that task or information should proceed to the next step. This should happen at both input and output. These safety, security and data privacy rules should follow the enterprise's codified rules, for example masking certain kinds of data or preventing insecure or unauthorized data access.
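A minimal sketch of input and output guardrails wrapping an agent call follows; the specific rules shown (a regex for masking one PII pattern, a crude blocked-phrase check) are placeholders for the enterprise's codified rules and policy engine.

```python
import re
from typing import Callable

SSN_PATTERN = re.compile(r"\b\d{3}-\d{2}-\d{4}\b")   # illustrative PII masking rule

def mask_pii(text: str) -> str:
    return SSN_PATTERN.sub("[REDACTED]", text)

def guarded_call(agent: Callable[[str], str], user_input: str) -> str:
    """Apply guardrails on both input and output before anything reaches the user."""
    safe_input = mask_pii(user_input)            # input guardrail: mask sensitive data
    if "drop table" in safe_input.lower():       # input guardrail: block unsafe requests
        return "Request rejected by policy."
    output = agent(safe_input)
    return mask_pii(output)                      # output guardrail: mask before returning
```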
Accuracy and Relevancy — Input queries should be enriched with context by asking questions and seeking clarification and approval when appropriate and needed (for example, before running a command). For accuracy, the app must take advantage of re-ranking, evaluations and fine-tuning with RLHF as output-validation methods. In addition, care should be taken to use fine-tuned embedding models and matching LLMs to ensure information is vectorized (when needed) and retrieved with high contextual relevance. Guardrails should also be enforced to reject retrieval queries unrelated to the enterprise's corpus of data.
Speed and Scalability — A RAG stack should be capable of querying and retrieving data across multiple data types and multiple data stores (lakehouses and transactional DBs), using both vector and keyword-match search across petabytes of data, with latency of less than a second as described under Factor II. Semantic caching, local GPU-based inference microservices and memory-first architectures should be used to achieve sub-second latency.
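As a sketch of the semantic-caching idea, the code below checks whether a new query's embedding is close enough to a previously answered one before calling the LLM at all. The similarity threshold and the embedding function are assumptions; production caches would also handle eviction and invalidation.

```python
import math
from typing import Callable, List, Tuple

def cosine(a: List[float], b: List[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b)) + 1e-9)

class SemanticCache:
    """Reuse answers for semantically similar queries to cut latency and LLM cost."""
    def __init__(self, embed: Callable[[str], List[float]], threshold: float = 0.92):
        self.embed, self.threshold = embed, threshold
        self.entries: List[Tuple[List[float], str]] = []

    def get_or_compute(self, query: str, llm: Callable[[str], str]) -> str:
        q_vec = self.embed(query)
        for vec, answer in self.entries:
            if cosine(q_vec, vec) >= self.threshold:   # cache hit: skip the LLM call
                return answer
        answer = llm(query)                            # cache miss: call the LLM and store
        self.entries.append((q_vec, answer))
        return answer
```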
VII. User Experience for Agents
A Seven-Factor Enterprise AI App uses a disaggregated UI layer that falls under one or several of the User Experience (UX) paradigms of interacting with Multi-Agent Systems.
Information Retrieval Use Cases — The most common user experience in this scenario is a conversational user interface with the following input mechanisms: a structured file such as a CSV, an audio file, an image, a PDF or a video. The input should ALWAYS be explicitly provided by the user and never be added automatically (for example, audio or vision without the user's explicit knowledge and approval). The output typically consists of streaming text and, when appropriate, rich information in the form of analytical charts and widgets with actionable buttons and the ability to persist them within a web page. The rich output may also include audio and video files that can be persisted by the users.
Agentic Use Cases — When an action needs to be performed, a widget or a command line explicitly requiring users to approve the action should be used. In order to receive approval, the agent or the system should explicitly describe what will happen as part of the action. Each of these approvals should then be recorded for audit requirements.
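A minimal command-line sketch of an explicit-approval flow for agentic actions, recording each decision for audit, follows; the prompt wording, file-based log and function names are illustrative assumptions.

```python
import json
import time
from typing import Callable

def approve_and_run(action_description: str, action: Callable[[], str],
                    audit_path: str = "approvals.log") -> str:
    """Describe the action, require explicit user approval, record the decision, then run."""
    print(f"The agent wants to: {action_description}")
    decision = input("Approve? [y/N]: ").strip().lower()
    record = {"action": action_description, "approved": decision == "y", "ts": time.time()}
    with open(audit_path, "a") as f:                 # every approval/denial is persisted for audit
        f.write(json.dumps(record) + "\n")
    if decision != "y":
        return "Action not approved."
    return action()
```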
Build Use Cases — In an enterprise, developers from different teams are required to build apps, services and automation across multiple data sets while utilizing services built by other teams. In these scenarios, developers should have a way to discover tools and services and access a data catalog. When appropriate, they should also be able to use templatized notebooks or microservices to build applications, services and workflows, and debug and deploy them independently using company standards.
✌️
Enjoyed This Content?
Thanks for getting to the end of this article. My name is Madhukar, and I work at the intersection of technology (and AI) and creativity. I love to build apps and write about enterprise AI, PLG and marketing tech.
I am also building a course. Reach out to me on LinkedIn if you are interested in collaborating in any way.
Subscribe for free to get notified when I publish a new story.