Generative AI Reference Architecture

Manav Gupta
Dec 15, 2023


Special thanks to Mihai Criveti, Chris Kirby, David Stacy, Wissam Dib, Janki Vora.

We recently published the IBM architectures for generative AI, which collect best practices, patterns, and lessons learned.

Architecture Patterns

  1. Retrieval Augmented Generation
  2. AI Governance
  3. Modular Reasoning, Knowledge, and Learning (MRKL)
  4. Document Summarization
  5. Code Generation with Ansible and Z
  6. IBM AI platform

Reproduced below is the component model for generative AI.

Generative AI Component Model

GenAI Unique Capabilities:

  • GenAI Operations are the capabilities required to manage, deploy, and customize generative AI models for use within an enterprise. Included in this category are capabilities for training and tuning models, managing the lifecycle of models once deployed, and for managing models and datasets available to users within the enterprise.
  • GenAI Application Development are the capabilities necessary to tune general foundation models for use in enterprise- and domain-specific solutions, and to develop full-featured generative AI applications. Future revisions of this capability group may expand to include capabilities for designing and training generative AI models from scratch, but the costs and time involved in creating new models make these capabilities unnecessary for most enterprises.
  • GenAI Governance is a suite of capabilities required to effectively monitor and manage models deployed into production. These include capabilities to monitor whether models continue to give accurate and appropriate responses, capabilities to safeguard models from inappropriate and/or malicious inputs, and governance capabilities to manage enterprise risks and assist with both regulatory compliance and reporting requirements.

The remaining capability groups are supporting capabilities for generative AI. The capabilities are not unique to generative AI but must be present to support it as an enterprise capability. These groups are:

  • Data Management is a group of capabilities to store, manage, and transform data to forms that make it suitable for tuning and training of generative AI models. Also included in this category are capabilities to log and rate model responses for auditing purposes, and as input to further model tuning and refinement.
  • Supporting Capabilities is a catch-all grouping of application, integration, and IT operations capabilities required to successfully deploy and manage generative AI solutions within an enterprise.
  • GenAI Resources captures the hardware and platform capabilities necessary to efficiently and effectively develop, tune, deploy, and manage generative AI models and solutions.

Capabilities

Each capability category is made up of one or more capability groups. This section highlights groups and capabilities key to generative AI.

The Model Hub capability group encapsulates the capabilities necessary to manage imported models as well as models tuned or trained by the enterprise. These capabilities enable enterprises to manage the models and data sets available for use within the enterprise, and to limit access to models and data sets to specific users or groups within the enterprise. Model Importing and Data Importing are key capabilities for enterprises to gate the intake of models from the growing number of public model repositories such as Hugging Face.
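As a rough illustration of gating model intake, the sketch below reviews a candidate model before it is admitted to the hub. The `ModelCandidate` type, the `review_import` function, and the two checks (approved source and approved license) are assumptions made for this example; they are not part of the published architecture, and a real intake process would likely apply more criteria.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelCandidate:
    """Hypothetical record describing a model proposed for import."""
    name: str
    source: str
    license: str


# Assumed allowlists; each enterprise would define its own.
APPROVED_SOURCES = {"huggingface.co"}
APPROVED_LICENSES = {"apache-2.0", "mit"}


def review_import(candidate: ModelCandidate) -> list:
    """Return the reasons an import is blocked; an empty list means approved."""
    reasons = []
    if candidate.source not in APPROVED_SOURCES:
        reasons.append(f"source not approved: {candidate.source}")
    if candidate.license.lower() not in APPROVED_LICENSES:
        reasons.append(f"license not approved: {candidate.license}")
    return reasons
```

Returning the list of reasons, rather than a bare boolean, leaves a record that can be stored alongside the import request for later audit.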

Model Hosting offers capabilities for deploying general and tuned models as API-enabled services within an enterprise. Hosting models this way optimizes resource utilization, allows models to be refined and replaced independently, and simplifies governance. Key to this is Model Access Policy Management, which ensures that model access is restricted to authorized users and groups and prevents unauthorized usage.
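One way to picture Model Access Policy Management is a policy check that runs before a request reaches the hosted model. The following is a minimal sketch under stated assumptions: `ModelAccessPolicy` and `invoke_model` are hypothetical names, and the model call itself is a placeholder string rather than a real API.

```python
from dataclasses import dataclass, field


@dataclass
class ModelAccessPolicy:
    """Hypothetical policy: which users and groups may call a hosted model."""
    model_id: str
    allowed_users: set = field(default_factory=set)
    allowed_groups: set = field(default_factory=set)

    def permits(self, user: str, groups: set) -> bool:
        # Access is granted by a direct user entry or by any shared group.
        return user in self.allowed_users or bool(self.allowed_groups & groups)


def invoke_model(policy: ModelAccessPolicy, user: str, groups: set, prompt: str) -> str:
    """Check the policy before forwarding the prompt to the hosted model."""
    if not policy.permits(user, groups):
        raise PermissionError(f"{user} is not authorized to use {policy.model_id}")
    # Placeholder standing in for the real call to the model service.
    return f"[{policy.model_id}] response to: {prompt}"
```

Because the check sits in front of the service rather than inside each application, a policy change takes effect for every caller at once.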

Model Customization is a group of capabilities that enable an enterprise to tune and train generative AI models for specific business needs.

Model Governance is a critical set of capabilities for an enterprise to make use of generative AI models on a wide scale. Specifically, these capabilities provide enterprises with the insights they need to monitor and manage model risks such as the introduction of bias in model responses, and to help address regulatory and compliance requirements for model transparency and fairness.

Model Monitoring is the operational analogue to Model Governance; where Model Governance deals with long-term model and risk management, the capabilities in Model Monitoring enable enterprises to monitor and manage model operations in real time. Model Monitoring comprises several key capabilities, including:

  • Bias Detection is the ability to detect and flag when a model’s responses deviate from established / ideal responses and begin to favor one set of outcomes over another.
  • Hate, Abuse and Profanity (HAP) Detection is the ability to detect and filter hate, abuse, and profanity in both prompts submitted by users and in responses generated by the model. These are considered ‘base’ capabilities; enterprises will often choose to extend the list of filtered topics to include topics not appropriate to the business, e.g., sexually suggestive topics in a lending office, or to accommodate the social norms of a target audience.
  • Prompt Monitoring and Security is an emerging capability required to protect deployed models against attacks, such as prompt injection, designed to corrupt the model or to circumvent model controls established by the enterprise.
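The HAP Detection flow described above, screening both the prompt and the response and letting the enterprise extend the filtered list, can be sketched as follows. This is only an illustration: keyword matching stands in for what would normally be a trained classifier, and `BASE_BLOCKED_TERMS`, `build_filter`, and `guarded_call` are hypothetical names with placeholder terms.

```python
import re

# Placeholder terms only; a real deployment would rely on a trained classifier.
BASE_BLOCKED_TERMS = {"slur_example", "threat_example"}


def build_filter(extra_terms=()):
    """Return a checker that flags text containing any blocked term."""
    # The enterprise extends the base list with its own topics.
    terms = BASE_BLOCKED_TERMS | {t.lower() for t in extra_terms}

    def is_flagged(text: str) -> bool:
        words = set(re.findall(r"[a-z_']+", text.lower()))
        return bool(words & terms)

    return is_flagged


def guarded_call(model, prompt: str, is_flagged) -> str:
    """Screen the prompt, call the model, then screen the response."""
    if is_flagged(prompt):
        return "Prompt rejected by content filter."
    response = model(prompt)
    if is_flagged(response):
        return "Response withheld by content filter."
    return response
```

Here `model` is any callable that maps a prompt string to a response string, so the same guard can wrap different hosted models.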

GenAI Tuning is a group of capabilities necessary to ‘customize’ a general generative model to the needs of the enterprise. Models are trained on a broad base of knowledge and will lack knowledge of specific industry jargon and processes. Thus most enterprises will need to make use of capabilities like Prompt Engineering, Prompt Tuning, and Model Fine-tuning to create a model that understands the terms and processes of the enterprise’s business.
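Of the three techniques, Prompt Engineering is the lightest: it changes only the text sent to the model. A common form is a few-shot prompt that shows the model a handful of worked examples in the enterprise's own vocabulary. The helper below is a minimal sketch; the function name and the `Q:`/`A:` layout are assumptions for illustration, not a prescribed format.

```python
def build_few_shot_prompt(instruction, examples, query):
    """Assemble a few-shot prompt: instruction, worked examples, then the new query."""
    lines = [instruction, ""]
    for question, answer in examples:
        lines.append(f"Q: {question}")
        lines.append(f"A: {answer}")
        lines.append("")
    # End with an open answer slot for the model to complete.
    lines.append(f"Q: {query}")
    lines.append("A:")
    return "\n".join(lines)


prompt = build_few_shot_prompt(
    "Answer using the lending team's terminology.",
    [("What is LTV?", "Loan-to-value ratio.")],
    "What is DTI?",
)
```

Prompt Tuning and Model Fine-tuning go further by adjusting learned parameters, which this sketch does not attempt to show.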

GenAI Application Capabilities enable enterprises to develop advanced generative AI applications by incorporating Orchestration for managing multiple AI model interactions, and Intent Detection to understand user requests and translate them into actions within the system, such as account balance queries.
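To make the relationship between Intent Detection and Orchestration concrete, the sketch below detects an intent and routes the request to a matching handler. It is a simplified illustration: keyword matching stands in for a real intent classifier, and the intent names, `detect_intent`, and `orchestrate` are hypothetical.

```python
def detect_intent(utterance: str) -> str:
    """Keyword-based stand-in for an intent classifier."""
    text = utterance.lower()
    if "balance" in text:
        return "account_balance"
    if "summarize" in text or "summary" in text:
        return "summarize_document"
    return "general_question"


def orchestrate(utterance: str, handlers: dict) -> str:
    """Route the request to the handler registered for its detected intent."""
    intent = detect_intent(utterance)
    # Fall back to the general handler when no specific one is registered.
    handler = handlers.get(intent, handlers["general_question"])
    return handler(utterance)


handlers = {
    "account_balance": lambda u: "routed to balance lookup",
    "summarize_document": lambda u: "routed to summarization model",
    "general_question": lambda u: "routed to general model",
}
```

Each handler could wrap a different model or backend system, which is where orchestration across multiple model interactions would come in.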
