A Repeatable Framework for Generative AI at Scale
Generative AI is not a passing fad; it is here to stay, and organizations cannot ignore it. At the same time, they cannot afford the technical debt and infrastructure costs of ten in-house generative AI use cases built in ten different ways. What is needed is a framework that is repeatable, flexible, and extensible, technology-agnostic, and able to scale to the needs of the largest enterprises.
For over a decade, we at DataRobot have been developing our product to be all of those things: technology-agnostic, scalable, extensible, flexible, and repeatable. Today I’m pleased to introduce the framework we have developed for delivering enterprise-grade, end-to-end generative AI solutions at scale.
Its defining characteristics are:
- Securely use proprietary documents and data to provide context to LLMs by converting that information into custom knowledge bases.
- Insert one or more Predictive AI models to audit the performance of the Generative AI model and/or to enforce guardrails.
- Select from best-of-breed components during development.
- Provide end-users with multiple UI options for model consumption.
- Monitor the end-to-end solution with purpose-built observability metrics.
RFPBot Example
This framework can best be understood in the context of an example. This article and the embedded video showcase a Request for Proposal Assistant named RFPBot. RFPBot has a predictive and a generative component and was built entirely within DataRobot in the course of a single afternoon.
In the image below, notice the content that follows the paragraph of generated text: four links to references, five sub-scores from the Audit Model, and an instruction to up-vote or down-vote the response.
RFPBot uses an organization’s internal data to help salespeople generate RFP responses in a fraction of the usual time. The speed increase is attributable to three sources:
1) The custom knowledge base underpinning the solution, which stands in for the experts who would otherwise be tapped to answer the RFP.
2) The use of Generative AI to write the prose.
3) Integration with the organization’s preferred consumption environment (Slack, in this case).
RFPBot integrates best-of-breed components during development; post-development, the entire solution is monitored in real time. RFPBot showcases both the framework itself and, more generally, the power of combining generative and predictive AI to deliver business results.
As you learn more about RFPBot, recognize that the concepts and processes are transferable to any other use case that requires accurate and complete written answers to detailed questions.
Applying the Framework
Within each major framework component there are many choices of tools and technologies, and any combination of choices is possible when implementing the framework.
We know that organizations want to use best-of-breed, and we also know that which specific technology is best-of-breed will change over time. Therefore what really matters is flexibility and interoperability in a rapidly changing tech landscape. The icons shown are among the current possibilities.
RFPBot uses Word, Excel, and Markdown files as the sources; an embedding model from Hugging Face (all-MiniLM-L6-v2); the Facebook AI Similarity Search (FAISS) vector database; GPT-3.5 Turbo; a logistic regression model; a Streamlit application; and a Slack integration. Each choice at each stage of the framework is independent. The role of the DataRobot AI Platform is to orchestrate, govern, and monitor the whole solution.
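The independence of each stage can be sketched as a simple component registry. The component names below mirror RFPBot's choices from this article; the registry and `swap` helper are purely illustrative, not DataRobot's API.

```python
# Hypothetical registry treating each framework stage as a swappable slot.
RFPBOT_STACK = {
    "sources": ["Word", "Excel", "Markdown"],
    "embedding_model": "sentence-transformers/all-MiniLM-L6-v2",
    "vector_database": "FAISS",
    "llm": "gpt-3.5-turbo",
    "audit_model": "logistic_regression",
    "frontend": "Streamlit",
    "integration": "Slack",
}

def swap(stack: dict, stage: str, component) -> dict:
    """Replace one stage without touching the rest of the solution."""
    if stage not in stack:
        raise KeyError(f"Unknown stage: {stage}")
    return {**stack, stage: component}
```

Because `swap` returns a new stack rather than mutating the old one, upgrading the LLM (for example) leaves every other component, and the original configuration, untouched.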
Let’s look under the hood to see how generative and predictive models work together. Users actually interact with two models each time they type a question: a Query Response Model and an Audit Model. The Query Response Model is generative: it creates the answer to the query. The Audit Model is predictive: it evaluates the correctness of that answer, expressed as a predicted probability.
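The audit step can be sketched as a logistic regression over features of the generated answer, producing the predicted probability described above. The feature names, weights, and the low-confidence flag are illustrative assumptions, not RFPBot's actual Audit Model.

```python
import math

def audit_probability(features: dict, weights: dict, bias: float) -> float:
    """Logistic-regression scoring: a sigmoid over a weighted sum of
    answer features (e.g. overlap with the retrieved source documents)."""
    z = bias + sum(weights[name] * value for name, value in features.items())
    return 1.0 / (1.0 + math.exp(-z))

def gate(answer: str, probability: float, threshold: float = 0.5) -> str:
    """Guardrail: flag answers the Audit Model scores below the threshold."""
    if probability < threshold:
        return f"[LOW CONFIDENCE {probability:.2f}] {answer}"
    return answer
```

Surfacing the probability alongside the answer, as RFPBot's sub-scores do, lets salespeople decide when to trust the generated text and when to escalate to a human expert.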
The citations listed as Resources in the RFPBot example point to internal documents drawn from the Knowledge Base. The Knowledge Base is created by applying an Embedding Model to a set of documents and files and storing the result in a Vector Database. This step solves the problem of LLMs being frozen in time and lacking context from private data.
When a user queries RFPBot, context-specific information drawn from the Knowledge Base is made available to the LLM and shown to the user as a source for the generation.
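The retrieve-then-generate loop above can be sketched end to end. To stay self-contained, the sketch uses a toy bag-of-words embedding and a brute-force cosine search in place of the all-MiniLM-L6-v2 model and FAISS index that RFPBot actually uses; the prompt wording is likewise an assumption.

```python
import math

def embed(text: str, vocab: list) -> list:
    """Toy bag-of-words embedding; a real system would use a model
    such as all-MiniLM-L6-v2 here."""
    words = text.lower().split()
    return [float(words.count(w)) for w in vocab]

def retrieve(query: str, docs: list, vocab: list, k: int = 1) -> list:
    """Cosine-similarity search standing in for a FAISS index lookup."""
    q = embed(query, vocab)
    def cosine(doc):
        v = embed(doc, vocab)
        dot = sum(a * b for a, b in zip(q, v))
        norm = math.sqrt(sum(a * a for a in q)) * math.sqrt(sum(b * b for b in v))
        return dot / norm if norm else 0.0
    return sorted(docs, key=cosine, reverse=True)[:k]

def build_prompt(query: str, context: list) -> str:
    """Ground the LLM in the retrieved context and ask it to cite sources."""
    numbered = "\n".join(f"[{i + 1}] {c}" for i, c in enumerate(context))
    return (
        "Answer using only the context below and cite your sources.\n\n"
        f"Context:\n{numbered}\n\nQuestion: {query}"
    )
```

The same retrieved passages that ground the prompt are what RFPBot surfaces back to the user as citations, so every generated answer is traceable to its internal sources.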
Orchestration and Monitoring
The whole end-to-end solution integrating best-of-breed components is built in a DataRobot-hosted notebook, which has enterprise security, sharing, and version control.
Once built, the solution is monitored using standard and custom-defined metrics in DataRobot’s AI Production interface. In the image below, notice the metrics specific to LLMOps, such as Informative Response, Truthful Response, Prompt Toxicity Score, and LLM Cost.
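A custom metric like LLM Cost can be computed per request from token counts. A minimal sketch, assuming per-1K-token pricing; the rates below are placeholders for illustration, not any provider's current prices.

```python
def llm_cost(prompt_tokens: int, completion_tokens: int,
             prompt_rate: float = 0.0015, completion_rate: float = 0.002) -> float:
    """Custom LLM Cost metric: dollars per request, given per-1K-token
    rates for prompt and completion tokens (rates here are illustrative)."""
    return (prompt_tokens * prompt_rate + completion_tokens * completion_rate) / 1000.0
```

Logging a number like this per request is what lets a production interface aggregate cost alongside quality metrics such as Truthful Response, so teams can watch spend and accuracy on the same dashboard.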
By abstracting away infrastructure and environment management tasks, a single person can create an application such as RFPBot in hours or days rather than weeks or months. By using an open, extensible platform for developing generative AI applications and following a repeatable framework, organizations avoid vendor lock-in and the accumulation of technical debt. They also vastly simplify model lifecycle management, since individual components within the framework can be upgraded or replaced over time.
RFPBot Framework Executive Overview Video
Learn more about the Framework for Generative AI Applications at Scale and see RFPBot live in this 5-minute video: