Generative AI on Red Hat OpenShift 4
Authors: David Kypuros, Bobby Johns, and Jason Nagin
So where do I begin?…
Well, it all started with a little fun in the lab :-)
Let’s start with my friends here in the Plano, Texas office. Although we technically work in different groups within Red Hat, we share a common love for emerging technology and, yes, Red Hat OpenShift :-)
Background
Over the past 8 months, I’ve been working with Generative AI, primarily as a tool or a way to quickly learn other Telco-related technologies to compete in the TM Forum DTW 2023. Along the way, I discovered that GenAI is a cool stand-alone technology. After many lively lunch discussions in the office and countless “what if we ___” type conversations, the three of us found ourselves converging on an OpenShift 4 Gen AI lab. We decided to move my local Gen AI development environment (and Bobby’s) to our joint Red Hat OpenShift lab for some testing. After a while, we finally did it! I’ve included a few sample demos from the team to give you a sense of what a Gen AI Lab running on Red Hat OpenShift 4 can do.
Looking back, it’s been a fun 8-month journey, and I hope to expand this blog into a “getting started with Gen AI on Red Hat OpenShift” set of instructions. It’s one thing to “get it working and stable” in a single lab; it’s another to generalize the process so it can be easily understood and repeated :-)
Demo: “Talk to an OpenShift Expert”
This is a demonstration of a Generative AI application running on Red Hat OpenShift 4.12, captured via a screen recorder on my iPhone. It has ingested all of the Red Hat OpenShift 4.13 documentation, which totals 3.9 million words across 30,700 chunks, each consisting of 1,000 tokens. This fun application essentially allows you to “Ask any Question about Red Hat OpenShift 4.13.” The ingestion process can be readily adapted to encompass any personal or company-specific documentation or internal “Methods and Procedures.” The bot primarily aggregates local data YOU PROVIDE, enabling you to interact with the data using Natural Language Processing from an API provider of your choice. The data showcased in the video is predominantly sourced from local storage on Red Hat OpenShift 4. (Demo Idea: Perhaps next, I could ingest MOP data?)
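For a sense of what that ingestion looks like, here is a minimal sketch using LangChain and ChromaDB (the same stack described in the next demo). The document path, chunk sizing, and persistence directory are illustrative assumptions, not our exact pipeline.

```python
# Minimal ingestion sketch: split local docs into roughly 1,000-token chunks
# and embed them into a persistent Chroma store. Paths and sizes are
# illustrative assumptions (2023-era LangChain imports).
from langchain.document_loaders import DirectoryLoader, TextLoader
from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# Load the scraped OpenShift 4.13 documentation from local storage (hypothetical path).
loader = DirectoryLoader("./openshift-docs-4.13", glob="**/*.txt", loader_cls=TextLoader)
docs = loader.load()

# chunk_size is in characters; roughly 4,000 characters lands near 1,000 tokens.
splitter = RecursiveCharacterTextSplitter(chunk_size=4000, chunk_overlap=200)
chunks = splitter.split_documents(docs)

# Embed and persist the chunks so the chatbot can retrieve them later.
db = Chroma.from_documents(chunks, OpenAIEmbeddings(),
                           persist_directory="./chroma-openshift-docs")
db.persist()
print(f"Ingested {len(chunks)} chunks")
```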
Demo: Gen AI Integrated with Red Hat OpenShift 4
This demo covers the deployment of a Generative AI application running on Red Hat OpenShift 4.12. We’re leveraging GitHub and ArgoCD to establish a CI/CD pipeline. For integration purposes, Python is used with LangChain and ChromaDB as client libraries. The Red Hat documentation URLs are formed by a Bash script. The container build process utilizes an external GitHub runner, instantiated as a systemd service on a Fedora bare-metal system located here in Plano. The built container is subsequently hosted in a private registry on GitHub Packages, and is later pulled by Red Hat OpenShift as directed by ArgoCD. ArgoCD pulls the Helm chart from GitHub and tells OpenShift 4 which resources to create.
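The URL-formation step was a Bash script in our pipeline, but the idea is simple enough to sketch in Python; the base URL pattern and section list below are illustrative assumptions, not the exact list our script produced.

```python
# Sketch of forming documentation URLs to feed the ingestion step.
# The base URL and section names are illustrative assumptions.
BASE = "https://docs.openshift.com/container-platform/4.13"

sections = [
    "installing/index.html",
    "networking/understanding-networking.html",
    "storage/index.html",
]

urls = [f"{BASE}/{section}" for section in sections]
for url in urls:
    print(url)
```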
Gen AI on Red Hat OpenShift 4
So why Red Hat OpenShift? Well, let me share from my personal Gen AI experience so far. It’s really fun building applications and exploring what you can do with Generative AI as a tool. Just last weekend, I lost an entire day testing Smol AI, building six complete applications from a long list of requirements (ERDs). Interestingly, these were also generated by AI — Bard, ChatGPT-4, and Claude. It’s exhilarating to watch AI construct a complete app from scratch. However, what I’ve noticed is that I frequently want to place my application somewhere other than my laptop! My laptop has become a testing ground (aka messy) for tons of projects. Sometimes I just want to actually use my chatbot, hosted somewhere I can access it anytime I need.
Many of the Gen AI tools and databases already provide a container option that I’ve used locally on my laptop, so it makes sense to keep the containers and just move them somewhere that can handle them natively. A single bare-metal instance of Red Hat OpenShift, known as SNO (Single Node OpenShift), is a great place to deploy my application. The recipe that’s worked for me is SNO plus a good VPN. For an actual organization, this would obviously be a different scenario with a different set of options. The public cloud is a better option for quick timelines. Red Hat provides a managed service on AWS known as ROSA (Red Hat OpenShift Service on AWS), one on Azure called ARO (Azure Red Hat OpenShift), and offers Red Hat OpenShift Dedicated on GCP.
Red Hat OpenShift as a Platform for Gen AI Apps: Red Hat OpenShift stands as a key deployment environment for applications, with a special emphasis on hosting generative AI applications seamlessly.
The Gen AI Technology Landscape
So, as you can see, we can deploy Gen AI applications on Red Hat OpenShift with a wide variety of Gen AI tools and CI/CD pipeline options. We should probably zoom out a bit and take a look at the Gen AI Technology Landscape.
The core of the Gen AI framework is primarily the Large Language Model, wrapped with APIs and some kind of vector database. This can be either locally deployed or consumed as an online service.
Generative AI framework: A core solution intertwining large language models wrapped with API services and vector databases. These offerings cater to various deployment environments, including cloud-based services, on-premise, or a hybrid of both. These local services include a wide variety of open-source software and related tooling.
The core Gen AI framework interacts with multiple Gen AI tools like Langchain, Langflow, and Flowise. There are also tools related to vector databases, such as the ChromaDB client libraries, or Pinecone as a managed service. The point here is that the tooling is often an interface to the “core Gen AI framework.”
Generative AI Tools: Advanced tools like Langchain and Langflow spotlight the next evolution in generative AI, greatly enhancing data integration capabilities.
When we look at technology beyond the core Gen AI framework, everything really just becomes an API that interacts with this core framework.
One of the easiest ways to start interacting with this core framework and leverage the tools is to use an “AI Workbench.” These can be spun up on your laptop or consumed as a cloud service like Red Hat OpenShift Data Science. We’ll explore this more later.
The reason Gen AI is becoming such a hot topic is that these tools are now at a point where they are powerful enough to really use the data. We have had copious amounts of data for decades and have wrestled with making sense of it from the beginning of computer science. The data is the foundation for anything in AI/ML, including Gen AI.
Red Hat Knows Data Integration: Red Hat’s rich history, featuring tools like Fuse, Decision Manager, PAM, AMQ, Kafka, and Camel K, positions the company at the forefront of data integration, setting the stage for generative AI’s integration.
Because of the fluid nature of data, innovation, and creativity, having a platform that allows for quick prototypes as well as easy scalability is necessary. Having a lab to experiment with was critical to exploring, innovating, and building Gen AI solutions.
Integrating with and Leveraging Red Hat OpenShift: Red Hat OpenShift Data Science provides a robust data science workbench, leveraging tools like the NVIDIA operators, the Jupyter notebook operator, and the IBM Watson Studio operator. Integrated with libraries such as Boto3, it ensures seamless connections to cloud services like Amazon S3.
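To give a taste of that, here is a minimal boto3 sketch of the kind of S3 access you might run from a RHODS notebook; the bucket and key names are hypothetical.

```python
# Minimal sketch: pull a corpus from S3 into a RHODS notebook with boto3.
# Bucket and key names are hypothetical.
import boto3

s3 = boto3.client("s3")  # credentials come from the environment or a data connection

# Fetch one archive and list what else lives under the corpus prefix.
s3.download_file("my-genai-bucket", "corpora/openshift-docs.tar.gz",
                 "/tmp/openshift-docs.tar.gz")

response = s3.list_objects_v2(Bucket="my-genai-bucket", Prefix="corpora/")
for obj in response.get("Contents", []):
    print(obj["Key"], obj["Size"])
```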
Data comes in a virtual cornucopia of useful information, if you can wrangle and make sense of it. Some of this data could be straight Text Data, Image Data, Audio Data, Video Data, Tabular Data, Sensor/IoT Data, Genomic Data, Financial Data, Social Media Data, Healthcare Data, Environmental Data, even Gaming Data. All of these could be used in a Gen AI Framework.
Data Domains (3): Data in motion, generative data integration, and data pipelining — each addressing distinct aspects of data processing — are facilitated using tools like Apache Kafka and Camel K, which are offered as Red Hat OpenShift runtimes.
The Gen AI framework could be deployed almost anywhere: a data center, or the near edge, like the edge of the Telco network or even a cell tower. You can run a commercial large-scale LLM like OpenAI’s, or a local open-source LLM from Hugging Face plus a local vector DB for deployments in areas without stable network connections (mobile).
Gen AI at the Edge: The vast range of data sources, from remote edge environments like RHEL for Edge running AWS IoT Greengrass to large-scale cloud environments like ROSA and ARO, showcases the adaptability of Red Hat’s extensive product portfolio.
Interesting Discovery Along the Way — GenAI Ops
During our work on this Gen AI framework, GitOps in general started to seem different. It seems like you can use GitOps as a way to deploy Gen AI, or you can use Gen AI as a way to introspect your Ops — in other words, GenAI Ops.
The OpenAI API now supports LLM-based function calls. These are special calls in that they are “cold”: the model doesn’t actually perform the processing. Instead, it returns a JSON object that is passed to something that can run the function and later feed the result back to the chatbot or any Gen AI application you have. In the context of our lab, that JSON object would describe an operation against our cluster tooling.
(Demo Idea: Perhaps next, I could build a Streamlit application that interacts with ArgoCD and Red Hat Advanced Cluster Management in the background, so the user can essentially “talk to their OpenShift cluster” via function calls?) In essence, using Gen AI to explore cluster health seems like an exciting new domain to explore.
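Here is a minimal sketch of what such a “cold” function call looks like with the 2023-era OpenAI Chat Completions API. The get_cluster_health function and its schema are hypothetical stand-ins for whatever your Ops tooling would expose.

```python
# Minimal sketch of an LLM "cold" function call (2023-era OpenAI Python library).
# The function name and schema are hypothetical; the model only returns a JSON
# description of the call. Your own code is what actually executes it.
import json
import openai

functions = [{
    "name": "get_cluster_health",  # hypothetical Ops function
    "description": "Return health status for an OpenShift cluster",
    "parameters": {
        "type": "object",
        "properties": {
            "cluster": {"type": "string", "description": "Cluster name"},
        },
        "required": ["cluster"],
    },
}]

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=[{"role": "user", "content": "How healthy is the plano-lab cluster?"}],
    functions=functions,
    function_call="auto",
)

message = response["choices"][0]["message"]
if message.get("function_call"):
    # Nothing has executed yet; the model just proposed a call as JSON.
    args = json.loads(message["function_call"]["arguments"])
    print("Model wants:", message["function_call"]["name"], args)
```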
Gen AI Application Development Environment
Leveraging ArgoCD Deployments for OpenShift Gen AI GitOps: Red Hat Advanced Cluster Management (ACM) champions the GitOps deployment methodology for generative AI apps. Deploying and testing with a GitHub runner was an enriching experience, and it highlighted the need for CPU-specific container builds for seamless integration with Red Hat OpenShift.
So what does it mean to work with Gen AI applications in a development environment? As for our local dev workbench, we had a working environment with combinations of Fedora desktops & laptops, MacBook Pros with M1s, VMware VMs on ESX, Windows AD, and cloud services running with local and remote instances of PyCharm, Jupyter Notebooks, ChromaDB, Weaviate, and Pinecone. These were connected with several virtual networks both local and remote, using WireGuard, CloudFlare, and iptables. With hardware, we were able to beg, borrow, and pilfer.
Local Development Environment: Podman offers a solution to set up generative AI tools and testbeds locally, allowing developers a sandboxed environment for rigorous testing.
Langchain is to Gen AI as Camel is to data integration
An important development environment usually includes LangChain. LangChain is really the glue that pulls all of these disparate elements together; it truly is the Camel of Gen AI. As it has expanded, our understanding and use of Gen AI have expanded with it.
We built and experimented a great deal, but the best tool that came along was LangFlow. It’s an open-source project, but we expect them to commercialize the product and build a startup out of it if they have not already taken formal steps to do that yet.
Podman is an excellent tool for building container images and running them locally. You can run Podman on most flavors of Linux as well as Windows and macOS.
Before we delve further into the details, at a high level, Red Hat OpenShift is a platform that hosts Gen AI applications. It features a Core Gen AI Framework (LLM, Vector DB), and tools like LangFlow and workbenches to help you interact with the core framework. Lastly, it’s important to have a perspective on data integration in general.
Now that we’ve touched on the topics at a higher level, let’s explore some of these areas in more detail.
Digging a Little Deeper
Overall, OpenShift provides a powerful and flexible platform that enables seamless development, deployment, and scaling of generative AI applications while addressing critical concerns like containerization, scalability, security, and efficient management.
Generative AI Tools
Flowise and LangFlow are two similar yet distinct tools that have captured the attention of many users recently. Both tools offer intuitive interfaces and robust functionality for language processing tasks.
LangFlow
LangFlow is a web-based GUI workbench for Gen AI. It is built on the React Flow framework and offers a GUI for designing and implementing language chains.
Flowise
Flowise provides a comprehensive platform for creating flows, chatting, and generating APIs that can be integrated into various applications.
Raw Python Application Development
Many Gen AI applications start out as raw Python in a JupyterLab notebook or a Python script. Much of the initial development of this effort was done in just that fashion, using a series of Jupyter notebooks as Python scripts.
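A typical notebook starting point looks something like this retrieval sketch, which assumes a persisted Chroma store like the one in the ingestion example earlier; the paths and model choice are illustrative.

```python
# Notebook-style sketch: question answering over a persisted Chroma store
# (2023-era LangChain imports; paths and model choice are illustrative).
from langchain.chains import RetrievalQA
from langchain.chat_models import ChatOpenAI
from langchain.embeddings import OpenAIEmbeddings
from langchain.vectorstores import Chroma

# Reopen the store written during ingestion.
db = Chroma(persist_directory="./chroma-openshift-docs",
            embedding_function=OpenAIEmbeddings())

# Wire the retriever to an LLM so questions run against YOUR documents.
qa = RetrievalQA.from_chain_type(
    llm=ChatOpenAI(model_name="gpt-3.5-turbo-16k", temperature=0),
    retriever=db.as_retriever(),
)
print(qa.run("How do I add a worker node to a cluster?"))
```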
Leveraging Red Hat OpenShift Data Science
A Workbench
Red Hat OpenShift Data Science (RHODS)
Red Hat OpenShift Data Science (RHODS), hosted on OpenShift, streamlines end-to-end data science and machine learning workflows by integrating tools like JupyterHub, Apache Spark, TensorFlow, and PyTorch. With Airflow for orchestration, it facilitates both small-scale analysis and expansive machine learning projects. This unified platform promotes collaboration, experimentation, and rapid model deployment. Rooted in the Open Data Hub project, its modular architecture enables adjustments to meet organizational needs.
IBM Watson Studio — an example partner integration with Red Hat OpenShift Data Science and the public cloud. Features include publishing results and interacting with other cloud services.
Red Hat’s Long History with Data Integration
Red Hat has developed many data integration products over the years, each catering to specific integration needs. Below are some of the most useful data integration tools Red Hat has offered, past and present.
Development & Integration Components
Camel K
Apache Camel and its Knative-based flavor, Camel K, enable seamless connectivity between various data sources, services, and systems. This is essential in Generative AI, where data comes from diverse sources and integration is crucial for preprocessing and post-processing data.
Node.js
Node.js offers several features that make it suitable for Gen AI (Generative AI) development: it is scalable, and it has a very large community behind it.
Python
The combination of Python’s simplicity, powerful libraries, strong community support, and its ability to handle a wide range of AI tasks has positioned it as the go-to language for AI development, contributing to its exceptional popularity in the AI tools community.
Knative
Knative offers several powerful features that make it highly beneficial for developing and deploying AI applications. Knative’s autoscaling, event-driven model, serverless capabilities, deployment pipelines, and other features make it a powerful choice for deploying scalable, responsive, and efficient Generative AI applications, freeing developers to focus on AI innovation rather than managing infrastructure complexities.
Podman
Podman can be a valuable tool for Gen AI application development, providing isolation, portability, dependency management, integration with other Gen AI tools, version control, scalability, and security benefits. By using containers, you can streamline the development and deployment of Gen AI applications, ensuring a consistent and efficient workflow while leveraging the power of Python and other Gen AI libraries.
Data Flow Support
AMQ (ActiveMQ)
Red Hat’s AMQ, based on Apache ActiveMQ, offers several features and functions that make it highly useful for data integration and Gen AI (Generative AI) applications.
AMQ Streams (Kafka)
Red Hat’s AMQ Streams, based on Apache Kafka, offers several features and functions that make it highly useful for data integration and Gen AI (Generative AI) applications.
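To make that concrete, here is a minimal kafka-python sketch of streaming document chunks toward an ingestion consumer; the topic name and bootstrap address are assumptions, not our lab’s configuration.

```python
# Minimal sketch: stream document chunks through Kafka (AMQ Streams) toward
# a Gen AI ingestion consumer. Topic and bootstrap address are assumptions.
import json
from kafka import KafkaProducer

producer = KafkaProducer(
    bootstrap_servers="my-cluster-kafka-bootstrap:9092",  # hypothetical AMQ Streams service
    value_serializer=lambda v: json.dumps(v).encode("utf-8"),
)

chunk = {"source": "docs/networking.txt",
         "text": "OVN-Kubernetes is the default CNI plugin..."}
producer.send("genai-ingest", value=chunk)
producer.flush()
```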
Red Hat Service Interconnect (Skupper)
Red Hat Service Interconnect (Skupper) is a service integration tool that enables secure communication between Kubernetes clusters. While its primary focus is on service communication across different Kubernetes clusters, it can still be useful for data integration and Gen AI (Generative AI) applications.
Generative AI Core Framework
Large Language Models (LLMs)
The purpose of a large language model in Generative AI is to understand and generate human-like text based on the patterns it has learned from massive amounts of training data. A large language model, like GPT-3 (which stands for “Generative Pre-trained Transformer 3”), is designed to process and generate text in a way that simulates human language understanding and expression. It serves as a powerful tool for various natural language processing tasks, creative content generation, and even assisting users in generating coherent and contextually relevant text.
Key purposes of a large language model in Generative AI include:
- Text generation
- Language translation
- Text summarization
- Question answering
- Conversational agents
- Content enhancement
- Coding assistance
- Language understanding
- Research and analysis
- Innovation and creativity
In essence, a large language model in Generative AI serves as a versatile tool for generating human-like text, facilitating communication between humans and machines, and assisting with a wide range of tasks that involve understanding, processing, and producing natural language.
Vector Databases
The purpose of Vector Databases is to efficiently store, index, and retrieve high-dimensional vectors that represent data points, such as embeddings of text, images, audio, or other complex features. These databases are crucial for various Gen AI tasks, including similarity search, content recommendation, natural language processing, and data exploration. By providing fast and accurate vector-based querying, vector databases enhance the capabilities of Gen AI applications.
Benefits of vector databases in Generative AI:
- Feature Representation
- Semantic Relationships
- Similarity Measurement
- Data Transformation
- Machine Learning Models
- Dimensionality Reduction
- Generation and Interpolation
- Clustering and Classification
The most common vector databases include:
- Weaviate
- ChromaDB
- Pinecone
- Others include FAISS, Annoy, Milvus, Elasticsearch, Hnswlib, and Postgres with a vector extension (e.g., pgvector).
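For instance, here is a minimal ChromaDB sketch of the core add-and-query loop; the collection name and documents are toy examples.

```python
# Minimal sketch of the core vector-database loop with ChromaDB:
# add documents, then retrieve the most similar one for a query.
# Collection name and documents are toy examples.
import chromadb

client = chromadb.Client()  # in-memory client; use a persistent client in practice
collection = client.create_collection("openshift-notes")

collection.add(
    ids=["doc1", "doc2"],
    documents=[
        "SNO is a single-node deployment of OpenShift.",
        "ArgoCD pulls Helm charts and applies them to the cluster.",
    ],
)

results = collection.query(query_texts=["What is Single Node OpenShift?"], n_results=1)
print(results["documents"][0][0])
```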
LangChain
LangChain is a framework designed to simplify the creation of applications using large language models (LLMs). As a language model integration framework, LangChain’s use-cases largely overlap with those of language models in general, including document analysis and summarization, chatbots, and code analysis.
LangChain is to GenAI as Camel is to data integration.
LangChain provides standard, extendable interfaces and external integrations for the following modules, listed from least to most complex:
- Model I/O — Interface with language models
- Data connection — Interface with application-specific data
- Chains — Construct sequences of calls
- Agents — Let chains choose which tools to use given high-level directives
- Memory — Persist application state between runs of a chain
- Callbacks — Log and stream intermediate steps of any chain
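As a minimal illustration of the Model I/O and Chains modules (2023-era LangChain imports; the prompt is a toy example):

```python
# Minimal sketch of LangChain's Model I/O and Chains modules.
# The prompt is a toy example (2023-era imports).
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

prompt = PromptTemplate(
    input_variables=["component"],
    template="Explain what {component} does in an OpenShift cluster, in one sentence.",
)

chain = LLMChain(llm=OpenAI(temperature=0), prompt=prompt)
print(chain.run(component="the OpenShift router"))
```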
Chroma and Langchain visible as Langflow nodes on Red Hat OpenShift
Data Domains (3)
Data in motion, generative data integration, and data pipelining — each addressing distinct aspects of data processing — are facilitated using tools like Apache Kafka and Camel K, which are offered as Red Hat OpenShift runtimes.
Some of the more common data sources/formats are Text Data, Image Data, Audio Data, Video Data, Tabular Data, Sensor Data, Genomic Data, Financial Data, Social Media Data, Healthcare Data, Environmental Data and Gaming Data.
But grouped more broadly, there are really three domains, or use cases, for the data.
Data in Motion
“Data in Motion” refers to the concept of handling and processing data as it actively moves from one location to another, typically in real-time or near real-time scenarios. In the context of Generative AI (Gen AI), “Data in Motion” involves managing and processing data while it’s being transferred or streamed.
This is crucial for real-time data-driven applications and dynamic AI systems. “Data in Motion” is a fundamental concept for Gen AI, enabling real-time processing, dynamic model adaptation, interactive responses, and the ability to generate content or insights based on continuously streaming data.
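As a sketch of the consuming side of such a stream (kafka-python again; the topic and bootstrap address are the same assumptions as in the producer example earlier):

```python
# Minimal sketch: consume a stream of document chunks in near real time.
# Topic and bootstrap address are assumptions, matching the producer sketch.
import json
from kafka import KafkaConsumer

consumer = KafkaConsumer(
    "genai-ingest",
    bootstrap_servers="my-cluster-kafka-bootstrap:9092",
    value_deserializer=lambda v: json.loads(v.decode("utf-8")),
)

for message in consumer:
    chunk = message.value
    # Hand each chunk to the embedding/ingestion step as it arrives.
    print("received:", chunk["source"])
```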
Generative Data Integration
“Generative Data Integration” is a concept that combines data integration techniques with generative AI approaches to improve the efficiency, quality, and adaptability of data integration processes. It involves leveraging generative models, often used in Generative AI, to enhance various aspects of data integration tasks. In traditional data integration, the goal is to combine data from different sources, clean and transform it, and make it usable for analytics, reporting, or other purposes.
Generative Data Integration takes this a step further by incorporating generative AI techniques to address some of the challenges and complexities in the data integration process. Generative Data Integration combines the strengths of traditional data integration techniques with the capabilities of generative AI models, leading to more flexible, adaptive, and efficient integration processes, especially when dealing with complex, diverse, or evolving data sources.
Data Pipeline
Data pipelining refers to the systematic and automated process of building and managing data workflows that enable the generation and utilization of data for AI tasks. A data pipeline in Gen AI involves a series of interconnected steps that transform, preprocess, and feed data into AI models, allowing for efficient and streamlined data-driven model training and inference.
Data pipelining is crucial for managing the flow of data from various sources to the AI models, ensuring that the models receive high-quality and relevant input for generating meaningful outputs. By establishing robust data pipelines, Gen AI practitioners can manage the complexities of data management, preprocessing, integration, and model training, leading to more effective and reliable Generative AI systems.
Red Hat Advanced Cluster Management (ACM)
Red Hat ACM champions the GitOps deployment methodology for generative AI apps. Deploying and testing with a GitHub runner was an enriching experience, and it highlighted the need for CPU-specific builds for seamless integration with Red Hat OpenShift.
Leveraging ArgoCD Deployments for Gen AI on OpenShift
ArgoCD is a declarative, GitOps continuous delivery tool that fits well with Red Hat OpenShift. This tool makes it easy to deploy our Gen AI application. The ArgoCD process uses the output from the GitHub runner process below. See the demo video above to get a sense of the flow and the role ArgoCD plays in the deployment.
GitHub Runner for Container Builds
In order to ensure that the container is built with the right optimizations for the CPU, we used a self-hosted GitHub runner. This runner ran on a locally hosted system as a systemd service, ensuring the runner started with the machine in case of a reboot. The job within GitHub was based on a template available from GitHub. That template clones the repository, runs docker build, and then pushes the image into the GitHub Packages repository using a credential unique to the run. This allowed us to store the resulting container image in a repository that both our runner and Red Hat OpenShift could reach.
The job was also configured to trigger only when the Dockerfile was modified, so we could keep our Helm chart in the same repository. This consolidated the code into a single repository, so we knew everything was working together at the time of each commit.
Resources and References
Jason Webster has great code on ingesting data into ChromaDB. Note that it is mainly focused on local-filesystem versions of Chroma.
Gary Stafford of AWS has a great blog post and Git repo. Although his project (a demonstration of Natural Language Query (NLQ) of an Amazon RDS for PostgreSQL database, using SageMaker JumpStart Foundation Models, LangChain, Streamlit, and Chroma) has a very specific focus, he did a great job explaining the integration between Streamlit and LangChain.
The Flowise documentation site luckily covers the environment variables needed for handling OpenAI API keys.
ChromaDB and LangFlow have delicate dependencies, but here is a list of their versions.
Pinecone makes it very easy to get started with GenAI application building. Here is an easy place to start with Pinecone.
The OpenAI API has very clear documentation. For budgetary purposes, I found the GPT-3.5 Turbo 16K-context model very affordable.