Debugging a RAG Pipeline

SuperKnowa offers a comprehensive debug pipeline for analyzing the performance of a Retrieval-Augmented Generation (RAG) pipeline and logging its behavior.

Himadri Talukder · Towards Generative AI · Aug 11, 2023

A debug pipeline provides a structured approach to identifying, isolating, and fixing issues within your code. It helps developers troubleshoot problems efficiently and ensures a systematic workflow, ultimately leading to a more robust and reliable software product. For a Retrieval-Augmented Generation (RAG) pipeline in particular, understanding and monitoring the performance of each stage is crucial.

In this blog, we will discuss how to implement a debug process for a RAG pipeline.

Adding Checkpoints for RAG Debugging

For our purposes, we extract information from the application manually, using a custom format that gives us full control over the data structure. This lets us dive deep into the data during subsequent analysis.

We use a straightforward approach: create an object that acts as a bucket at the beginning of each request. This object is then passed through the entire process, gradually accumulating the necessary information. Finally, the enriched object is persisted to a data store (MongoDB, in our case) for later analysis.

The diagram below outlines the overall architecture and process flow; the debug data is collected at the various stages of the API call.

Architecture diagram

Debug Logs Dataset

Throughout the entire process, we are collecting the following essential information:

  • Failed and successful request information for further analysis and debugging.
  • Model response and response time.
  • Prompt, context and model configuration.
  • User information and time of the request.
  • Retriever information, the time taken for the request, and the documents it returns.
  • Reranker processing time.

Here is example code; the comments in the code explain each step.

from datetime import datetime

from bson import json_util
from flask import request

# app, model_configurations, Config, ModelRequestService, and
# request_history_collection are initialized elsewhere in the application.

@app.route('/api/v1/chat/multi/<retriever>/<no_model>', methods=['POST'])
def chat_multi_model(retriever, no_model):
    data = request.get_json()
    question = data['question']

    # Declare an object that collects information along the process
    info_collect = {}
    model_request = ModelRequestService(model_configurations)

    # Pass the object down so each stage can add more information to it
    results = model_request.process_request_multi_model(
        question=question,
        retriever=retriever,
        info_collect=info_collect,
        number_of_model=no_model,
    )

    # Add request-level information
    info_collect['request_time'] = datetime.now()
    info_collect["type"] = "multi_model"
    info_collect["retiever"] = Config.RETRIEVER

    # Finally, store the enriched object in MongoDB for later analysis
    insert_result = request_history_collection.insert_one(info_collect)

    response = {'results': results, 'ref': str(insert_result.inserted_id)}
    return json_util.dumps(response)
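For context, a client call to this endpoint might look like the following (the host, port, and path parameter values here are assumptions for illustration):

import requests

# Hypothetical host and port; "elastic" and "2" fill the <retriever> and
# <no_model> path parameters of the route above.
resp = requests.post(
    "http://localhost:5000/api/v1/chat/multi/elastic/2",
    json={"question": "What is watson knowledge catalog?"},
)
print(resp.json()["ref"])  # MongoDB id of the stored debug record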

Subsequently, the "info_collect" object is passed to the method to gather additional information. We gather data on both successful and unsuccessful calls to the watsonx.ai platform. We have also implemented a fallback mechanism, similar to a circuit breaker pattern: the models form a request chain where, if one model fails, the request is directed to the next available model. If all requests fail, a predetermined default response is returned to prevent user frustration.
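To make that chain concrete, here is a rough sketch of the idea (a hedged illustration, not the exact SuperKnowa implementation; call_model and DEFAULT_RESPONSE are hypothetical names):

DEFAULT_RESPONSE = "Sorry, I could not generate an answer right now."

def generate_with_fallback(models, prompt, info_collect):
    """Try each model in order; return a canned answer if all of them fail."""
    fail_info_list = []
    for model in models:
        response = call_model(model, prompt)  # hypothetical HTTP call to watsonx.ai
        if response.status_code == 200:
            info_collect['failed_model'] = fail_info_list
            return response
        # Record every failure so the debug log can explain what went wrong
        fail_info_list.append({
            'model_name': model['model_id'],
            'error_msg': f"status {response.status_code}: {response.reason}",
        })
    info_collect['failed_model'] = fail_info_list
    return DEFAULT_RESPONSE

The excerpt below, from the request service, shows how the success and failure information is recorded in practice: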

info_collect['info'] = info
if response.status_code == 200:
    # The request succeeded: record the prompt, the answer, and the timing
    json_response = json.loads(response.content.decode("utf-8"))

    answer = json_response['results'][0]['generated_text']
    after_format = config.format_model_output(answer)
    result['prompt'] = prompt
    result["raw_answer"] = answer
    result["answer"] = after_format
    result["source"] = source_url
    result["question"] = question
    result["model_id"] = model['model_id']
    result['model_load_time'] = duration
    results.append(result)
else:
    # For a failed request, add a failure object as well to track down issues
    fail_info = {}
    fail_info['prompt'] = prompt
    fail_info['model_name'] = model['model_id']
    fail_info['model_load_time'] = duration
    fail_info['error_msg'] = (
        f"Request failed with status code: {response.status_code}, "
        f"Reason: {str(response.reason)}"
    )
    fail_info_list.append(fail_info)

info['failed_model'] = fail_info_list
info_collect["results"] = results

# Retriever information is added here
elastic = ProcessElastic()
tic = time.perf_counter()
results_list = elastic.elastic_retervier(question)
toc = time.perf_counter()
info['elastic_query_time'] = toc - tic
info['elastic_result'] = results_list

# Reranker information
tic = time.perf_counter()
context, source_url = config.reranking(
    question=question,
    results_list=results_list,
    max_reranked_documents=10,
)
toc = time.perf_counter()
info['reranker_load_time'] = toc - tic
info['source_url'] = source_url
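The tic/toc pattern above repeats for every stage we instrument. One way to centralize it, shown here as a sketch rather than part of the SuperKnowa code (timed_stage is a hypothetical helper), is a small context manager that writes the elapsed time straight into the debug object:

import time
from contextlib import contextmanager

@contextmanager
def timed_stage(info, key):
    """Record the wall-clock duration of the enclosed block under info[key]."""
    tic = time.perf_counter()
    try:
        yield
    finally:
        info[key] = time.perf_counter() - tic

# The retriever timing above could then be written as:
# with timed_stage(info, 'elastic_query_time'):
#     results_list = elastic.elastic_retervier(question)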

Here is a complete debug log collected for a single request; we can use records like this to dig deeper and analyze the pipeline further.

{
"info": {
"elastic_query_time": 0.44522851100191474,
"elastic_result": [
{
"document": {
"rank": 0,
"document_id": "",
"text": "IBM Knowledge Accelerators, V1.0.0 documentationLast Updated: 2021-12-21\nWelcome to the Knowledge Accelerators documentation, where you can find information about\nhow to install,\nmaintain, and use the Knowledge Accelerators.\nGetting started\nProduct overview\nVideo: IBM Knowledge Accelerators - Introduction\nVideo: IBM Knowledge Accelerators - Structural Walkthrough\nDownloading and System Requirements\nGetting started\nCommon tasks\nCustomizing the IBM Knowledge Accelerators\nData Discovery with terms\nVideo: Customizing, searching navigating, data classes reference\ndata\nIBM Support Portal for IBM Knowledge Accelerators\nAccessing Industry Alignment Vocabularies for Healthcare\nAccessing IFRS Industry Alignment Vocabularies for Insurance\nAccessing IFRS Industry Alignment Vocabularies for Financial Services\nKnown issues \nMore learning resources\nIBM Knowledge Accelerators pages on ibm.com\nIBM Knowledge Accelerator for Healthcare Overview\nIBM Knowledge Accelerator for Insurance Overview\nIBM Knowledge Accelerator for Energy and Utilities Overview\nIBM Knowledge Accelerator for Financial Services Overview\nWatson Knowledge Catalog Knowledge Center\nNotices for Knowledge AcceleratorsLast Updated: 2021-12-21\nKnowledge Accelerators\nLicensed Materials - Property of IBM\nIBM Corporation\nNew Orchard Road, Armonk, NY 10504 \nProduced in the United States of America\nUS Government Users Restricted Rights - Use, duplication or disclosure restricted by GSA ADP\nSchedule Contract with IBM Corp.\nIBM, the IBM logo, ibm.com, IBM Watson, IBM Cloud Pak, and IBM Cloud are trademarks of\nInternational Business Machines Corp., registered in many jurisdictions worldwide. Other product and\nservice names might be trademarks of IBM or other companies. A current list of IBM trademarks is\nRed Hat, JBoss, OpenShift, Fedora, Hibernate, Ansible, CloudForms, RHCA, RHCE, RHCSA,\nCeph, and Gluster are trademarks or registered trademarks of Red Hat, Inc. or its subsidiaries in\nthe United States and other countries. \nThis document is current as of the initial date of publication and may be changed by IBM at any\ntime. Not all offerings are available in every country in which IBM operates.\nIt is the users responsibility to evaluate and verify the operation of any other products or\nprograms with IBM products and programs. THE INFORMATION IN THIS DOCUMENT IS PROVIDED \"AS IS\"\nWITHOUT ANY WARRANTY, EXPRESS OR IMPLIED, INCLUDING WITHOUT ANY WARRANTIES OF MERCHANTABILITY,\nFITNESS FOR A PARTICULAR PURPOSE AND ANY WARRANTY OR CONDITION OF NON-INFRINGEMENT. IBM products are\nwarranted according to the terms and conditions of the agreements under which they are provided.\nThe client is responsible for ensuring compliance with laws and regulations applicable to it. IBM\ndoes not provide legal advice or represent or warrant that its services or products will ensure that\nthe client is in compliance with any law or regulation.\nStatement of Good Security Practices: IT system security involves protecting systems and\ninformation through prevention, detection and response to improper access from within and outside\nyour enterprise. Improper access can result in information being altered, destroyed, misappropriated\nor misused or can result in damage to or misuse of your systems, including for use in attacks on\nothers. No IT system or product should be considered completely secure and no single product,\nservice or security measure can be completely effective in preventing improper use or access. 
IBM\nsystems, products and services are designed to be part of a lawful, comprehensive security approach,\nwhich will necessarily involve additional operational procedures, and may require other systems,\nproducts or services to be most effective. IBM DOES NOT WARRANT THAT ANY SYSTEMS, PRODUCTS OR\nSERVICES ARE IMMUNE FROM, OR WILL MAKE YOUR ENTERPRISE IMMUNE FROM, THE MALICIOUS OR ILLEGAL CONDUCT\nOF ANY PARTY.\nGeneral Data Protection Regulation European Union, 2017.\nInformation Commissioner's Office, \"Data protection ",
"url": "",
"source": "IBM Product Doc whole"
}
},
{
"document": {
"rank": 1,
"document_id": " An introduction to the DataOps discipline\n",
"text": "DataOps is a collaborative data management discipline that focuses on end-to-end data management and the elimination of data silos. There are many DataOps definitions provided by the various thought leaders in this space, such as IBM, Gartner, Eckerson Group, Forbes, and DataKitchen, all of which essentially define it as \"the orchestration of people, processes, and technology to accelerate the quick delivery of high-quality data to data users.\" Built on software development frameworks such as Agile, DevOps, and Statistical Process Control, DataOps offers the following benefits:Decreases the cycle time in deploying analytical solutions\nLowers data defects\nReduces the time required to resolve data defects\nMinimizes data silosDataOps dimensions\nThere are three dimensions across which DataOps is executed: people, processes, and technology. It requires the organization of a team to promote collaboration and drive culture change, to identify and develop processes that transform existing data pipelines into DataOps pipelines, and absorb its value and identifying advantageous technical product features in a DataOps technology.The DataOps team\nDataOps supports a highly productive and tightly collaborative team that uses automation technology to help deliver efficiency gains. It comprises DataOps managers, such as data engineers, information architects, and DataOps engineers who are responsible for leading the delivery, management, and support of high-quality, mission-ready data at scale.Data engineers are responsible for data curation, data cleansing, and data availability.\nInformation architects are responsible for conceptualizing the data framework. They work with the relevant stakeholders within the enterprise to understand the business challenges and translate them into requirements and solutions.\nDataOps engineers are responsible for the frequent and timely releases of data pipelines and data products into production. Their responsibilities include end-to-end management and automation of the provision of environments, the data on the data platforms, deployment, testing, release, security, and monitoring processes.In addition to the DataOps managers, the DataOps team also consists of DataOps consumers, such as data scientists and data analysts, who ultimately turn the data into business value.Data scientists conduct research and iteratively perform experiments to create algorithms and machine learning models that address questions or solve problems.\nData analysts perform analytics to summarize and synthesize massive amounts of data in the data warehouses created by data engineers. They can also create visual representations of data to better communicate information that can be used to gain insights into the data.The DataOps process\nThe DataOps Flipbook by the IBM Worldwide Community of Information Architects provides the following conceptual view of DataOps. DataOps enables you to engineer data by creating continuous end-to-end data flows using automated processes, quality control, and self-service tools. The right capabilities in technologies and tools can help you discover and deliver data in a matter of days or hours as opposed to weeks or months.The DataOps process comprises the following steps:Step 0. Pre-Collect Planning for your project by gathering customer requirements, defining the business objectives, use cases, and KPIs.\nStep 1. Collect Identifying your existing information architecture and defining what your information architecture should be .\nStep 2. 
Organize Organizing your data by performing data quality analysis, ensuring data lineage is maintained since the beginning and using data cleaning to fix problems with the data.\nStep 3. Analyze Enriching your data using feature engineering, cataloging the useful features as well as models created using the data, and ensuring the models and data are versioned in order to easily track experiments and to make comparisons easier.\nStep 4. Infuse Infusing the data model",
"url": "https://developer.ibm.com/blogs//articles/an-introduction-to-the-dataops-discipline\n",
"source": "IBM Developer"
}
},
{
"document": {
"rank": 2,
"document_id": "/Watson Studio overview",
"text": "Watson StudioLast Updated: 2021-03-04\nWatson Studio provides you with the environment and tools to solve your business problems by collaboratively working with data. You can choose the tools you need to analyze and visualize data, to cleanse and shape data, or to create and train machine learning models.\nThis illustration shows how the architecture of Watson Studio is centered around the project. A project is where you organize your resources and work with data.\nThese are the kind of resources you can have in a project:\nCollaborators are the team who works with the data. Three roles provide different permissions within the project.\nData assets point to your data. Heres what you can do to prepare your data:\nAccess data from connections to your cloud or on-premises data sources\nAccess assets from your organizations catalog\nUpload files to the projects storage\nCleanse and shape data with the Data Refinery tool\nAnalytical assets and tools are how you derive insights from data. Some tools require additional services. Heres what you can do to analyze your data:\nAnalyze data in Jupyter notebooks with the notebook editor, JupyterLab, or RStudio.\nBuild, train or solve, and test machine learning and deep learning models.\nBuild SPSS modeler flows.\nBuild and solve Decision Optimization models.\nPromote models to deployment spaces to configure, monitor, and deploy them.\nRun deep learning model experiments in parallel with neural networks.\nAutomatically analyze your data and generate candidate model pipelines customized for your predictive modeling problem.\nCreate and share dashboards of data visualizations without coding.\nThis diagram illustrates how assets move between your catalog, projects, and deployment spaces. You can populate the catalog with assets directly, or you can publish assets from projects. You find assets in the catalog and then add them to any project. When you finish developing models in a project, you promote them to the deployment space thats associated with the project. You configure and deploy models in the deployment space.\nCatalogs\nA catalog is a repository of data and analytical assets for your organization. The catalog is provided by Watson Knowledge Catalog, which is included with Watson Studio Local. Use the catalog to share assets between projects. You can move assets into the catalog when you finish with your project, or start working on your project by moving in assets from the catalog. The catalog has a limit of 50 assets and 50 collaborators. To understand other limitations of the catalog, see the comparison of Watson Knowledge Catalog features by edition.\nWatson Studio and Watson Knowledge Catalog are fully integrated:\nYou can easily move assets between projects and catalogs.\nCatalogs and projects support the same types of data assets.\nYou can easily find the assets you need in a catalog. Heres what you can do:\nSearch with keywords and filters that are based on subject tags and other asset properties.\nLook the previews of asset contents to make sure you pick the correct assets.\nRead reviews about assets that are provided by catalog collaborators.\nChoose from recommended assets that are automatically compiled based on your usage history, similar assets, and other factors.\nChoose from the most highly rated assets.\nSee Catalogs.\nDeployment spaces\nA deployment space is where you configure and deploy your models. 
After you finish developing your models in your project, you create a deployment space for that project and move your models into it. Deployment spaces are provided by Watson Machine Learning.\nWatson Studio and Watson Machine Learning are fully integrated. You can easily move assets between projects and deployment spaces.\nSee Deployment spaces.\nLearn more\nFeature differences between editions\nKnown issues\nInstalling and setting up\n // show table captions\n var tabs = document.getElementsByTagName[0].getElementsByTagName;\n for (var i=0; i",
"url": "URL:https://www.ibm.com/docs/en/watson-studio-local/2.1.0?topic=welcome-watson-studio-overview",
"source": "IBM Product Doc Inner"
}
},
{
"document": {
"rank": 3,
"document_id": "Db2 Data Gate blog series part 2 : IBM Db2 for zOS Data Gate —IBM Watson Knowledge Catalog integration",
"text": "Db2 Data Gate blog series part 2: IBM Db2 for zOS Data Gate IBM Watson Knowledge Catalog integration - Sowmya Kameswaran \n Follow \n Cloud Pak for Data \n -- \n Listen \n Share \n By Vassil Dimov, Mateo Toi, Sowmya Kameswaran and Eirini Kalogeiton \n Introduction \n In the previous blog post we have introduced Db2 for z/OS Data Gate and described its main purpose to synchronize Db2 for z/OS data in the cloud. In this blog we will discuss how besides of the synchronization of the actual data Db2 Data Gate takes care of the corresponding metadata through the integration with Watson Knowledge Catalog. Additionally, we will highlight the business value of the integration. \n AI Ladder \n In the current business world, modernization of data and use of AI is the key to success. The guiding principles of the AI ladder defined by IBM, help organizations with business transformation based on the four key areas mentioned below: \n In this blog we will discuss how Db2 Data Gate and Watson Knowledge Catalog, representing the first two pillars of the AI Ladder, can help organizations to unlock the huge value of their Z data in the cloud. \n About Db2 Data Gate \n Db2 Data Gate enables modern high-volume, high-frequency hybrid cloud applications that need read-only access to valuable enterprise data from Db2 for z/OS. It plays a key role in the Collect pillar by enabling movement of data from Db2 for z/OS into the Cloud Pak for Data platform. With data synchronization between source Db2 for z/OS and target IBM Db2 and IBM Db2 Warehouse, applications are able to get access to current data. To learn more about IBM Db2 for z/OS Data gate, please read What is Db2 Data Gate? Db2 Data Gate Blog Series Part 1 \n About Watson Knowledge Catalog \n Watson Knowledge Catalog is an enterprise data catalog management platform that forms the core of the Organize pillar of the Cloud Pak for Data platform. A catalog connects people to the data and knowledge that they need. It is the key enabler to building the enterprise data catalog on Cloud Pak for Data that enables platform users to find, prepare, understand, and use the data as needed. The data governance framework ensures that data access and data quality are compliant with your business rules and standards. \n WKC unites all information assets into a single metadata-rich catalog, based on Watsons understanding of relationships between assets and how theyre being used and socialized among users in existing projects. It is integrated with an enterprise data governance platform that merges the analytics capabilities of Watson Studio. The data catalog assists data scientists in easily finding, preparing, understanding and using the data as needed. \n Data protection has gained importance in recent years. That is why it is so important that WKC protects data from misuse and enables sharing of assets with automated, dynamic masking of sensitive data elements. This avoids violating various data protection regulations. For instance, when handling healthcare data in the USA, companies need to be aware of HIPAA , a set of rules on how personally identifiable information maintained by the healthcare and healthcare insurance industries should be protected from fraud and theft. Moreover, any company based in the EU or offering services to people in the EU must comply with GDPR , which has a much broader scope and governs the use of all personal data. 
\n Db2 Data Gate and Data Fabric \n Db2 for z/OS data is core to many operational applications but also foundational to business insight. Db2 for z/OS data is some of the most indispensable within an organization for a data fabric implementation. The end-to-end Db2 Data Gate solution makes data available and synchronized for end user access within a data fabric. As compared to alternative approaches, Db2 Data Gate can be simpler, less expensive, and timelier. With Db2 Data Gate organizations can go from transaction to business action in moments. \n Db2 Data Gate 2.1 Watson Knowledge C",
"url": "https://medium.com/icp-for-data/db2-data-gate-blog-series-part-2-ibm-db2-for-zos-data-gate-ibm-watson-knowledge-catalog-55b208a5c0f1?source=collection_archive---------3-----------------------",
"source": "Medium"
}
},
{
"document": {
"rank": 4,
"document_id": " Implement data governance to manage and secure clients' data\n",
"text": "NOTE: This content applies to IBM Cloud Pak for Data 4.0 and no subsequent releases.\nData breaches are not the way you want your company to make headlines, and even one data breach can mean an average cost of 3.9 million USD. With data becoming more of a competitive advantage, and with the amount of data that is produced worldwide projected to double between 2010 and 2023, the process of organizing, managing, and securing data is more important than ever. It's no surprise that the tools to take that data, organize it, govern it, and ensure quality and searchability are what you need.\nAs companies begin the process of using their data with AI, many already have the first step of collecting data complete. They store data in a database, and they use that data to inform customers of their previous transactions, conversations, and other information. The second step of organizing the data focuses on creating a foundation based on analytics. More specifically, it's about enabling your data scientists and business analysts to do their job efficiently.\nThis step is where a secure metadata management platform like the IBM Watson Knowledge Catalog on the IBM Cloud Pak for Data platform comes in. At its core, IBM Watson Knowledge Catalog connects data and knowledge with the people who need to use it. Some typical use cases for Watson Knowledge Catalog include ensuring regulatory compliance, data quality management, and data delivery. In this tutorial, I focus on privacy and protection, with the goal of minimizing the chance for a data breach.\nThis hands-on tutorial focuses on showing you how to solve the problems of enterprise data governance on the IBM Cloud Pak for Data platform from the data steward or data administrator persona. I explain how to use governance, data quality, and active policy management to help your organization protect and govern sensitive data, trace data lineage, and manage data lakes. This knowledge can help you quickly discover, curate, categorize, and share data assets, data sets, and analytical models with other members of your organization. It serves as a single source of truth for data engineers, data stewards, data scientists, and business analysts to gain self-service access to data they can trust.You need the admin role to create a catalog, and you begin the tutorial by creating a catalog and loading data.\nSet up catalog and data\nNote: The default catalog is your enterprise catalog. It is created automatically after you install the Watson Knowledge Catalog service and is the only catalog to which advanced data curation tools apply. The default catalog is governed so that data protection rules are enforced. The information assets view shows additional properties of the assets in the default catalog to aid curation. Any subsequent catalogs that you create can be governed or ungoverned, do not have an information assets view, and supply basic data curation tools.\nCreate the catalog\nProvision Watson Knowledge Catalog\nIf you haven't started Watson Knowledge Catalog yet, you'll need to provision it.Open Watson Knowledge Catalog by clicking Services at the upper right of the home page.Click Watson Knowledge Catalog in the Data governance section.Open Watson Knowledge CatalogClick Open to launch Watson Knowledge Catalog.Choose Organize, then All catalogs from the menu on the left.Click either Create catalog or New Catalog from the Your catalogs page.Name your catalog and give it an optional description. 
Check Enforce data protection rules and click Create.Click OK on the pop-up menu that opens, then click Create.Option 1: Add data assetsFrom the Browse Assets tab, then click here to add your data.You could also click Add to Catalog + in the upper-right, and choose Local files.Clone the following repository: https://github.com/horeaporutiu/wkc-tutorial-intelligent-loan, browse to where you cloned the repository and go to /data/split/applicant_personal_data.csv. Click Open, add an optional description, and click Add",
"url": "https://developer.ibm.com/blogs//tutorials/implement-data-governance-to-manage-and-secure-clients-data\n",
"source": "IBM Developer"
}
},
{
"document": {
"rank": 5,
"document_id": "/Upgrading from IBM Cloud Pak for Data Version 4.0",
"text": "Upgrading from IBM Cloud Pak for Data Version 4.0\nto Version 4.6Last Updated: 2023-06-12\nA Red Hat\nOpenShift Container Platform cluster\nadministrator and project administrator can work together to prepare the cluster and upgrade\nIBM Cloud Pak for Data from Version 4.0 to Version 4.6.\nYour Cloud Pak for Data deployment will be unavailable\nduring the upgrade.\nRestrictions\nYou can upgrade from Cloud Pak for Data Version 4.0 to a subset of the Cloud Pak for Data Version 4.6 releases:\nRefresh\nSupported?\n4.6.0\nYes\n4.6.1\nYes\n4.6.2\nYes\n4.6.3\nNot supported. 4.6.3 requires Red Hat\nOpenShift Container Platform Version 4.10, which is not supported\nwith Cloud Pak for Data Version 4.0.\nYou must\nupgrade to 4.6.2 before you upgrade to 4.6.3.\n4.6.4\nNot supported. 4.6.4 requires Red Hat\nOpenShift Container Platform Version 4.10 or 4.12, which are not\nsupported with Cloud Pak for Data Version 4.0.\nYou\nmust upgrade to 4.6.2 before you upgrade to 4.6.4.\n4.6.5\nNot supported. 4.6.5 requires Red Hat\nOpenShift Container Platform Version 4.10 or 4.12, which are not\nsupported with Cloud Pak for Data Version 4.0.\nYou\nmust upgrade to 4.6.2 before you upgrade to 4.6.5.\n4.6.6\nNot supported. 4.6.6 requires Red Hat\nOpenShift Container Platform Version 4.10 or 4.12, which are not\nsupported with Cloud Pak for Data Version 4.0.\nYou\nmust upgrade to 4.6.2 before you upgrade to 4.6.6.\nBefore you begin\nBefore you upgrade Cloud Pak for Data:\nEnsure that you are running Red Hat\nOpenShift Container Platform Version 4.8, which is supported by Cloud Pak for Data\nVersion 4.0 and Cloud Pak for Data Version 4.6.0 - 4.6.2.\nReview the information in the Planning section.Specifically,\nensure that you review the System requirements. Your cluster must have sufficient resources.\nIf your cluster pulls images from the IBM Entitled\nRegistry, ensure that your cluster uses catalog sources that pull specific versions of images from the\nIBM Entitled\nRegistry.\nImportant: If your cluster\nuses the IBM Operator Catalog, you must migrate from the IBM Operator Catalog. You cannot upgrade to\nCloud Pak for Data Version 4.6 if\nyou want to continue using the IBM Operator Catalog.\nDetermine which install plan the IBM\nCloud Pak foundational services operators and Cloud Pak for Data operators are using:oc get installplanCopied!\nIf the install plan approval is Automatic, you can proceed to the next\nstep.\nIf the install plan approval is Manual, review the following\noptions:\nOption\nDetails\nChange the install plan to Automatic \nIt is strongly recommended that you change the install plan for the IBM\nCloud Pak foundational services operators and Cloud Pak for Data operators to Automatic. This\nenables the cpd-cli\nmanage commands to seamlessly update the operators.To update the\ninstall plan for the operators:\nFor the IBM\nCloud Pak foundational services\noperators, see the Changing approval strategy from Manual to\nAutomatic in the IBM\nCloud Pak foundational services documentation.\nFor the Cloud Pak for Data operators, update the\ninstall plans for each operator through the Red Hat\nOpenShift Container Platform console. Open each subscription, view\nthe subscription details, and edit the Update approval setting. For a list of\nthe Cloud Pak for Data operators, see Creating operator subscriptions in the IBM\nCloud Pak for Data Version 4.0 documentation.\nImportant: Ensure that the install plan of all the operators in the $ project and $ project are set to Automatic. 
If any of the install plans are set to\nManual, Operator Lifecycle Manager will automatically update the\ninstall plans to Manual when you run the cpd-cli\nmanage\napply-olm command.\nLeave the install plan as Manual\nYou can optionally leave the install plan for the IBM\nCloud Pak foundational services operators and Cloud Pak for Data operators Manual. \nImportant: If you choose this option, you must watch the install plans and manually approve\nthem during the upgrade to ensure that the cpd-cli\nmanage\napply-olm commands complete successfully.Additionally, you\nm",
"url": "URL:https://www.ibm.com/docs/en/cloud-paks/cp-data/4.6.x?topic=upgrading-from-cloud-pak-data-version-40",
"source": "IBM Product Doc Inner"
}
},
{
"document": {
"rank": 6,
"document_id": "A guide to IBM’s complete set of data & AI tools and services",
"text": "A guide to IBMs complete set of data & AI tools andservices So you can be a fearless wielder ofAI Jennifer Aue \n Follow \n IBM Design \n -- \n Listen \n Share \n Before you scroll down through this page and think There is no way Im getting into all this, no maam, just hold up a sec and let me tell you one, er four, very quick things first: \n As someone who was trained in Swiss grids, rubylith, and letterpress and has had to make those archaic skills work through 20+ years of design jobs evolving from print, branding, websites, apps, platform strategies, software and most recently AI I 100% promise if you, if you follow these tried-and-true rules, youll be fine no matter how much of the tech stuff you do or dont absorb. Ready? Burn this into your brain: \n No matter what new technology comes along \n And the one tip Ill add to this list specifically for AI \n All this said, the more of the tech stuff you DO understand, the more confidently youll be able to push your concepts and your team to build better, cooler AI features. So here it is my design lovelies \n the shortest, sweetest summary of all of IBMs data & AI tools and services and what you can use them for that youll find on the interwebs as of 3:27pm, Monday, October 12th, 2020! \n What were about to cover:The AI LadderCloud Pak for DataWatson Studio Watson Natural Language Classifier in Watson Studio Data Refinery in Watson Studio Cognos Dashboard Embedded Service in Watson StudioWatson AssistantWatson DiscoveryWatson Natural Language UnderstandingWatson OpenScaleWatson Knowledge CatalogWatson AIOps \n Recommended links Note: The vast majority of this content is consolidated and simplified from other sources that Ive pointed to in the Recommended Links sections. Cant take any credit for real authorship here, just a clean-up for your speed learning enjoyment :) \n Lets start from the top with a quick pass on IBMs AI strategy the AI Ladder. Its how IBMs data and AI tools and services are organized by the steps it takes to build and manage AI, so its easier to understand what services to use when. \n from The AI Ladder, by Rob Thomas \n The AI Ladder is an information architecture designed for AI that allows businesses to automate and govern the data and AI lifecycle with a unified approach, so that they can ultimately operationalize AI with trust and transparency. \n The AI Ladder has been developed by IBM to provide organizations with an understanding of where they are in their AI journey as well as a framework for helping them determine where they need to focus. It is a guiding principle for organizations to transform their business by providing four key areas to consider: how they collect data, organize data, analyze data, and then ultimately infuse AI into their organization. \n Breaking an AI strategy down into pieces or rungs of a ladder serves as a guiding principle for organizations, regardless of where they are on their journey. It allows them to simplify and automate how they turn data into insights by unifying the collection, organization and analysis of data, regardless of where it lives. By using the ladder to AI as a guiding framework, enterprises can build the foundation for a governed, efficient, agile, and future-proof approach to AI. 
\n Recommended links Awesome visualization of the AI Ladder story IBMs Journey to AI Blog The AI Ladder, by Rob Thomas IBM Analytics: Data & AIs products and client stories told in AI Ladder terms IBM Knowledge Center How Cloud Platforms Works How Watson Works \n Where the AI Ladder lives Cloud Pak for Data holds all of the services that let companies collect, prep, build, connect, deploy, analyze and monitor their data and AI implementations. \n An insight platform that combines data management with data science / AI development. Gives organizations the capabilities to take advantage of a broad set of data and AI services and integrate them into applications to accelerate time to value, time to insight, and time to market. The sys",
"url": "https://medium.com/design-ibm/a-guide-to-ibms-complete-set-of-data-ai-tools-and-services-29662433ad07?source=collection_archive---------1-----------------------",
"source": "Medium"
}
},
{
"document": {
"rank": 7,
"document_id": "A guide to IBM’s complete set of data & AI tools and services",
"text": "A guide to IBMs complete set of data & AI tools andservices So you can be a fearless wielder ofAI Jennifer Aue \n Follow \n IBM Design \n -- \n Listen \n Share \n Before you scroll down through this page and think There is no way Im getting into all this, no maam, just hold up a sec and let me tell you one, er four, very quick things first: \n As someone who was trained in Swiss grids, rubylith, and letterpress and has had to make those archaic skills work through 20+ years of design jobs evolving from print, branding, websites, apps, platform strategies, software and most recently AI I 100% promise if you, if you follow these tried-and-true rules, youll be fine no matter how much of the tech stuff you do or dont absorb. Ready? Burn this into your brain: \n No matter what new technology comes along \n And the one tip Ill add to this list specifically for AI \n All this said, the more of the tech stuff you DO understand, the more confidently youll be able to push your concepts and your team to build better, cooler AI features. So here it is my design lovelies \n the shortest, sweetest summary of all of IBMs data & AI tools and services and what you can use them for that youll find on the interwebs as of 3:27pm, Monday, October 12th, 2020! \n What were about to cover:The AI LadderCloud Pak for DataWatson Studio Watson Natural Language Classifier in Watson Studio Data Refinery in Watson Studio Cognos Dashboard Embedded Service in Watson StudioWatson AssistantWatson DiscoveryWatson Natural Language UnderstandingWatson OpenScaleWatson Knowledge CatalogWatson AIOps \n Recommended links Note: The vast majority of this content is consolidated and simplified from other sources that Ive pointed to in the Recommended Links sections. Cant take any credit for real authorship here, just a clean-up for your speed learning enjoyment :) \n Lets start from the top with a quick pass on IBMs AI strategy the AI Ladder. Its how IBMs data and AI tools and services are organized by the steps it takes to build and manage AI, so its easier to understand what services to use when. \n from The AI Ladder, by Rob Thomas \n The AI Ladder is an information architecture designed for AI that allows businesses to automate and govern the data and AI lifecycle with a unified approach, so that they can ultimately operationalize AI with trust and transparency. \n The AI Ladder has been developed by IBM to provide organizations with an understanding of where they are in their AI journey as well as a framework for helping them determine where they need to focus. It is a guiding principle for organizations to transform their business by providing four key areas to consider: how they collect data, organize data, analyze data, and then ultimately infuse AI into their organization. \n Breaking an AI strategy down into pieces or rungs of a ladder serves as a guiding principle for organizations, regardless of where they are on their journey. It allows them to simplify and automate how they turn data into insights by unifying the collection, organization and analysis of data, regardless of where it lives. By using the ladder to AI as a guiding framework, enterprises can build the foundation for a governed, efficient, agile, and future-proof approach to AI. 
\n Recommended links Awesome visualization of the AI Ladder story IBMs Journey to AI Blog The AI Ladder, by Rob Thomas IBM Analytics: Data & AIs products and client stories told in AI Ladder terms IBM Knowledge Center How Cloud Platforms Works How Watson Works \n Where the AI Ladder lives Cloud Pak for Data holds all of the services that let companies collect, prep, build, connect, deploy, analyze and monitor their data and AI implementations. \n An insight platform that combines data management with data science / AI development. Gives organizations the capabilities to take advantage of a broad set of data and AI services and integrate them into applications to accelerate time to value, time to insight, and time to market. The sys",
"url": "https://medium.com/design-ibm/a-guide-to-ibms-complete-set-of-data-ai-tools-and-services-29662433ad07?source=collection_archive---------1-----------------------",
"source": "Medium"
}
},
{
"document": {
"rank": 8,
"document_id": "A guide to IBM’s complete set of data & AI tools and services",
"text": "A guide to IBMs complete set of data & AI tools andservices So you can be a fearless wielder ofAI Jennifer Aue \n Follow \n IBM Design \n -- \n Listen \n Share \n Before you scroll down through this page and think There is no way Im getting into all this, no maam, just hold up a sec and let me tell you one, er four, very quick things first: \n As someone who was trained in Swiss grids, rubylith, and letterpress and has had to make those archaic skills work through 20+ years of design jobs evolving from print, branding, websites, apps, platform strategies, software and most recently AI I 100% promise if you, if you follow these tried-and-true rules, youll be fine no matter how much of the tech stuff you do or dont absorb. Ready? Burn this into your brain: \n No matter what new technology comes along \n And the one tip Ill add to this list specifically for AI \n All this said, the more of the tech stuff you DO understand, the more confidently youll be able to push your concepts and your team to build better, cooler AI features. So here it is my design lovelies \n the shortest, sweetest summary of all of IBMs data & AI tools and services and what you can use them for that youll find on the interwebs as of 3:27pm, Monday, October 12th, 2020! \n What were about to cover:The AI LadderCloud Pak for DataWatson Studio Watson Natural Language Classifier in Watson Studio Data Refinery in Watson Studio Cognos Dashboard Embedded Service in Watson StudioWatson AssistantWatson DiscoveryWatson Natural Language UnderstandingWatson OpenScaleWatson Knowledge CatalogWatson AIOps \n Recommended links Note: The vast majority of this content is consolidated and simplified from other sources that Ive pointed to in the Recommended Links sections. Cant take any credit for real authorship here, just a clean-up for your speed learning enjoyment :) \n Lets start from the top with a quick pass on IBMs AI strategy the AI Ladder. Its how IBMs data and AI tools and services are organized by the steps it takes to build and manage AI, so its easier to understand what services to use when. \n from The AI Ladder, by Rob Thomas \n The AI Ladder is an information architecture designed for AI that allows businesses to automate and govern the data and AI lifecycle with a unified approach, so that they can ultimately operationalize AI with trust and transparency. \n The AI Ladder has been developed by IBM to provide organizations with an understanding of where they are in their AI journey as well as a framework for helping them determine where they need to focus. It is a guiding principle for organizations to transform their business by providing four key areas to consider: how they collect data, organize data, analyze data, and then ultimately infuse AI into their organization. \n Breaking an AI strategy down into pieces or rungs of a ladder serves as a guiding principle for organizations, regardless of where they are on their journey. It allows them to simplify and automate how they turn data into insights by unifying the collection, organization and analysis of data, regardless of where it lives. By using the ladder to AI as a guiding framework, enterprises can build the foundation for a governed, efficient, agile, and future-proof approach to AI. 
\n Recommended links Awesome visualization of the AI Ladder story IBMs Journey to AI Blog The AI Ladder, by Rob Thomas IBM Analytics: Data & AIs products and client stories told in AI Ladder terms IBM Knowledge Center How Cloud Platforms Works How Watson Works \n Where the AI Ladder lives Cloud Pak for Data holds all of the services that let companies collect, prep, build, connect, deploy, analyze and monitor their data and AI implementations. \n An insight platform that combines data management with data science / AI development. Gives organizations the capabilities to take advantage of a broad set of data and AI services and integrate them into applications to accelerate time to value, time to insight, and time to market. The sys",
"url": "https://medium.com/design-ibm/a-guide-to-ibms-complete-set-of-data-ai-tools-and-services-29662433ad07?source=collection_archive---------1-----------------------",
"source": "Medium"
}
},
{
"document": {
"rank": 9,
"document_id": "A guide to IBM’s complete set of data & AI tools and services",
"text": "A guide to IBMs complete set of data & AI tools andservices So you can be a fearless wielder ofAI Jennifer Aue \n Follow \n IBM Design \n -- \n Listen \n Share \n Before you scroll down through this page and think There is no way Im getting into all this, no maam, just hold up a sec and let me tell you one, er four, very quick things first: \n As someone who was trained in Swiss grids, rubylith, and letterpress and has had to make those archaic skills work through 20+ years of design jobs evolving from print, branding, websites, apps, platform strategies, software and most recently AI I 100% promise if you, if you follow these tried-and-true rules, youll be fine no matter how much of the tech stuff you do or dont absorb. Ready? Burn this into your brain: \n No matter what new technology comes along \n And the one tip Ill add to this list specifically for AI \n All this said, the more of the tech stuff you DO understand, the more confidently youll be able to push your concepts and your team to build better, cooler AI features. So here it is my design lovelies \n the shortest, sweetest summary of all of IBMs data & AI tools and services and what you can use them for that youll find on the interwebs as of 3:27pm, Monday, October 12th, 2020! \n What were about to cover:The AI LadderCloud Pak for DataWatson Studio Watson Natural Language Classifier in Watson Studio Data Refinery in Watson Studio Cognos Dashboard Embedded Service in Watson StudioWatson AssistantWatson DiscoveryWatson Natural Language UnderstandingWatson OpenScaleWatson Knowledge CatalogWatson AIOps \n Recommended links Note: The vast majority of this content is consolidated and simplified from other sources that Ive pointed to in the Recommended Links sections. Cant take any credit for real authorship here, just a clean-up for your speed learning enjoyment :) \n Lets start from the top with a quick pass on IBMs AI strategy the AI Ladder. Its how IBMs data and AI tools and services are organized by the steps it takes to build and manage AI, so its easier to understand what services to use when. \n from The AI Ladder, by Rob Thomas \n The AI Ladder is an information architecture designed for AI that allows businesses to automate and govern the data and AI lifecycle with a unified approach, so that they can ultimately operationalize AI with trust and transparency. \n The AI Ladder has been developed by IBM to provide organizations with an understanding of where they are in their AI journey as well as a framework for helping them determine where they need to focus. It is a guiding principle for organizations to transform their business by providing four key areas to consider: how they collect data, organize data, analyze data, and then ultimately infuse AI into their organization. \n Breaking an AI strategy down into pieces or rungs of a ladder serves as a guiding principle for organizations, regardless of where they are on their journey. It allows them to simplify and automate how they turn data into insights by unifying the collection, organization and analysis of data, regardless of where it lives. By using the ladder to AI as a guiding framework, enterprises can build the foundation for a governed, efficient, agile, and future-proof approach to AI. 
\n Recommended links Awesome visualization of the AI Ladder story IBMs Journey to AI Blog The AI Ladder, by Rob Thomas IBM Analytics: Data & AIs products and client stories told in AI Ladder terms IBM Knowledge Center How Cloud Platforms Works How Watson Works \n Where the AI Ladder lives Cloud Pak for Data holds all of the services that let companies collect, prep, build, connect, deploy, analyze and monitor their data and AI implementations. \n An insight platform that combines data management with data science / AI development. Gives organizations the capabilities to take advantage of a broad set of data and AI services and integrate them into applications to accelerate time to value, time to insight, and time to market. The sys",
"url": "https://medium.com/design-ibm/a-guide-to-ibms-complete-set-of-data-ai-tools-and-services-29662433ad07?source=collection_archive---------1-----------------------",
"source": "Medium"
}
}
],
"source_url": "https://medium.com/icp-for-data/db2-data-gate-blog-series-part-2-ibm-db2-for-zos-data-gate-ibm-watson-knowledge-catalog-55b208a5c0f1?source=collection_archive---------3-----------------------",
"reranker_load_time": 10.665661656996235,
"success": {
"model_name": "google/ul2",
"model_load_time": 14.22489712596871,
"prompt": "Answer the question based on the context below. Context: Db2 Data Gate blog series part 2: IBM Db2 for zOS Data Gate IBM Watson Knowledge Catalog integration - Sowmya Kameswaran \n Follow \n Cloud Pak for Data \n -- \n Listen \n Share \n By Vassil Dimov, Mateo Toi, Sowmya Kameswaran and Eirini Kalogeiton \n Introduction \n In the previous blog post we have introduced Db2 for z/OS Data Gate and described its main purpose to synchronize Db2 for z/OS data in the cloud. In this blog we will discuss how besides of the synchronization of the actual data Db2 Data Gate takes care of the corresponding metadata through the integration with Watson Knowledge Catalog. Additionally, we will highlight the business value of the integration. \n AI Ladder \n In the current business world, modernization of data and use of AI is the key to success. The guiding principles of the AI ladder defined by IBM, help organizations with business transformation based on the four key areas mentioned below: \n In this blog we will discuss how Db2 Data Gate and Watson Knowledge Catalog, representing the first two pillars of the AI Ladder, can help organizations to unlock the huge value of their Z data in the cloud. \n About Db2 Data Gate \n Db2 Data Gate enables modern high-volume, high-frequency hybrid cloud applications that need read-only access to valuable enterprise data from Db2 for z/OS. It plays a key role in the Collect pillar by enabling movement of data from Db2 for z/OS into the Cloud Pak for Data platform. With data synchronization between source Db2 for z/OS and target IBM Db2 and IBM Db2 Warehouse, applications are able to get access to current data. To learn more about IBM Db2 for z/OS Data gate, please read What is Db2 Data Gate? Db2 Data Gate Blog Series Part 1 \n About Watson Knowledge Catalog \n Watson Knowledge Catalog is an enterprise data catalog management platform that forms the core of the Organize pillar of the Cloud Pak for Data platform. A catalog connects people to the data and knowledge that they need. It is the key enabler to building the enterprise data catalog on Cloud Pak for Data that enables platform users to find, prepare, understand, and use the data as needed. The data governance framework ensures that data access and data quality are compliant with your business rules and standards. \n WKC unites all information assets into a single metadata-rich catalog, based on Watsons understanding of relationships between assets and how theyre being used and socialized among users in existing projects. It is integrated with an enterprise data governance platform that merges the analytics capabilities of Watson Studio. The data catalog assists data scientists in easily finding, preparing, understanding and using the data as needed. \n Data protection has gained importance in recent years. That is why it is so important that WKC protects data from misuse and enables sharing of assets with automated, dynamic masking of sensitive data elements. This avoids violating various data protection regulations. For instance, when handling healthcare data in the USA, companies need to be aware of HIPAA , a set of rules on how personally identifiable information maintained by the healthcare and healthcare insurance industries should be protected from fraud and theft. Moreover, any company based in the EU or offering services to people in the EU must comply with GDPR , which has a much broader scope and governs the use of all personal data. 
\n Db2 Data Gate and Data Fabric \n Db2 for z/OS data is core to many operational applications but also foundational to business insight. Db2 for z/OS data is some of the most indispensable within an organization for a data fabric implementation. The end-to-end Db2 Data Gate solution makes data available and synchronized for end user access within a data fabric. As compared to alternative approaches, Db2 Data Gate can be simpler, less expensive, and timelier. With Db2 Data Gate organizations can go from transaction to business action in moments. \n Db2 Data Gate 2.1 Watson Knowledge C Question: What is watson knowledge catalog?"
}
},
"raw_answer": "Answer: Watson Knowledge Catalog is an enterprise data catalog management platform that forms the core of the Organize pillar of the Cloud Pak for Data platform. It is the key enabler to building the enterprise data catalog on Cloud Pak for Data that enables platform users to find, prepare, understand, and use the data as needed. The data governance framework ensures that data access and data quality are compliant with your business rules and standards. Db2 Data Gate Blog Series Part 1 Db2 Data Gate and Data Governance The integration of Db2 Data Gate and Watson Knowledge Catalog enables organizations to easily find, prepare, understand, and use the data as needed. The data governance framework ensures that data access and data quality are compliant with your business rules and standards. The data catalog assists data scientists in easily finding, preparing, understanding and using the data as needed. Db2 Data Gate Blog Series Part 1 Db2 Data Gate and Data Fabric D",
"question": "What is watson knowledge catalog?",
"request_time": {
"$date": "2023-08-03T15:03:16.960Z"
},
"answer": "Watson Knowledge Catalog is an enterprise data catalog management platform that forms the core of the Organize pillar of the Cloud Pak for Data platform. It is the key enabler to building the enterprise data catalog on Cloud Pak for Data that enables platform users to find, prepare, understand, and use the data as needed. The data governance framework ensures that data access and data quality are compliant with your business rules and standards. Db2 Data Gate Blog Series Part 1 Db2 Data Gate and Data Governance The integration of Db2 Data Gate and Watson Knowledge Catalog enables organizations to easily find, prepare, understand, and use the data as needed. The data catalog assists data scientists in easily finding, preparing, understanding and using the data as needed.",
"type": "single_model",
"retiever": "elastic",
"requestBy": "htalukder@ibm.com",
}
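
Because every record lands in MongoDB with a consistent shape, follow-up analysis is one query away. As an example, a sketch like the following could compute per-model averages for the timings we collected (the field paths come from the log above; the connection string and database/collection names are assumptions):

from pymongo import MongoClient

client = MongoClient("mongodb://localhost:27017")  # assumed connection string
collection = client["superknowa"]["request_history"]  # assumed db/collection names

pipeline = [
    {"$group": {
        "_id": "$info.success.model_name",
        "avg_model_load_time": {"$avg": "$info.success.model_load_time"},
        "avg_elastic_query_time": {"$avg": "$info.elastic_query_time"},
        "avg_reranker_load_time": {"$avg": "$info.reranker_load_time"},
        "requests": {"$sum": 1},
    }},
    {"$sort": {"avg_model_load_time": -1}},
]

for row in collection.aggregate(pipeline):
    print(row)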

A well-structured debugging pipeline is essential for maintaining a high level of quality and ensuring that your software functions as expected. In this blog, we explored the implementation of a debugging pipeline without depending on external third-party libraries or software. Keep in mind that debugging is both an art and a science, and ongoing learning and adaptation play a pivotal role in getting the full utility out of your debugging pipeline.

The full implementation details can be found in this GitHub Repo.

Follow Towards Generative AI for more content on the latest advancements in AI.
