Unite Data Governance and AI Governance with IBM Knowledge Catalog

Kashif Hafeez
Feb 29, 2024


Authors: Kashif Hafeez, Julie Forgo

Following the implementation of data regulations such as GDPR and CCPA, organizations got serious about Data Governance, learning to safeguard private and sensitive data. IBM was there to support the implementation of data governance strategies with IBM Knowledge Catalog, providing tools to classify data and establish policies to meet compliance goals.

Now, with the rapid growth of AI in business operations and processes, organizations are wrestling with another governance challenge, and once again, IBM is here to help. AI Governance is the practice of ensuring the safe and responsible development, implementation, and monitoring of AI systems. Although the challenge can seem daunting, IBM can help you embrace the transformative power of AI while monitoring and mitigating risks such as bias, unclear data provenance, and other potential pitfalls.

This blog post describes how the data governance capabilities of IBM Knowledge Catalog can work in concert with other IBM solutions to help you manage both data and AI governance in an integrated approach.

What is AI Governance?

AI governance is the ability to direct, manage, and monitor the AI activities of an organization. This practice includes policies and processes that trace and document the origin of data, AI models, and associated metadata and pipelines for audits. Model documentation is needed for audits to ensure model explainability. Documentation should include the training techniques, hyperparameters, and testing metrics used to create, test, and deploy the models. The purpose of documentation is to increase transparency into the model processes and behaviors throughout the AI lifecycle, maintain a record for approvals and compliance, and track the data used for development and validation.
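To make the documentation requirement concrete, here is a minimal sketch of a documentation record and a completeness check that could run before an audit. The `ModelDocumentation` class, its field names, and `missing_for_audit` are hypothetical names chosen for illustration; they are not part of IBM Knowledge Catalog or any IBM API.

```python
from dataclasses import dataclass, field


# Hypothetical record type for illustration only; not an IBM API.
@dataclass
class ModelDocumentation:
    model_name: str
    training_technique: str = ""
    hyperparameters: dict = field(default_factory=dict)
    testing_metrics: dict = field(default_factory=dict)
    training_data_sources: list = field(default_factory=list)


# Fields an auditor would expect to be filled in (an assumption for this sketch).
REQUIRED_FIELDS = (
    "training_technique",
    "hyperparameters",
    "testing_metrics",
    "training_data_sources",
)


def missing_for_audit(doc: ModelDocumentation) -> list:
    """Return the names of required documentation fields that are still empty."""
    return [name for name in REQUIRED_FIELDS if not getattr(doc, name)]


doc = ModelDocumentation(
    model_name="example-model",
    training_technique="gradient boosted trees",
    hyperparameters={"max_depth": 6, "learning_rate": 0.1},
)
print(missing_for_audit(doc))  # ['testing_metrics', 'training_data_sources']
```

A check like this only confirms that the documentation exists; whether the recorded metrics and data sources are acceptable is still a decision for the reviewer.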

So, what can IBM Knowledge Catalog provide that helps your organization govern data and AI?

Industry-specific taxonomy with Data Privacy and AI Governance policies

IBM Knowledge Accelerators comprise industry-specific vocabularies that contain extensive sets of pre-defined industry content for Financial Services, Healthcare, Insurance, and Energy.

IBM Knowledge Accelerators contain business terms with classifications of Personal Information (PI) and Sensitive Personal Information (SPI). The data privacy taxonomy of terms is discussed in detail in the blog “Combining Data Governance and Data Privacy with the IBM Knowledge Catalog Data Privacy Accelerator”.

AI Governance policies, introduced as part of IBM Knowledge Catalog 4.8, provide a selection of pre-defined templates specifically designed to help organizations that deploy IBM Knowledge Catalog extend governance to include AI policies. The sample policies are based on feedback from client deployments and IBM’s internal AI Ethics Initiative. Your organization can use these policies to implement governance practices for designing and implementing AI systems.

Tracking AI assets with IBM Knowledge Catalog and AI Factsheets

IBM AI Factsheets extend the capabilities of IBM Knowledge Catalog to track the lifecycle of an AI model from request, through development, and on to production. AI Factsheets increase the transparency of model development and deployment, and provide the data for approvals, proof of compliance, and archiving.

Before you develop an AI model or solution, start by defining a business use case in IBM Knowledge Catalog or IBM OpenPages. Model use cases are stored in the Model Inventory in IBM Knowledge Catalog.

IBM Cloud Pak for Data

Model approvers or compliance officers can review model use cases in the inventory. In this example, the model inventory contains three use cases for different applications in a financial organization.

Model Inventory view in IBM Cloud Pak for Data

Let’s take a closer look at the Mortgage Approval use case. To begin, a business user identifies a need for an AI model and creates a model use case to request a new model. The user assigns a name and adds other related information such as a description, the model purpose, supporting documentation, and the model risk. After the use case is defined and approved, collaborators can start associating the assets required for building a solution.

A data engineer can enhance the use case by adding classifications (Personal Information, Sensitive Personal Information), business terms from an IBM Knowledge Accelerators vocabulary, and the data assets required for the model. Other governance artifacts and assets, such as AI governance policies and governance rules, can be added to the use case as Related items.

Now, a data scientist develops the requested model, tracking it in the use case. AI Factsheets record all metadata and activity for the model as it moves through the AI lifecycle, from development to operation.

AI Factsheets capture a defined set of details about the model. These details include model information, training information, training parameters, training metrics, training tags, and input schema. Using the provided notebook, an administrator can define custom facts to meet the needs of the project.
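As a rough illustration of how the standard fact groups and administrator-defined custom facts fit together, the sketch below models a factsheet as a plain dictionary. It does not use the notebook or any IBM client library; the group keys, the custom fact names (`business_owner`, `review_cycle_days`), and the helper functions are all assumptions made for this example.

```python
# Standard fact groups named in the text above, written as dictionary keys.
STANDARD_FACT_GROUPS = (
    "model_information",
    "training_information",
    "training_parameters",
    "training_metrics",
    "training_tags",
    "input_schema",
)

# Hypothetical custom fact definitions an administrator might add:
# fact name -> expected Python type.
CUSTOM_FACT_DEFINITIONS = {
    "business_owner": str,
    "review_cycle_days": int,
}


def new_factsheet() -> dict:
    """Create an empty factsheet with every standard group plus custom facts."""
    sheet = {group: {} for group in STANDARD_FACT_GROUPS}
    sheet["custom_facts"] = {}
    return sheet


def set_custom_fact(sheet: dict, name: str, value) -> None:
    """Store a custom fact, rejecting names or types that were never defined."""
    expected = CUSTOM_FACT_DEFINITIONS.get(name)
    if expected is None:
        raise KeyError(f"undefined custom fact: {name}")
    if not isinstance(value, expected):
        raise TypeError(f"{name} must be of type {expected.__name__}")
    sheet["custom_facts"][name] = value


sheet = new_factsheet()
set_custom_fact(sheet, "business_owner", "Mortgage Lending")
set_custom_fact(sheet, "review_cycle_days", 90)
print(sheet["custom_facts"])
```

Defining the custom facts up front, rather than accepting arbitrary keys, keeps the recorded facts comparable across models in the inventory.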

In addition to providing the tools to collect the data for the use case, IBM Knowledge Catalog can help you extract the data privacy information from the associated schema and add it as “Additional details” to the existing model information by using this notebook.

Custom Facts added in Additional Details

Using the notebook, data privacy classification (PI and SPI) information is extracted from the IBM Knowledge Accelerators business terms that are assigned to the data columns used in the model. This shows another connection between Data Governance and AI Governance. Model validators, auditors, and data protection officers evaluate the model and associated governance policies from the data collected in the model use case.
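The extraction described above can be pictured as a lookup from model columns to business terms to privacy classifications. The sketch below is not the notebook's code; the column names, business terms, and their classifications are hypothetical sample data, and `privacy_facts` is a helper invented for this illustration.

```python
# Hypothetical business terms and their privacy classification (None = not classified).
TERM_CLASSIFICATION = {
    "Email Address": "PI",
    "Health Condition": "SPI",
    "Loan Amount": None,
}

# Hypothetical assignment of business terms to data columns.
COLUMN_TERMS = {
    "email": ["Email Address"],
    "health_condition": ["Health Condition"],
    "loan_amount": ["Loan Amount"],
}


def privacy_facts(model_columns, column_terms, term_classification) -> dict:
    """Group the columns used by a model by the PI/SPI class of their terms."""
    facts = {"PI": [], "SPI": []}
    for column in model_columns:
        for term in column_terms.get(column, []):
            label = term_classification.get(term)
            # Skip unclassified terms and avoid listing a column twice.
            if label in facts and column not in facts[label]:
                facts[label].append(column)
    return facts


used_by_model = ["email", "health_condition", "loan_amount", "term_months"]
print(privacy_facts(used_by_model, COLUMN_TERMS, TERM_CLASSIFICATION))
# {'PI': ['email'], 'SPI': ['health_condition']}
```

Columns with no assigned business term, such as `term_months` here, simply produce no privacy fact, which is one reason term assignment coverage matters for this kind of report.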

Visualize the model use case information

To get another perspective on how data privacy is enforced in model use cases, download the AI Governance and Data Privacy dashboard provided with IBM Knowledge Catalog. The dashboard gives you a visual representation of where and how data tagged as personal information is used in the model use cases. The dashboard also provides information about the governance policies and rules associated with the data being used by model use cases.

The pre-defined dashboard has three tabs:

AI and Data Privacy: provides an overall view of the model use cases, PI and SPI classification, risk levels, and the data used.

PI and SPI Data: provides a drill-down showing the classifications at column level. The table also shows the IBM Knowledge Accelerators business term assignments to data columns. You can use filters to view the data associated with each model use case.

AI Model and Governance Policies & Rules: provides an overview of the policies and rules associated with each model use case. You can apply a filter to focus on a specific policy.

Summary

The Knowledge Accelerators provided with IBM Knowledge Catalog offer a set of governance artifacts to help your organization integrate and advance its Data Governance and AI Governance processes. Extend the Knowledge Catalog capabilities with IBM AI Factsheets to define use cases that include monitoring for data privacy. Let IBM help you to develop, maintain, and monitor AI solutions that are responsible, trustworthy, and transparent.
