Governance Workflows — The Key to Building Successful Information Architectures

Namit Kabra
IBM Data Science in Practice
9 min read · Feb 15, 2021

It is said there is no AI without IA (information architecture). Most enterprises want to use the best and newest AI approaches in their business to make use of all the data they have. Unfortunately, many are not successful. Why?

One of the main reasons for this failure to implement successful AI in the enterprise space is that the data governance within their organizations is handled poorly. Organizations don’t understand their own data or know its structure. The companies can hire data scientists, but if these data scientists can’t find the right data or understand the data they are given, they cannot make use of it in their models. We developed Watson Knowledge Catalog to empower companies to take charge of their data and use it for transformational AI in the enterprise. In this post, we will cover the workings of the Watson Knowledge Catalog and give a walkthrough of how to use it.

diagram of uses of a data scientist’s time
Without a business-ready data foundation, data scientists spend 80% of their time finding and prepping data

What is a knowledge catalog?

First, let’s discuss what a knowledge catalog is. Suppose you want to gather information in a library or find a particular book in a library. You don’t randomly pick through all the books in the library and you shouldn’t have to because the books are all catalogued according to topics. If you want to gather knowledge about fiction books, for instance, you know which section to go to and gather that information. A knowledge catalog is similar — all of the data that a business has can be organized and labeled. This understanding of an organization’s data is the first step towards a successful data and AI project.

There are many business advantages to having a knowledge catalog, among them timeliness and trust. A knowledge catalog can decrease time to results: you can gather your data quickly and have more time to analyze it and put it to use. Most importantly, a knowledge catalog allows for greater understanding of the data itself. Knowing the contextual asset information of data allows data scientists and project managers to use it in many more situations than they otherwise could. Customers and clients can trust the data stored and managed with a knowledge catalog because it tracks data lineage and, through in-depth understanding of the data, supports more secure storage. In the same way, this trust and tracking greatly improves data governance practices in an organization and eases regulatory compliance for data assets in regulated industries such as finance and health.

Building the knowledge catalog

visualization for the five iterative steps of building a taxonomy in the Watson Knowledge Catalog
Steps for building the Knowledge Catalog

There are five steps to building a knowledge catalog. The first is crucial: you define business terms for your data assets. As an example, let’s say you are in the finance industry: you have many specific types of terms, such as those used in billing, loans, securities, and so forth, which are used often. These critical data elements are your business terms. To create them, you provide details, a short description, and a longer description. The next step is to create data classes. There are many out-of-the-box data classes in Watson Knowledge Catalog that you can use, but you can define your own as well. You then link both the predefined and custom classes with your business terms.

After this, you create data quality dimensions. There are eleven predefined data quality dimensions, and as with the data classes, any user can also create their own custom dimensions. Next, you create your automation rules. One example is a rule for securing sensitive data, such as a customer’s PII. Once all of these steps are complete, a user can run auto discovery, which automates data ingestion, classification, and assignment. Auto discovery gives Watson Knowledge Catalog much of its power: it lets a data scientist get their data and understand the business terms and quality characteristics of the data they are querying.
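To make these steps more concrete, here is a minimal Python sketch. It is not the Watson Knowledge Catalog API: every name in it (`BusinessTerm`, `DataClass`, `discover`, the regular expressions, the sample columns) is a hypothetical stand-in chosen for illustration. It leaves out data quality dimensions and only mimics the idea of classifying columns, assigning business terms, and applying a rule that masks sensitive data.

```python
import re
from dataclasses import dataclass


@dataclass(frozen=True)
class BusinessTerm:
    # A business term with a short and a longer description.
    name: str
    short_description: str
    long_description: str = ""


@dataclass(frozen=True)
class DataClass:
    # A data class, recognized here by a regular expression (an assumption).
    name: str
    pattern: str
    term: BusinessTerm        # link from the data class to a business term
    sensitive: bool = False   # flag consumed by the toy automation rule below

    def matches(self, values):
        # A column belongs to the class if every sample value fits the pattern.
        return bool(values) and all(re.fullmatch(self.pattern, v) for v in values)


def discover(columns, data_classes):
    """Toy stand-in for auto discovery: classify each column and assign its term."""
    result = {}
    for column, values in columns.items():
        for dc in data_classes:
            if dc.matches(values):
                result[column] = {
                    "data_class": dc.name,
                    "business_term": dc.term.name,
                    # Automation rule: columns of a sensitive class are masked.
                    "masked": dc.sensitive,
                }
                break  # first matching class wins
    return result


email_term = BusinessTerm("Customer Email", "Email address of a customer")
amount_term = BusinessTerm("Loan Amount", "Principal amount of a loan")
classes = [
    DataClass("Email Address", r"[^@\s]+@[^@\s]+\.[a-z]+", email_term, sensitive=True),
    DataClass("Currency Amount", r"\d+\.\d{2}", amount_term),
]
columns = {
    "contact": ["ana@example.com", "li@example.org"],
    "principal": ["25000.00", "1200.50"],
    "notes": ["call back", "n/a"],
}
catalog = discover(columns, classes)
```

In this toy run, `contact` is classified as an email address and flagged for masking, `principal` is linked to the loan amount term, and `notes` stays unclassified because no data class matches it.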

Workflow

diagram of a data workflow process
Data Citizens work together to build business taxonomy

Now we come to how a workflow for creating a new data asset in Watson Knowledge Catalog works. Examples of these data assets are Business Terms, Data Classes, Business Rules, and Policies. As you can see above, the data governance officer or equivalent in an organization identifies the focus areas, sponsors, and key stakeholders, as well as the data steward(s). This stewardship team then defines a workflow for a taxonomy, configures it inside WKC, and assigns roles. These roles cover who defines terms, who edits, who reviews, and who approves. Additionally, collaborators can be defined: they take no part in the workflow steps but can see and share information with the rest of the team.

To create a proper catalog of governed data, the process of creating it has to be streamlined. This streamlining is achieved using workflows.

a busy multilevel interwoven highway exchange
Workflows — Streamlining the creation, modification and deletion of governance artifacts

These workflows enforce a task-based process to control the creation, modification, and deletion of governance artifacts. Each type of artifact can be assigned to a workflow that defines the sequence of steps that must be completed before an artifact is published and available to use. The steps can include authoring, approving, reviewing, and publishing. A set of users is assigned to work on each step. You can claim a task to indicate that you are working on it. If necessary, you can return a claimed task so that another assignee can claim it. After a draft artifact is published, all users can view it.
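The claim-and-return behavior can be pictured with a small Python sketch. This is an illustration only, not product code: the `WorkflowTask` class, its methods, and the user names are all hypothetical, and the rule that only the current owner can return a task is an assumption.

```python
class WorkflowTask:
    """One workflow step (for example, approval) with a pool of assignees."""

    def __init__(self, step, assignees):
        self.step = step
        self.assignees = set(assignees)
        self.claimed_by = None  # None means the task is sitting in the pool

    def claim(self, user):
        # Only users assigned to this step may claim the task.
        if user not in self.assignees:
            raise PermissionError(f"{user} is not assigned to the {self.step} step")
        if self.claimed_by is not None:
            raise RuntimeError(f"task already claimed by {self.claimed_by}")
        self.claimed_by = user

    def release(self, user):
        # Assumption: only the current owner can return the task to the pool.
        if self.claimed_by != user:
            raise RuntimeError(f"{user} has not claimed this task")
        self.claimed_by = None


task = WorkflowTask("approval", ["alice", "bob"])
task.claim("alice")      # alice signals that she is working on it
task.release("alice")    # alice returns it to the pool
task.claim("bob")        # another assignee can now claim it
```

While one assignee holds the task, a second `claim` fails, which is what keeps two people from working on the same step at once.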

A workflow administrator can create a workflow and assign roles to individuals as described above. In the image below, you can see that the administrator can choose an individual and assign them authority over specific workflow tasks, such as approver and reviewer.

screenshot of managing users page
Assigning Roles

To create a workflow for governance artifacts, we go to the Workflow Management page.

screenshot of creating a workflow using a template
Create a workflow using a template

The workflow administrator can go here and select a template. Currently, there are three templates: 1) one approval step and one review (or publishing) step, 2) automatic publishing, and 3) two approval steps and one review (or publishing) step.

screenshot of creating properties and artifact types
Select template for workflow and select artifact types

Once we select a template (in this case, the template with two approval steps and one review/publish step), we select the artifact types for the workflow we are creating (in this case, business terms and governance rules). Once we save this, we go to the next page, the workflow configuration details page, where we add all the assignees. Once the workflow is activated, it is enabled.

screenshot of enabling a workflow
Enabling a workflow
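The configuration described above can be summarized as a small data structure. The Python sketch below is a hypothetical model, not how WKC stores workflows: the template keys, step names, `WorkflowConfig` class, and the check that every step needs an assignee before activation are all assumptions made for illustration.

```python
from dataclasses import dataclass, field

# Step sequences for the three templates; keys and step names are made-up labels.
TEMPLATES = {
    "one_approval": ["approval", "review_or_publish"],
    "automatic_publishing": [],
    "two_approvals": ["approval_1", "approval_2", "review_or_publish"],
}


@dataclass
class WorkflowConfig:
    template: str
    artifact_types: list
    assignees: dict = field(default_factory=dict)  # step name -> list of users
    active: bool = False

    def activate(self):
        # Assumption: every step needs at least one assignee before activation.
        missing = [s for s in TEMPLATES[self.template] if not self.assignees.get(s)]
        if missing:
            raise ValueError(f"steps without assignees: {missing}")
        self.active = True


config = WorkflowConfig(
    template="two_approvals",
    artifact_types=["business_term", "governance_rule"],
    assignees={
        "approval_1": ["alice"],
        "approval_2": ["bob"],
        "review_or_publish": ["carol"],
    },
)
config.activate()
```

Modeling the templates as plain step lists keeps the automatic-publishing case simple: it has no steps, so it needs no assignees.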

For more details on each role and other actions, please see https://www.ibm.com/support/producthub/icpdata/docs/content/SSQNUZ_latest/wsj/governance/workflow.html.

Once the workflow is enabled for a Business Term, for example, the next time a Business Term is created the workflow is triggered, and publishing the Business Term requires approval from the authorized users. Let me walk you through this term creation process in a workflow with one approval step and one review (or publishing) step.

Step 1:
From the Business terms page, click “New business term”. Enter a business term name, its primary category and an optional description. Click on the “Save as draft” button to create a draft of this new Business term.

screenshot of how to create a new business term
Step 1: Click “New business term”

Step 2:
On the draft page, you can edit the details of the term, add stewards for the asset, and send it for approval by clicking the “Send for approval” button.

screenshot of editing details on a task
Step 2: Edit details

Step 3:
You can enter an optional comment and an optional due date. After the due date passes, the stewards get an overdue notification to complete the task.

screenshot of confirming a task is to be sent for approval
Step 3: Send for approval

Step 4:
Once the user sends the draft for approval, the authorized stewards are notified by email.

Step 4: Get the notification by email

Step 5:
The authorized stewards also get a bell notification on their dashboard indicating that they have a task that requires their attention.

screenshot of notifications page
Step 5: Get notification on the dashboard

Step 6:
The data steward can then claim the task and work on it.

screenshot of user claiming the task
Step 6: Authorized user claims the task

Step 7:
Once the task is claimed, the steward can either work on the task or return it to the task pool.

screenshot of returning a task for approval
Step 7: Return the task or approve it

Step 8:
Since this is a one-step approval workflow, the steward can choose to verify and publish the Business Term as shown below.

screenshot of the pre-publish page of a business term
Step 8: Publish the governance artifact to make it available to the Knowledge Catalog

Step 9:
Now when we sort the Business terms by Last modified, we can see that the term “Capital Gains” is published.

screenshot of a published asset
Step 9: View the Published asset

Step 10:
You can now click on the newly published term to view its details.

screenshot of published asset in Workflow
Step 10: View details of Published asset
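The ten steps above trace a term from draft to published. As a rough mental model, the lifecycle can be sketched in Python as a small state machine. This is a hypothetical illustration, not the product's implementation: the `Status` values, the `BusinessTermDraft` class, the "Finance" category, and the dates are assumptions, and notifications are reduced to a simple overdue check.

```python
from dataclasses import dataclass
from datetime import date
from enum import Enum
from typing import Optional


class Status(Enum):
    DRAFT = "draft"
    PENDING_APPROVAL = "pending approval"
    PUBLISHED = "published"


@dataclass
class BusinessTermDraft:
    name: str
    primary_category: str
    description: str = ""
    status: Status = Status.DRAFT
    due_date: Optional[date] = None

    def send_for_approval(self, due_date=None):
        # Steps 2-3: the author submits the draft, optionally with a due date.
        if self.status is not Status.DRAFT:
            raise RuntimeError("only a draft can be sent for approval")
        self.due_date = due_date
        self.status = Status.PENDING_APPROVAL

    def is_overdue(self, today):
        # Step 3: an open task past its due date would trigger an overdue notice.
        return (
            self.status is Status.PENDING_APPROVAL
            and self.due_date is not None
            and today > self.due_date
        )

    def publish(self):
        # Step 8: in the one-approval workflow the steward verifies and publishes.
        if self.status is not Status.PENDING_APPROVAL:
            raise RuntimeError("only a submitted draft can be published")
        self.status = Status.PUBLISHED


term = BusinessTermDraft("Capital Gains", primary_category="Finance")
term.send_for_approval(due_date=date(2021, 2, 20))
overdue_before = term.is_overdue(date(2021, 2, 22))  # still waiting for approval
term.publish()
overdue_after = term.is_overdue(date(2021, 2, 22))   # published, so no longer open
```

The guards in `send_for_approval` and `publish` capture the point of the workflow: a term cannot reach the published state without passing through the approval step.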

Workflow use cases
Workflow by organization areas. In any large organization, controlling data creation and data sharing is an enormous task. A large organization can enable departments to create their own unique categories and their own workflows, with terms that are entirely under their own control and not part of another department’s. When people in other departments and sections of the organization are defined as collaborators, well-maintained and well-defined data can still be shared easily across the larger organization.

Controlling data access. If a data consumer wants access to a data asset that has restricted access in the organization, the access can be granted to their project after the approval process.

Data assets. If an organization requires changes to the metadata on data assets (say, reassigning Business terms or Data Classes), the changes can be managed through a workflow, which makes approval and/or review mandatory.

Parallel steps in workflow. Instead of sequential steps requiring one reviewer after another, a workflow can be set up so that all approvers on a list receive their tasks at the same time.

Request management. A user can set up any type of custom request process that does not require any automation on the platform or a change.
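Of the use cases above, the parallel-steps one is the easiest to picture in code. The Python sketch below is a hypothetical illustration: the `ParallelApproval` class and the user names are made up, and the rule that the step completes only when every approver has approved is an assumption, since a real workflow could equally be configured to finish on the first approval.

```python
class ParallelApproval:
    """All approvers receive the task at once instead of one after another."""

    def __init__(self, approvers):
        self.pending = set(approvers)   # everyone gets a task at the same time
        self.approved = set()

    def approve(self, user):
        if user not in self.pending:
            raise ValueError(f"{user} has no open approval task")
        self.pending.remove(user)
        self.approved.add(user)

    @property
    def complete(self):
        # Assumption: the step finishes only when every approver has approved.
        return not self.pending


step = ParallelApproval(["alice", "bob", "carol"])
step.approve("carol")        # order does not matter
step.approve("alice")
done_early = step.complete   # bob has not approved yet
step.approve("bob")
```

Compared with a sequential chain, no approver waits on another, so the total wait is set by the slowest approver rather than by the sum of all of them.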

Wrap up
We’re excited about the possibilities for organizations to grow their AI capabilities through the improved data governance offered by Watson Knowledge Catalog and Workflow. You can fundamentally improve your organization’s data possibilities and data trustworthiness with the management of data histories, definitions, and usage that this tool enables. To learn more about WKC and other projects, please visit https://www.ibm.com/cloud/watson-knowledge-catalog.

Namit Kabra is a Software Developer for the IBM Cloud and Cognitive Software. For more, visit his personal website: https://namitkabra.wordpress.com/about/