A methodology for customizing pre-defined business vocabularies in IBM Knowledge Catalog

Pat O'Sullivan
7 min read · Dec 15, 2023


Photo by Erol Ahmed on Unsplash

Authors: Pat O’Sullivan and Julie Forgo

Before an enterprise can plan a governance strategy, it must make sure that people across the organization are quite literally speaking the same language. One way to do this is to develop a standardized vocabulary of industry terms. While this can deliver great benefits, building a vocabulary from scratch can be costly and time-consuming. Instead, you can accelerate the path to your own governance vocabularies by customizing one of the predefined, domain-specific vocabularies offered by IBM Knowledge Catalog (IKC).

The IBM Knowledge Accelerators are a set of industry-wide vocabularies of business terms that your organization can use as the basis for building out its own enterprise vocabulary. To cover the broad range of a particular industry, these vocabularies are extensive, often containing more than 10,000 business terms.

While this provides a deep range of business content, the challenge for many organizations is how to get started in practice with such large vocabularies. In particular, many organizations prefer to start with a small subset of glossaries that addresses the specific business topic or domain at the center of their initial data governance initiative. This blog post steps you through how to tailor a Knowledge Accelerator vocabulary to your requirements.

Even when you plan to start small, a benefit of using a full enterprise-wide vocabulary such as those provided by the Knowledge Accelerators is that you can approach governance incrementally and build out your vocabularies over time. Because you work from a “canonical” vocabulary, an enterprise-wide perspective is preserved even when the initial focus is on a particular business area or domain. This matters most when you later grow the vocabulary to incorporate other business areas.

Why use industry-wide vocabularies?

Most industries have a range of possible sources of standard vocabularies. An enterprise might choose (or be required) to adopt an approved vocabulary to underpin the business metadata for its Data Governance or Data Fabric initiatives. In addition, some industries have several different standards, each of which could serve as a source of candidate business terms.

Creating a business vocabulary

The diagram above shows a typical lifecycle flow for developing an approved set of business terms. In a typical scenario, you start with the thousands of potential terms available from different standards and vendors. You then identify the subset of terms that are potentially relevant to the organization. This collection can be imported into a catalog as “Development” business terms that are accessible to a relatively small number of data stewards, business analysts, and admin staff. These terms are then curated to define the set used for the primary data governance or data fabric activities. Finally, terms that are no longer relevant to the day-to-day operations of the business are marked as deprecated.
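The lifecycle above is essentially a one-way state machine. The sketch below models it under simplifying assumptions: the state names (`CANDIDATE`, `DEVELOPMENT`, `PRODUCTION`, `DEPRECATED`) and the `advance` helper are hypothetical illustrations of the flow, not IKC's own term statuses or API, and only deprecation from production is modeled.

```python
from enum import Enum


class TermState(Enum):
    # Hypothetical lifecycle stages mirroring the flow described above
    CANDIDATE = "candidate"      # found in an external standard or vendor vocabulary
    DEVELOPMENT = "development"  # imported into the restricted development vocabulary
    PRODUCTION = "production"    # curated and approved for governance use
    DEPRECATED = "deprecated"    # no longer relevant to day-to-day operations


# Allowed forward transitions; any other move is rejected.
ALLOWED = {
    TermState.CANDIDATE: {TermState.DEVELOPMENT},
    TermState.DEVELOPMENT: {TermState.PRODUCTION},
    TermState.PRODUCTION: {TermState.DEPRECATED},
    TermState.DEPRECATED: set(),
}


def advance(current: TermState, target: TermState) -> TermState:
    """Return the new state, or raise if the move skips or reverses a stage."""
    if target not in ALLOWED[current]:
        raise ValueError(f"cannot move a term from {current.value} to {target.value}")
    return target


state = TermState.CANDIDATE
for step in (TermState.DEVELOPMENT, TermState.PRODUCTION, TermState.DEPRECATED):
    state = advance(state, step)
print(state.value)  # deprecated
```

Making the transitions explicit means a term cannot jump straight from a candidate list into production without passing through curation.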

Building a business vocabulary with IBM Knowledge Catalog

Before you consider how to work with a Knowledge Accelerator, let’s look at how terms are collected and curated in IBM Knowledge Catalog.

Setting up “Development” and “Production” vocabularies

The first step is to set up the necessary “Development” and “Production” vocabularies in IKC. As shown below, you set up two separate vocabularies: a development vocabulary, accessible only to the data steward and other designated admin users, and a production vocabulary intended for regular business users.

In this example, the data steward gathers vocabulary terms from external sources and combines them with terms already used in the organization to build the development vocabulary. Because the vocabulary is a work in progress, access is restricted to the group of people who process the terms to create the final product: the production vocabulary.

Note that the production vocabulary is partitioned into categories that represent logical subsets of terms. In the sample, one category holds the general core business vocabulary and the other two hold department-specific terms. The benefit of categorizing terms is that you can manage access so that users see only the categories they need.

The next figure shows access management for a specific category. Again, you might provide access to all users for the most general terms, such as the terms in the enterprise vocabulary, and restrict access to the more specialized categories.
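To make the category-based access idea concrete, here is a minimal sketch of how visibility follows from category membership. The `Category` class, the convention that `allowed_groups=None` means "open to all users", and the category and group names are all hypothetical; IKC manages this through its own access settings, not through code like this.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Set


@dataclass
class Category:
    name: str
    # Hypothetical convention: None means the category is open to all users
    allowed_groups: Optional[Set[str]] = None
    terms: List[str] = field(default_factory=list)


def visible_terms(categories: List[Category], user_groups: Set[str]) -> List[str]:
    """Collect the terms a user can see, based only on category-level access."""
    visible: List[str] = []
    for cat in categories:
        # Open categories are always visible; restricted ones need a shared group
        if cat.allowed_groups is None or cat.allowed_groups & user_groups:
            visible.extend(cat.terms)
    return visible


# Hypothetical production vocabulary: one general category, two departmental ones
production = [
    Category("Core Business", None, ["Customer", "Account"]),
    Category("Marketing", {"marketing"}, ["Campaign"]),
    Category("Risk", {"risk"}, ["Exposure"]),
]
print(visible_terms(production, {"marketing"}))  # ['Customer', 'Account', 'Campaign']
```

A user in no departmental group still sees the core terms, which matches the pattern of opening the general category to everyone and restricting the specialized ones.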

Identifying the subset of what is required

The next step is for the data steward to work with business users to decide which business terms to copy from the development vocabulary to the production vocabulary.

The following image shows how to capture a project scope by using the secondary category capability in IKC.

In this case, all 74 business terms from the Knowledge Accelerator for Financial Services that were deemed suitable for the “Channel Usage” domain have been assigned “Channel Usage” as a secondary category. Terms can be assigned to multiple categories, so you can build whatever tailored vocabularies you need.
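Conceptually, a secondary category acts as a tag that defines a project scope without moving the term. The sketch below illustrates that selection with a hypothetical `Term` record and made-up term and category names; it is not how IKC stores terms internally.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Term:
    name: str
    primary_category: str  # where the term actually lives
    secondary_categories: Set[str] = field(default_factory=set)  # scope tags


def terms_in_scope(terms: List[Term], scope: str) -> List[str]:
    """Return the names of terms tagged with the given scope as a secondary category."""
    return [t.name for t in terms if scope in t.secondary_categories]


# Hypothetical development-vocabulary terms
terms = [
    Term("Channel", "Development/Arrangement", {"Channel Usage"}),
    Term("Online Banking Session", "Development/Event", {"Channel Usage", "Digital"}),
    Term("Collateral", "Development/Asset"),
]
print(terms_in_scope(terms, "Channel Usage"))  # ['Channel', 'Online Banking Session']
```

Because the primary category is untouched, the same term can sit in several scopes at once, for example both "Channel Usage" and "Digital" here.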

As you classify terms and assign them to primary and secondary categories, you can use the IKC Relationship Explorer facility to get a more graphical view of the contents of your business scopes.

The image above shows some of the business terms from the “Channel Usage” scope along with their relationships. This graphical view can help you check the correctness and completeness of the vocabularies before you move them to production.
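One completeness check that a relationship view supports is spotting in-scope terms that point at terms outside the scope, which may indicate a missing term. The sketch below expresses that check over a hypothetical list of `(source, relationship type, target)` triples; the relationship types and term names are illustrative only, not IKC relationship names.

```python
from typing import List, Tuple

# Hypothetical representation: (source term, relationship type, target term)
Relationship = Tuple[str, str, str]


def out_of_scope_links(scope_terms: List[str],
                       relationships: List[Relationship]) -> List[Relationship]:
    """Return relationships that start inside the scope but end outside it."""
    in_scope = set(scope_terms)
    return [r for r in relationships if r[0] in in_scope and r[2] not in in_scope]


scope = ["Channel", "Online Banking Session"]
rels = [
    ("Online Banking Session", "is a type of", "Channel"),
    ("Channel", "is related to", "Customer"),
]
print(out_of_scope_links(scope, rels))  # [('Channel', 'is related to', 'Customer')]
```

Each link reported this way is a prompt for the data steward to decide whether the target term (here "Customer") belongs in the scope too.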

Populating the Production vocabulary

The next step is to populate the production vocabularies with the terms from the development vocabularies. You can do this manually by changing each term's primary category to the production category, but this approach has two possible drawbacks:

Manually opening each business term and changing its primary category becomes laborious and error-prone when it has to be repeated for hundreds of business terms.

Moving the business terms means they no longer reside in the development vocabulary, which could compromise the structure and completeness of that vocabulary as a reference for future initiatives.

Instead, consider using a cloning script that is available for download by IKC clients.

The script automates the copying (or cloning) of the terms referenced by a particular category into another category. It leaves the identified business terms in the source development vocabulary and makes a full copy of these terms, their properties, and any relevant relationships in the target production vocabulary.
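The core idea of cloning, as opposed to moving, can be sketched in a few lines. This is not the downloadable IKC script; the `Term` record, the `-prod` ID suffix, and the rule that links between two cloned terms are re-pointed at the clones (while links to out-of-scope terms keep pointing at the originals) are all assumptions made for illustration.

```python
import copy
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Term:
    term_id: str
    name: str
    category: str
    properties: Dict[str, str] = field(default_factory=dict)
    related_ids: List[str] = field(default_factory=list)


def clone_scope(terms: List[Term], scope_ids: List[str], target_category: str) -> List[Term]:
    """Copy in-scope terms into target_category, leaving the source terms untouched."""
    id_map = {tid: f"{tid}-prod" for tid in scope_ids}  # hypothetical ID scheme
    clones = []
    for term in terms:
        if term.term_id not in id_map:
            continue
        clone = copy.deepcopy(term)  # independent copy of properties and relationships
        clone.term_id = id_map[term.term_id]
        clone.category = target_category
        # Assumption: links between two cloned terms point at the clones;
        # links to terms outside the scope still point at the original term.
        clone.related_ids = [id_map.get(rid, rid) for rid in term.related_ids]
        clones.append(clone)
    return clones


dev = [
    Term("t1", "Channel", "Development",
         {"definition": "A medium through which a customer interacts"}, ["t2", "t9"]),
    Term("t2", "Online Banking Session", "Development", {}, ["t1"]),
    Term("t3", "Collateral", "Development"),
]
prod = clone_scope(dev, ["t1", "t2"], "Production/Channel Usage")
print([(c.term_id, c.related_ids) for c in prod])
# [('t1-prod', ['t2-prod', 't9']), ('t2-prod', ['t1-prod'])]
print(dev[0].category)  # Development
```

The deep copy is what keeps the development vocabulary intact as a reference, which addresses the second drawback of moving terms.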

You can download the cloning script and the associated install and usage instructions from here

The screen image below shows the results of running a cloning script that automatically cloned all 74 business terms referencing the “Channel Usage” category from development to production.

In addition to cloning the 74 terms in this example, the script also cloned the dependent categories (specifically, the parent categories of the selected terms). Finally, the category description includes a link indicating where the category was cloned from. This simple solution accelerates the creation of the production vocabularies and maintains a full audit trail.
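Cloning the parent categories amounts to walking each term's category up to the root and creating the ancestors first. The sketch below shows that walk plus a provenance note; the `parent_of` map, the category names, and the "Cloned from:" wording are hypothetical, and the real script's description format may differ. It assumes the categories form a tree (no cycles).

```python
from typing import Dict, List, Optional


def categories_to_clone(term_categories: List[str],
                        parent_of: Dict[str, Optional[str]]) -> List[str]:
    """Return each category plus all its ancestors, ordered root first, without duplicates."""
    needed: List[str] = []
    for cat in term_categories:
        chain: List[str] = []
        node: Optional[str] = cat
        while node is not None:  # climb until the root (parent is None or missing)
            chain.append(node)
            node = parent_of.get(node)
        for c in reversed(chain):  # parents must exist before their children
            if c not in needed:
                needed.append(c)
    return needed


def cloned_description(original: str, source_path: str) -> str:
    """Append a provenance line recording where the category was cloned from."""
    return f"{original}\nCloned from: {source_path}"


# Hypothetical category hierarchy in the development vocabulary
parent_of = {"Channel Events": "Channels", "Channels": "Business Areas", "Business Areas": None}
print(categories_to_clone(["Channel Events", "Channels"], parent_of))
# ['Business Areas', 'Channels', 'Channel Events']
```

Ordering root first lets each cloned category be created under a parent that already exists in the target vocabulary.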

Subsequent use of the Production Vocabulary

Once the relevant terms have been populated, you can use them to support the needs of the relevant areas of the business.

As the illustration shows, certified business users can view the business terms in the production vocabulary. Note that you can also use the IKC Metadata Enrichment tool and continue to work on term assignment in the production vocabularies.

Your production vocabulary will evolve over time with more business terms and asset associations being added as other business topics are addressed. Use the IKC Workflow function to provide the necessary oversight into any artifacts that are proposed for addition to the production vocabulary in subsequent iterations.
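The oversight that a workflow provides can be thought of as a gate: a proposed term reaches production only after every required reviewer has approved it and nobody has rejected it. The sketch below models that gate with a hypothetical `Proposal` record and reviewer roles; IKC workflows are configured in the product, not through code like this.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class Proposal:
    term: str
    approvals: Set[str] = field(default_factory=set)  # roles that have approved
    rejected: bool = False


def ready_for_production(proposals: List[Proposal], required: Set[str]) -> List[str]:
    """Return terms whose proposals carry every required approval and no rejection."""
    return [p.term for p in proposals if not p.rejected and required <= p.approvals]


required = {"data steward", "business owner"}  # hypothetical reviewer roles
proposals = [
    Proposal("Mobile Wallet", {"data steward", "business owner"}),
    Proposal("ATM Withdrawal", {"data steward"}),
    Proposal("Branch Visit", {"data steward", "business owner"}, rejected=True),
]
print(ready_for_production(proposals, required))  # ['Mobile Wallet']
```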

Summary

Start with predefined, domain-specific Knowledge Accelerator vocabularies to jumpstart the business vocabularies you need for your organization. Build a robust production vocabulary by curating the terms using the features provided in the IBM Knowledge Catalog such as the separation of vocabularies via categories and the workflow process to enable review and oversight of proposed changes. Leverage downloadable assets such as the cloning script to accelerate the path to a reliable production vocabulary.

For more information, check out IBM Knowledge Accelerators and IBM Knowledge Catalog and this video that walks through detailed examples of the steps described in this blog.

Tags: Data Governance, Methodology, Data Fabric


Pat O'Sullivan

Senior Technical Staff Member with IBM. A Data Architect with a background in Data Models, Business Glossaries, Data Governance and Data Management.