Build a curated business vocabulary for your data fabric

Pat O'Sullivan
IBM Data Science in Practice
5 min read · Jun 29, 2022

Photo by Shunya Koide on Unsplash

This blog was co-written with Paul Kilroy, IBM Knowledge Accelerators Development Manager.

A fundamental building block of setting up any effective data governance or data fabric environment is to have in place a solid layer of business terms. Establishing business terms helps your users locate data assets more efficiently. Having a plan for developing your business terms can save you time and energy and produce a better result for your users.

A typical approach for many organizations embarking on a data governance project is to try to define such a vocabulary of business terms. While this sounds basic, it can be surprisingly time-consuming and frustrating. You say eggplant; I say aubergine. Never underestimate the ability of human beings to come up with different words and definitions for what should be the same thing. The more people and groups across the organization you need agreement from on such definitions, the harder it becomes to find consensus. When the objective is a broader data fabric initiative, the issue is not just one of identifying and defining business terms, but can extend to the relationships between business terms. Your dictionary might require taxonomies or even ontologies to provide a better foundation for the ML/AI/NLP processes this business language must support.

No doubt, while you ponder the scope of your task, the pressure is mounting from the various stakeholders to get on with the whole data governance project. However, without an effective way to build a business language that quickly resolves such differences, you run the risk that the overall data governance program is not fit for purpose: business users cannot find what they need because the business language is incomplete or incoherent.

Jump start your business terms list
We have looked at the pitfalls of a poor strategy for curating business terms. Let’s consider some solutions. Imagine having at your fingertips an extensive set of well-defined and structured business terms that you can pick up, customize to your needs, and put into use as your data governance and data discovery activities progress. Good news! With the latest release of Watson Knowledge Catalog, you have just that in the form of the IBM Knowledge Accelerators that provide comprehensive industry-wide business vocabularies specifically designed for these industries:

· Financial Services

· Healthcare

· Insurance

· Energy & Utilities

These business vocabularies are pre-integrated with the data classes that are provided by Watson Knowledge Catalog. That means that these business terms can be automatically assigned to your data assets as part of the metadata enrichment process of Watson Knowledge Catalog. The Knowledge Accelerators also include extensive reference data sets and values from which additional data classes can be generated as required.

Business Scopes

Organizations have consistently requested a starter set of terms applicable to their specific business topic or application area. We heard you, and we are ready to deliver. Not only are we making our extensive business vocabularies available in Watson Knowledge Catalog, but we are also including 28 pre-defined business scopes that address specific business topics across different industries. These business scopes include business terms and reference data.

If one of these business scopes sounds close to your business area, just import that particular scope. Each business scope contains no more than 500 business terms, enough to give you a good starting point to build your vocabulary but not so many as to be overwhelming. For each industry there is a related set of scopes. When you want to move on to another area, just import another scope. As you import additional scopes, their business terms are integrated with the ones you already have.

One other significant addition to the Knowledge Accelerators is a set of three cross-industry business scopes. These industry-agnostic scopes (Contact Center, Personal Data, and Weather Insights) are intended for use in any industry where these business topics are relevant.

Build your glossary using bite-sized chunks

You can start small, without starting from scratch. Build your collection of terms by importing one business scope to establish your business vocabulary. This will provide a small set of curated business terms and reference data, organized into categories.

You can then extend by adding more business scopes. This will add new categories, new business terms, and new reference data sets to your business vocabulary. It will also add relationships to overlapping terms in your original business scope, so that the business vocabulary is extended in an integrated way.
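To make the idea of integrated extension concrete, here is a minimal sketch of how importing a second scope could merge with an existing vocabulary. It is not how Watson Knowledge Catalog implements imports; the Term class, the import_scope function, and the sample terms are all hypothetical and exist only for illustration. The assumption is that terms are matched by name, and that an overlapping term keeps its original definition while gaining the new relationships.

```python
from dataclasses import dataclass, field


@dataclass
class Term:
    """A hypothetical business term; not a Watson Knowledge Catalog type."""
    name: str
    definition: str
    category: str
    related: set[str] = field(default_factory=set)


def import_scope(vocabulary: dict[str, Term], scope: list[Term]) -> None:
    """Merge a scope into the vocabulary, matching terms by lower-cased name."""
    for term in scope:
        key = term.name.lower()
        existing = vocabulary.get(key)
        if existing is None:
            # New term: copy it into the vocabulary.
            vocabulary[key] = Term(
                term.name, term.definition, term.category, set(term.related)
            )
        else:
            # Overlapping term: keep the original definition, merge relationships.
            existing.related |= term.related


# Invented sample content for two scopes (illustrative only).
personal_data_scope = [
    Term("Customer", "A party that buys goods or services.", "Party",
         {"Email Address"}),
    Term("Email Address", "An electronic mail address for a party.",
         "Contact Point"),
]
contact_center_scope = [
    Term("Customer", "A party that contacts the center.", "Party",
         {"Call Record"}),
    Term("Call Record", "A record of one call handled by the center.",
         "Interaction"),
]

vocabulary: dict[str, Term] = {}
import_scope(vocabulary, personal_data_scope)   # start small
import_scope(vocabulary, contact_center_scope)  # then extend

for term in vocabulary.values():
    print(term.name, "->", sorted(term.related))
```

After both imports the vocabulary holds three terms, and the shared Customer term is linked to terms from both scopes rather than being duplicated.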

Exploiting Watson Knowledge Catalog

IBM Watson Knowledge Catalog has a range of capabilities that you can leverage in conjunction with your growing business vocabulary.

Anatomy of a business vocabulary

The first use of this business vocabulary is as the target for the automatic assignment of business terms to your data assets. Auto assignment is based on a combination of the underlying data classes and ML semantic matching between the column name and the business term name and description. It helps you efficiently build out an accurate and extensive business layer that describes the underlying data assets.
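The sketch below illustrates the general idea of combining the two signals; it is not the algorithm that Watson Knowledge Catalog uses. Plain string similarity from Python's difflib stands in for ML semantic matching, only term names are compared (descriptions are left out to keep it short), and the term list, the data class mapping, the equal weights, and the threshold are all assumptions made for the example.

```python
from difflib import SequenceMatcher
from typing import Optional

# Hypothetical business terms and data-class links (illustrative only).
TERMS = ["Email Address", "Birth Date", "Policy Number"]
DATA_CLASS_TO_TERM = {
    "Email Address": "Email Address",
    "Date of Birth": "Birth Date",
}


def normalize(name: str) -> str:
    """Make column names and term names comparable."""
    return name.replace("_", " ").lower()


def suggest_term(
    column_name: str,
    data_class: Optional[str] = None,
    threshold: float = 0.4,
) -> Optional[str]:
    """Return the best-scoring term, or None if nothing scores high enough."""
    class_term = DATA_CLASS_TO_TERM.get(data_class) if data_class else None
    best_term, best_score = None, 0.0
    for term in TERMS:
        # Stand-in for semantic matching: character-level similarity of names.
        name_score = SequenceMatcher(
            None, normalize(column_name), normalize(term)
        ).ratio()
        # Signal from the data class detected on the column, if any.
        class_score = 1.0 if term == class_term else 0.0
        score = 0.5 * class_score + 0.5 * name_score
        if score > best_score:
            best_term, best_score = term, score
    return best_term if best_score >= threshold else None


print(suggest_term("CUST_DOB", data_class="Date of Birth"))
print(suggest_term("policy_number"))
print(suggest_term("xyz"))
```

The data class signal matters most when column names are cryptic: a column called CUST_DOB looks little like "Birth Date" as a string, but the detected data class still points to the right term.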

Because the Knowledge Accelerator business terms are also pre-integrated with key data privacy related classifications such as Personal Information and Sensitive Personal Information, you can also use this as a foundation for establishing data governance rules to control access to personal / sensitive data.
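As a rough illustration of why this matters, the following sketch shows a rule that masks any column whose assigned business term carries a privacy classification. It is a toy model, not the Watson Knowledge Catalog rule engine: the dictionaries, the apply_rule function, and the column names are hypothetical, and only the two classification names come from the text above.

```python
PROTECTED = {"Personal Information", "Sensitive Personal Information"}

# Hypothetical term classifications and column-to-term assignments.
TERM_CLASSIFICATION = {
    "Email Address": "Personal Information",
    "Health Condition": "Sensitive Personal Information",
    "Policy Number": None,
}
COLUMN_TERMS = {
    "EMAIL": "Email Address",
    "DIAGNOSIS": "Health Condition",
    "POLICY_NO": "Policy Number",
}


def apply_rule(row: dict, may_see_personal_data: bool) -> dict:
    """Mask values in columns whose business term is classified as protected."""
    result = {}
    for column, value in row.items():
        term = COLUMN_TERMS.get(column)
        classification = TERM_CLASSIFICATION.get(term)
        if classification in PROTECTED and not may_see_personal_data:
            result[column] = "****"
        else:
            result[column] = value
    return result


row = {"EMAIL": "a@example.com", "DIAGNOSIS": "asthma", "POLICY_NO": "P-1001"}
print(apply_rule(row, may_see_personal_data=False))
print(apply_rule(row, may_see_personal_data=True))
```

Because the rule is written against classifications rather than individual columns, any newly discovered column that is assigned to a classified term is covered without changing the rule.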

The business vocabulary created from the Knowledge Accelerators consists of a highly structured and richly defined set of business terms that your business users and data scientists can leverage to find and use the data assets they need to drive insights and innovation.

Summary

The IBM Knowledge Accelerators give users of Watson Knowledge Catalog the ability to jump-start data governance and data fabric initiatives via an extensive and industry-rich set of business vocabularies. In addition to supporting the core data discovery, data governance, and self-service usage flows, these business vocabularies can also be used in other areas, such as standardizing the naming of virtual objects in Watson Query and providing the rich term-to-term relationships to fuel Watson Knowledge Semantic Search.

For more information on IBM Knowledge Accelerators and IBM Watson Knowledge Catalog

Pat O'Sullivan

Senior Technical Staff Member with IBM. A Data Architect with a background in Data Models, Business Glossaries, Data Governance and Data Management.