Building a Business Vocabulary for Data Governance and beyond

Pat O'Sullivan
IBM Data Science in Practice
5 min read · Mar 26, 2021
an open dictionary

Photo by Pisit Heng on Unsplash

Organizations need to make better use of their data assets to meet their business challenges. In the past, they relied on data warehouses holding mainly structured data. Now they must also deal with data lakes, NoSQL databases, and all types of unstructured data. At the same time, they face pressure to exploit these data assets more efficiently to support growing and varied demands from the business. They recognize that data governance and other aspects of data ops are key to an AI-ready framework. However, they struggle to integrate this data infrastructure in a way that addresses these business issues. Current data structures are not agile enough for present and future demands. Creating new structures and ensuring a consistent approach to managing business knowledge is a critical part of any future landscape.

Fundamental concepts

The Business Core Vocabulary provides the semantic layer that supports data governance. This approach addresses two critical issues at once. The first is helping all users, from data scientists to analysts, reach the data they need. The second is supporting the expansion of the landscape beyond legacy data warehouses, by providing a core business vocabulary that describes all the data, including data stored in data lakes. This allows for more organic growth: the data landscape can grow and change with the business.

Users across the enterprise have different needs and use different terms to express the same or related topics. The needs of the HR team in an organization are not going to be the same as those of a marketing team or a finance team. Core to a successful data strategy is accommodating all users and their present needs while paving the way for future possibilities. The semantic structures within a core business vocabulary approach give organizations this flexibility. These future possibilities include the growing use of AI applications, which many companies will adopt in the next five years if they have not already.

Construction pieces

Given these constraints, how do we build this system? IBM has defined a vocabulary meta-model with a set of constructs designed to address specific business needs. Data models are an important influence here: their range of relationships, data types, and cardinalities can contribute to a rich business language. Some ontological constructs are also brought into the meta-model to support artificial intelligence use cases that simpler data models cannot handle. Finally, the meta-model needs the functionality of a run-time catalog so that it resonates with business users. Putting all of this together means defining how to use Business Terms, Categories, and relationships to deliver the required flexibility. We then start to introduce specifications of Business Terms, such as how to describe core concepts or their associated properties.

The outline of the Business Core Vocabulary and its relationships: 1) term types, such as concept terms and property terms; 2) possible related term types, such as performance analysis, measure, and alignment; 3) possible related category types, such as scope area and core area; 4) data asset management, and reference data such as reference data sets and reference data values
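To make these constructs a little more concrete, here is a minimal Python sketch of concept terms, property terms, categories, and relationships. It is an illustration only and not IBM's meta-model: the class names, the "is property of" relationship name, and the sample terms are all assumptions made up for this example.

```python
from dataclasses import dataclass, field
from enum import Enum


class TermType(Enum):
    CONCEPT = "concept"    # a core business concept, e.g. "Customer"
    PROPERTY = "property"  # an attribute of a concept, e.g. "Customer Name"


@dataclass
class BusinessTerm:
    name: str
    term_type: TermType
    definition: str = ""
    # Outgoing relationships as (relationship_name, target_term_name) pairs.
    relationships: list = field(default_factory=list)


@dataclass
class Category:
    name: str
    terms: dict = field(default_factory=dict)

    def add(self, term: BusinessTerm) -> None:
        self.terms[term.name] = term

    def properties_of(self, concept_name: str) -> list:
        """Names of property terms that point at the given concept."""
        return sorted(
            t.name
            for t in self.terms.values()
            if t.term_type is TermType.PROPERTY
            and ("is property of", concept_name) in t.relationships
        )


# Hypothetical sample content for a "Core Area" category.
core = Category("Core Area")
core.add(BusinessTerm("Customer", TermType.CONCEPT,
                      "A party that buys goods or services."))
core.add(BusinessTerm("Customer Name", TermType.PROPERTY,
                      relationships=[("is property of", "Customer")]))
core.add(BusinessTerm("Customer Segment", TermType.PROPERTY,
                      relationships=[("is property of", "Customer")]))

print(core.properties_of("Customer"))
```

Keeping relationships as named pairs, rather than hard-coded fields, is one way to leave room for the richer relationship types the meta-model describes.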

This precise meta-model across the Business Vocabulary provides the consistency needed for clarity for both human users and machines. IBM defined the Business Vocabulary with a number of sub-areas to support the ongoing growth and usage patterns of the vocabulary. Active vocabularies can consist of any number and mixture of different structures. Having several active vocabularies allows multiple departmental glossaries to cover the differing uses of terms, and the different names that departments within an organization give to the same items. Companies have many uses and needs for their data, depending on how it is viewed and how it is used.
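As a rough illustration of departmental glossaries that give different names to the same item, the sketch below maps each department's local names onto shared core terms. The department names, local names, and helper functions are hypothetical and are not taken from any IBM product.

```python
from typing import Optional

# Hypothetical departmental glossaries: local name -> core vocabulary term.
DEPARTMENT_GLOSSARIES = {
    "Sales": {"Client": "Customer", "Deal": "Sales Order"},
    "Finance": {"Account Holder": "Customer", "Booking": "Sales Order"},
}


def resolve(department: str, local_name: str) -> Optional[str]:
    """Return the core term behind a department's local name, if any."""
    return DEPARTMENT_GLOSSARIES.get(department, {}).get(local_name)


def local_names(core_term: str) -> dict:
    """For one core term, list the name each department uses for it."""
    return {
        dept: name
        for dept, glossary in DEPARTMENT_GLOSSARIES.items()
        for name, target in glossary.items()
        if target == core_term
    }


print(resolve("Sales", "Client"))
print(local_names("Customer"))
```

Each department keeps its own language, while anything built on the core terms (reports, governance rules, AI applications) sees a single consistent name.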

Four pieces of the meta-model: the glossary, the functional view, the taxonomy, and the ontology

Flat glossary structures enable business metadata to be presented to business users in a simple manner. Sometimes these simple glossary structures can be used to describe a specific functional view (e.g. a view of Terms relating to “Sales”). Taxonomic structures allow for the representation of many hierarchical third-party datasets, such as Standards and Regulations, while ontological structures are a rich representation of business semantics and relationships. These ontological-type structures also provide the underpinnings of the network structure of vocabularies. Often, for simplicity, these structures are presented as a simple glossary of terms or phrases, because in many cases that is all the users need.
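The four structures can be thought of as different views over the same set of terms. The following Python sketch shows one way that might look; the sample terms, the "places" relationship, and the function names are assumptions chosen for illustration, not part of the meta-model specification.

```python
# Hypothetical terms: name -> (parent in the taxonomy or None, functional areas).
TERMS = {
    "Party": (None, set()),
    "Customer": ("Party", {"Sales", "Marketing"}),
    "Supplier": ("Party", {"Procurement"}),
    "Sales Order": (None, {"Sales"}),
}

# Ontology-style edges: (subject, relationship, object).
TRIPLES = [("Customer", "places", "Sales Order")]


def flat_glossary() -> list:
    """Glossary: every term as a simple alphabetical list."""
    return sorted(TERMS)


def functional_view(area: str) -> list:
    """Functional view: only the terms tagged with one business area."""
    return sorted(name for name, (_, areas) in TERMS.items() if area in areas)


def children(parent: str) -> list:
    """Taxonomy: the direct children of a term in the hierarchy."""
    return sorted(name for name, (p, _) in TERMS.items() if p == parent)


def related(term: str) -> list:
    """Ontology: named relationships that start at the given term."""
    return [(rel, obj) for subj, rel, obj in TRIPLES if subj == term]


print(flat_glossary())
print(functional_view("Sales"))
print(children("Party"))
print(related("Customer"))
```

A business user might only ever see the output of `flat_glossary()` or `functional_view("Sales")`, while the hierarchy and relationships remain available underneath for tools that need them.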

Advantages of this approach

There are many advantages for organizations adopting the Business Core Vocabulary. The vocabulary provides organizations with a very ordered framework to underpin any construction of cross-enterprise business vocabularies. A consistent view of how an organization’s assets are collected and managed across departments enables these organizations to grow their data assets in a coherent way rather than as a disparate collection of catalogs. It provides a single technical layer and a single layer of business knowledge that can be integrated across both the technical landscape and the business operations.

The physical data management components feed into data discovery, where the data engineer interacts with the process. These feed into the data catalog, managed by the data steward. The business vocabulary as a whole is used by the data scientist, while business users work only with specific departmental glossaries.

It enables multiple types of users to do their jobs well without having to hunt down or transform data, and it builds trust in data across the organization. A fundamental advantage of this approach is that it supplies key integration and interoperability for business units that also need to support parallel initiatives, such as the digital transformation of their enterprise.

Of course, tools like Watson Knowledge Catalog offer various capabilities to help with the ongoing evolution and maintenance of the Core Vocabulary, such as customized Governance workflows to manage the authorization of any required updates (see Governance Workflows: The Key to Building Successful Information Architectures).

A final key element of such a Business Core Vocabulary meta-model is that it is designed not only to handle current use cases but also to help future-proof for use cases coming down the line.

Conclusion

It is important that organizations can leverage their data assets to react to growing competitive, regulatory, and business challenges. IBM has invested in building out a range of pre-defined industry-specific Business Vocabularies. These “Knowledge Accelerators” help our clients leverage all their data, not just that stored in data warehouses. Even for companies not in the traditional industries IBM supports, the meta-model specifications can help those companies adapt these technologies to their own particular needs. To learn more, please go to IBM Knowledge Accelerators, IBM Watson Knowledge Catalog, and the IBM Knowledge Accelerators Metamodel Specification.

Pat O'Sullivan
Senior Technical Staff Member with IBM. A Data Architect with a background in Data Models, Business Glossaries, Data Governance and Data Management.