Data Governance : Exploring the paradigm with Watson Knowledge Catalog — Chapter 1

Praveen Devarao
IBM Data Science in Practice
5 min read · Mar 24, 2021

Chapter 1: Data Catalog and governance of data

a magnifying lens sitting on top of a catalog of color squares.
Photo by Markus Spiske on Unsplash

A set of best practices for data governance is a must-have for any organization today. Giving the right people access to the right data, making trustworthy data available when it is needed, and maintaining data integrity in accordance with data standards and policies are key to unleashing the power of data.

In this series, we will learn and explore what a data catalog is, how it plays a role in having an effective data governance framework, and explore the different constructs that are used to implement effective data governance.

We will use Watson Knowledge Catalog to learn and explore these constructs.

In this chapter, we will understand what a data catalog is and how it helps data governance. We will also learn the different constructs built around a data catalog to implement data governance.

Data Catalog

A data catalog is a one-stop repository for information about all the data sources and data constructs defined within an organization. Put simply, a data catalog is a store of metadata about the organization’s data assets.

Using this metadata repository, one can easily find the assets defined within the organization. The data catalog also becomes a pivotal tool around which data governance requirements can be enforced efficiently.

A data catalog provides the tooling needed to organize an organization’s data assets and to augment their metadata with further information, which helps users search for assets easily and trust them for the relevant usage.

These characteristics make the catalog a viable collaboration tool around which multiple constructs can be defined, unleashing the power of data while enforcing data governance requirements with confidence.
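To make the idea of a metadata store concrete, here is a minimal sketch in Python. It is not the Watson Knowledge Catalog API. The `AssetEntry` and `Catalog` names, the source locations, and the tags are all hypothetical, chosen only to illustrate registering assets, augmenting their metadata, and searching on it.

```python
from dataclasses import dataclass, field


@dataclass
class AssetEntry:
    """Metadata describing one data asset; the data itself lives elsewhere."""
    name: str
    source: str                      # hypothetical location of the actual data
    description: str = ""
    tags: set[str] = field(default_factory=set)


class Catalog:
    """A toy in-memory metadata repository (illustrative, not a WKC API)."""

    def __init__(self) -> None:
        self._entries: dict[str, AssetEntry] = {}

    def register(self, entry: AssetEntry) -> None:
        self._entries[entry.name] = entry

    def tag(self, name: str, *tags: str) -> None:
        # Augment existing metadata so the asset is easier to find and trust.
        self._entries[name].tags.update(t.lower() for t in tags)

    def search(self, keyword: str) -> list[str]:
        # Match the keyword against name, description, and tags.
        kw = keyword.lower()
        return sorted(
            e.name for e in self._entries.values()
            if kw in e.name.lower() or kw in e.description.lower() or kw in e.tags
        )


catalog = Catalog()
catalog.register(AssetEntry("customers", "db2://sales/CUSTOMERS", "Customer master records"))
catalog.register(AssetEntry("orders", "s3://lake/orders.parquet", "Order transactions"))
catalog.tag("customers", "PII", "certified")

print(catalog.search("pii"))     # finds assets by an added tag
print(catalog.search("order"))   # finds assets by name or description
```

The catalog holds only descriptions of the assets, never the data itself, which is what lets it sit alongside many different data sources at once.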

The diagram below depicts Watson Knowledge Catalog’s capabilities, showcasing how the data catalog is central to the different capabilities for managing trustworthy data and unleashing its benefits.

the portions of the knowledge catalog: 1) data governance consisting of business glossary, policy management, reference data management, data lineage, classification, and workflow. 2) data quality, consisting of data discovery, business term suggestions, data profiling and analysis, and data quality issue detection. 3) data consumption, consisting of policy enforcement, data prep, self-service, and social collaboration
Watson Knowledge Catalog — All capabilities in one Experience

In the diagram, you can find the different data governance tools (explained in the next section) provided by the catalog, along with the data quality management constructs. The diagram also shows the different consumption capabilities provided by Watson Knowledge Catalog.

Data Governance

a person writing on a piece of paper with a pen
Photo by Scott Graham on Unsplash

With GDPR, CCPA, and various other data policies coming into effect, it becomes essential that data be used in accordance with the binding policies. The first step in achieving compliance is to organize the data so that it is easy to adhere to these rules.

When we talk about organizing better, a tool like a data catalog becomes the pivot around which we can define the governance constructs and also enforce them.

In this section, we will acquaint ourselves with each of the data governance constructs. In the coming chapters, we will explore these governance constructs by defining them in Watson Knowledge Catalog and seeing them at work.

Referring to the diagram above, the different governance constructs, each with a one-line description, are as follows:

  • Business Glossary: This tool provides a dictionary of business terms. Having such a dictionary allows for common definitions across the organization. The terms can also be organized hierarchically to give them better definition and make them easier to understand.
  • Classifications: These are special constructs that categorize assets that share a common special meaning or are of a special type. For example, a classification like SPI (Sensitive Personal Information) or PII (Personally Identifiable Information) on an asset indicates that it contains information that is sensitive and can be used to identify a person. Using this metadata, we can build other constructs around it to enforce data governance policies.
  • Data Classes: A construct that technically defines how an asset should be auto-classified within the system based on the data it contains.
  • Data Lineage: This tool helps the user understand how an asset within the catalog has moved through its lifecycle. It captures information such as where the asset originated, what operations have been performed on it, and where it was consumed.
  • Policy Management: A tool to define data policies in technical language so that they can be enforced on the platform. In Watson Knowledge Catalog, policies encapsulate a set of rules that determine how a data asset can be accessed on the platform. The post Data Protection Rule in Watson Knowledge Catalog, by Arnajdas, provides an overview of defining and working with Data Protection Rules in Watson Knowledge Catalog.
  • Reference Data Management: A single global repository of reference values that users can draw on to standardize reference value usage within the organization. The Reference Data Management in Watson Knowledge Catalog series provides an overview of working with the Reference Data Management capability of Watson Knowledge Catalog.
  • Workflow: This tool provides the capability to define a framework for implementing the above constructs and moving them through the necessary review and approval processes before data is made available for usage on the platform.
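As a rough illustration of how three of these constructs can build on one another, the Python sketch below chains a data class (pattern-based detection), a classification (PII), and a policy rule (redaction). It is a simplified model under stated assumptions, not how Watson Knowledge Catalog implements them. The patterns, the 80% match threshold, the `data_steward` role, and all function names are hypothetical.

```python
import re

# Hypothetical data classes: a name plus a pattern that values must match.
DATA_CLASSES = {
    "Email Address": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "US Phone Number": re.compile(r"\d{3}-\d{3}-\d{4}"),
}

# Hypothetical mapping from a data class to a governance classification.
CLASSIFICATION_OF = {"Email Address": "PII", "US Phone Number": "PII"}


def detect_data_class(values, threshold=0.8):
    """Return the first data class matched by at least `threshold` of the values."""
    values = [v for v in values if v]
    if not values:
        return None
    for name, pattern in DATA_CLASSES.items():
        hits = sum(1 for v in values if pattern.fullmatch(v))
        if hits / len(values) >= threshold:
            return name
    return None


def classify_columns(table):
    """Map each column name to a classification (or None) via its data class."""
    return {
        col: CLASSIFICATION_OF.get(detect_data_class(vals))
        for col, vals in table.items()
    }


def apply_protection_rule(table, classifications, user_roles):
    """Rule: redact columns classified as PII unless the user is a data steward."""
    if "data_steward" in user_roles:
        return table
    return {
        col: ["*****"] * len(vals) if classifications[col] == "PII" else vals
        for col, vals in table.items()
    }


table = {
    "name": ["Asha", "Ben", "Chen"],
    "email": ["asha@example.com", "ben@example.com", "chen@example.org"],
}
classifications = classify_columns(table)
masked = apply_protection_rule(table, classifications, user_roles={"analyst"})

print(classifications)   # which columns were classified as PII
print(masked["email"])   # what an analyst sees for the email column
```

The point of the layering is that the rule never names a column. It refers only to the classification, so any newly catalogued asset whose data matches a data class is covered by the same rule without further configuration.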

Conclusion

In this post, we learned what a data catalog is and how it plays a vital role in achieving data governance. We also learned about the different tools available in Watson Knowledge Catalog to implement data governance. In the coming chapters, we will delve further into defining Glossary terms, Classifications, and Data Classes.


CMTS @ Oracle Cloud, previously Software Architect @ IBM India Software Labs