Demystifying Data Governance: Understanding the Differences Between Business Glossary, Data Dictionary, Taxonomy, and Ontology

Ravi Dawar
9 min read · Mar 21, 2023

Data is useful. High-quality, well-understood, auditable data is priceless — Ted Friedman, Gartner.

Photo by Thought Catalog on Unsplash

The importance of Data Governance

Data governance and management refer to the process of managing data assets within an organization to ensure their accuracy, consistency, security, and compliance with legal and regulatory requirements. It involves the development of policies, procedures, and standards for managing data, as well as allocating roles and responsibilities for overseeing data management practices.

Effective data governance and management are crucial for several reasons. Firstly, accurate and reliable data is essential for informed decision-making. Organizations must thoroughly understand their data assets to make strategic decisions, optimize operations, and gain a competitive advantage.

Secondly, data governance and management help organizations comply with legal and regulatory requirements. With increasing regulations (such as GDPR and CCPA) and laws governing data usage, organizations need to ensure that their data management practices comply with these regulations.

Thirdly, effective data governance and management promote data security and minimize the risk of data breaches. Cyber threats are on the rise, and breaches are expensive: a breach costs the affected business $4.1 million on average globally and $9.44 million on average in the US, the highest of any country in the world. Organizations must therefore secure their data assets to prevent unauthorized access, data theft, and other security breaches.

Where to begin?

Now that we understand the significance of data governance and management, we can discuss where to begin. It is crucial to note that “good” data governance requires investing time and effort in anticipating future requirements, continually re-evaluating goals, and having the necessary people, processes, and tools to address issues as they arise.

However, it is possible to begin by ensuring that data definitions are adequately documented across the organization. This initial step provides a solid foundation for efficient data management and enables organizations to develop a consistent understanding of their data assets, thus leading to more informed decision-making.

As someone who engages with data teams across various industries daily, I find it striking how often terms like business glossary, data dictionary, taxonomy, and ontology are used interchangeably. What’s even more perplexing is that many organizations don’t give the work of defining them the attention it deserves. This lack of clarity leads to confusion and miscommunication. This article therefore aims to demystify the differences between these terms for those who use them.

By understanding the distinctions between these concepts, data teams can effectively manage data assets, develop data-driven insights, and support the decision-making process. So let’s dive in and explore the nuances between these critical data governance and management terms.

Data Dictionary

A Data Dictionary is an important document that contains structured data elements and their technical metadata extracted from a given Data Model. It can be seen as a narrative version of a data model, providing semantic names, definitions, and other structural characteristics such as the relationships between different elements (primary/foreign keys), allowable value constraints, and the data type and length of each field. It is a comprehensive guide to understanding the meaning, structure, and relationships between data elements in a database or dataset.

Data analysts, database administrators, and developers rely on a data dictionary to ensure data is used consistently and accurately across an organization.

While a business glossary provides a “business-relevant” or “human-readable” articulation of data, a data dictionary offers a “technology-relevant” or “machine-readable” articulation of the same data.

For example, the data dictionary for a customer dataset would include the data type and length of each field (e.g., customer name, address, phone number), a description of each field, and any constraints or rules (e.g., the customer ID field is a primary key and must be unique).
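To make the customer example concrete, here is a minimal sketch of a data dictionary expressed in Python. The `FieldSpec` class, the field names, the types, and the lengths are all hypothetical choices made for illustration; a real data dictionary would normally be extracted from the data model or managed in a catalog tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldSpec:
    """One data dictionary entry: technical metadata for a single field."""
    name: str
    data_type: str
    max_length: Optional[int]
    description: str
    primary_key: bool = False
    nullable: bool = True


# Hypothetical data dictionary for a customer dataset (all values are assumptions).
CUSTOMER_DICTIONARY = [
    FieldSpec("customer_id", "INTEGER", None, "Unique customer identifier",
              primary_key=True, nullable=False),
    FieldSpec("customer_name", "VARCHAR", 100, "Full name of the customer",
              nullable=False),
    FieldSpec("address", "VARCHAR", 255, "Mailing address"),
    FieldSpec("phone_number", "VARCHAR", 20, "Contact phone number"),
]


def check_rows(rows, dictionary):
    """Return readable descriptions of rows that violate the dictionary's rules."""
    problems = []
    seen_keys = set()
    for i, row in enumerate(rows):
        for spec in dictionary:
            value = row.get(spec.name)
            if value is None:
                if not spec.nullable:
                    problems.append(f"row {i}: {spec.name} is required")
                continue
            if spec.max_length is not None and len(str(value)) > spec.max_length:
                problems.append(
                    f"row {i}: {spec.name} exceeds {spec.max_length} characters")
            if spec.primary_key:
                # Primary keys must be unique across the whole dataset.
                if (spec.name, value) in seen_keys:
                    problems.append(f"row {i}: duplicate {spec.name} {value}")
                seen_keys.add((spec.name, value))
    return problems
```

Calling `check_rows(rows, CUSTOMER_DICTIONARY)` on a list of dictionaries returns one message per violated rule, which shows how the same metadata that documents a dataset can also be used to check it.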

Data Dictionary — Powered by Atlan

👉 In essence, a data dictionary is an essential tool for anyone who needs to understand, manage, or work with data in any capacity.

Business Glossary

A business glossary is a valuable tool that defines terms and concepts specific to an industry or business domain. Its purpose is to provide a common language that enables effective communication among employees across different departments. For example, in the healthcare industry, a business glossary may define terms such as “Electronic Health Record” or “Health Information Exchange.”

One of the major issues in organizations is the use of the same name for different meanings, causing misunderstandings. Having an agreed-upon definition for a term helps reduce these problems. As a data practitioner myself, I have participated in numerous board meetings where I observed various teams reporting different figures for a fundamental metric such as “Gross Revenue.” This disparity arose because we lacked a comprehensive business glossary that could establish a clear definition of gross revenue and standardize its measurement.

Unlike a data dictionary, a business glossary is intentionally structure-agnostic and includes only the terms the business deems important. Because it imposes no particular organization on its content, it is quicker to produce and more widely accessible than most other Data Architecture deliverables.

For instance, a business glossary for a retail organization may include terms such as “sales funnel” or “customer acquisition cost.” The former describes the journey that a customer takes from initial awareness of a product or service to eventual purchase, while the latter totals all the costs of winning a customer to purchase a product or service.
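A glossary entry can be sketched as a small record that pairs a term with its agreed definition and, for metrics, its measurement rule. The example below is only an illustration: the `GlossaryTerm` class, the `owner` attribute, the team names, and the exact formula for customer acquisition cost (total acquisition spend divided by new customers) are assumptions, not definitions taken from any particular organization.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class GlossaryTerm:
    name: str
    definition: str
    owner: str          # hypothetical attribute: the team accountable for the term
    formula: str = ""   # plain-language measurement rule, if the term is a metric


# Hypothetical glossary for a retail organization, keyed by lowercase term name.
GLOSSARY = {
    term.name.lower(): term
    for term in [
        GlossaryTerm(
            name="Customer Acquisition Cost",
            definition="All the costs of winning a customer to purchase a product or service.",
            owner="Finance",
            formula="total acquisition spend / number of new customers",
        ),
        GlossaryTerm(
            name="Sales Funnel",
            definition="The journey a customer takes from initial awareness to eventual purchase.",
            owner="Marketing",
        ),
    ]
}


def lookup(term_name):
    """Case-insensitive lookup so 'sales funnel' and 'Sales Funnel' resolve alike."""
    return GLOSSARY.get(term_name.strip().lower())


def customer_acquisition_cost(total_spend, new_customers):
    """Apply the agreed formula so every team computes the metric the same way."""
    if new_customers <= 0:
        raise ValueError("new_customers must be positive")
    return total_spend / new_customers
```

Keeping the formula next to the definition is the point of the design: when two teams report different numbers for the same metric, there is a single agreed rule to check both against.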

👉 In essence, a business glossary functions as the “FAQ” of a company, ensuring consistency in the definition and measurement of strategic metrics across the organization.

Business Glossary Experience — Powered by Atlan

An effective business glossary should not be confined to the data catalog; it should also integrate seamlessly with external applications such as OLTP systems (e.g., Salesforce), OLAP systems (e.g., Snowflake, Databricks), business intelligence tools (e.g., Looker, PowerBI, Tableau), and collaboration applications (e.g., Slack, Teams). This integration is critical in reducing communication overhead in your organization: it explicitly defines commonly used business terms across departments, clarifies unknown data attributes (such as owners, certification, and classifications), and provides transparency into measurement formulas.

Embedded Context on a Tableau Dashboard — Powered by Atlan

Taxonomy

A taxonomy is a hierarchical framework for organizing and classifying data or information. It groups related items or concepts together based on their characteristics and relationships, allowing for easier retrieval and analysis of data.

Biologists use taxonomy to organize and classify living organisms based on their characteristics and evolutionary relationships. In the study of human evolution, taxonomy is used to group similar species together by genus.

For example, the genus Homo includes several species that are closely related to modern humans, such as Homo erectus, Homo neanderthalensis, and Homo sapiens. Within the genus Homo, each species has its own unique set of characteristics and evolutionary history.

Sapiens: A Graphic History: The Birth of Humankind (Vol. 1)

The taxonomy of human evolution also includes other genera, such as Australopithecus and Paranthropus, which are closely related to our own evolutionary lineage. These genera include several species of hominins that lived millions of years ago and represent important stages in the evolution of bipedalism, tool use, and brain development.

By using a taxonomy to organize and classify these different species and genera, scientists can better understand their evolutionary relationships and the characteristics that define each group. This information can help us to piece together the complex history of human evolution and shed light on the origins of our species.

A Data Taxonomy is a defined classification of terms, organized hierarchically into as many levels of category and sub-category as required to serve a given purpose.

For example, a taxonomy for an online retailer might include categories such as “apparel,” “electronics,” and “home goods.” Within each category, there may be subcategories, such as “shoes,” “jewelry,” “computers,” “televisions,” “furniture,” and so on.

A taxonomy for a healthcare company might include categories such as patient data, clinical data, financial data, and administrative data. Within the patient data category, subcategories could include demographic data, medical history, and treatment plans.
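Because a taxonomy is a hierarchy, it maps naturally onto a nested structure. The sketch below models the online retailer example as nested dictionaries; which subcategory sits under which category is an assumption made for illustration, and `find_path` and `descendants` are hypothetical helper names.

```python
# Hypothetical retail taxonomy: each key is a category and each value holds
# its subcategories. The category assignments are assumptions.
RETAIL_TAXONOMY = {
    "apparel": {"shoes": {}, "jewelry": {}},
    "electronics": {"computers": {}, "televisions": {}},
    "home goods": {"furniture": {}},
}


def find_path(taxonomy, target, trail=()):
    """Depth-first search returning the category path to `target`, or None."""
    for name, children in taxonomy.items():
        path = trail + (name,)
        if name == target:
            return path
        found = find_path(children, target, path)
        if found is not None:
            return found
    return None


def _flatten(node):
    """List every category name in `node`, parents before their children."""
    names = []
    for name, children in node.items():
        names.append(name)
        names.extend(_flatten(children))
    return names


def descendants(taxonomy, category):
    """Return all categories below `category`, at any depth."""
    path = find_path(taxonomy, category)
    if path is None:
        return []
    node = taxonomy
    for name in path:
        node = node[name]
    return _flatten(node)
```

With this structure, `find_path(RETAIL_TAXONOMY, "televisions")` walks from the top-level category down to the item, and `descendants(RETAIL_TAXONOMY, "apparel")` lists everything grouped under a category. Retrieval by group is exactly what a flat list of names cannot offer.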

👉 In essence, classifications of things are necessary for us to identify and retrieve specific items and to communicate our needs to others effectively. Without classification, we would have a disjointed list of unique and unrelated names rather than cohesive groups of similar items with shared characteristics.

For instance, imagine if we didn’t use a classification system for living organisms and instead assigned a distinct name to each individual human being, animal, or plant. In this scenario, we wouldn’t have a term like “Homo sapiens” to describe a particular species of humans, but instead, each person would be referred to by a unique and separate name, such as “John Doe,” or “Steve Smith” or “Superman.” As a result, we would have many repeated names, and the process of identifying and categorizing individuals would become unnecessarily complicated and challenging.

Ontology

Before delving into the concept of ontology, I would like to acknowledge that this is a complex topic and my understanding of it is constantly evolving. With that caveat in mind, let’s explore what ontology entails.

In simple terms, ontology is a way of organizing information by creating a framework of concepts, relationships, and rules that define a particular subject. This framework helps different systems to understand and communicate with each other more effectively. It is like a common language that helps different computers or programs to talk to each other. Ontology is also used to define the subjects, predicates, and objects that can exist in a knowledge graph.

Let’s say we want to create an ontology for a music streaming service. We might define the following concepts, attributes, relationships, and rules:

Concept: Song

  • Attributes: Title, Artist, Album, Genre, Length
  • Relationships: Belongs to an Album, Belongs to an Artist, Has a Genre
  • Rules: A Song must have a unique Title within an Album, A Song must belong to an Artist, A Song can only have one Genre.

Concept: Playlist

  • Attributes: Name, Description, Creator
  • Relationships: Contains Songs, Created by a User
  • Rules: A Playlist must have a unique Name, A Playlist must contain at least one Song, A Playlist must be created by a User.

Concept: User

  • Attributes: Name, Email, Subscription Status
  • Relationships: Creates Playlists, Follows Other Users
  • Rules: A User must have a unique Email, A User must have a Subscription Status, A User can create multiple Playlists, A User can follow other Users.

Ontology Example for the Music Streaming Case Above (image by author)
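The concepts, relationships, and rules listed above can be sketched as subject, predicate, object triples checked against a small schema. This is a simplified illustration, not a full ontology language such as OWL. The predicate names and the entity identifiers are hypothetical, and Album, Artist, and Genre are treated here as concepts in their own right, which is an assumption. Only two of the rules are implemented.

```python
from collections import Counter

# Allowed relationships: predicate -> (subject concept, object concept).
# Predicate names are hypothetical.
SCHEMA = {
    "belongs_to_album": ("Song", "Album"),
    "belongs_to_artist": ("Song", "Artist"),
    "has_genre": ("Song", "Genre"),
    "contains": ("Playlist", "Song"),
    "created_by": ("Playlist", "User"),
    "follows": ("User", "User"),
}


def validate(entities, triples):
    """Check triples against the schema and two of the rules.

    entities maps an entity id to its concept; triples is a list of
    (subject, predicate, object) tuples. Returns a list of error messages.
    """
    errors = []
    for subject, predicate, obj in triples:
        if predicate not in SCHEMA:
            errors.append(f"unknown predicate: {predicate}")
            continue
        subject_concept, object_concept = SCHEMA[predicate]
        if entities.get(subject) != subject_concept:
            errors.append(f"{subject} cannot be the subject of {predicate}")
        if entities.get(obj) != object_concept:
            errors.append(f"{obj} cannot be the object of {predicate}")

    # Rule: a Song can only have one Genre.
    genre_counts = Counter(s for s, p, _ in triples if p == "has_genre")
    for song, count in genre_counts.items():
        if count > 1:
            errors.append(f"{song} has {count} genres; a Song can only have one Genre")

    # Rule: a Playlist must contain at least one Song.
    playlists_with_songs = {s for s, p, _ in triples if p == "contains"}
    for entity, concept in entities.items():
        if concept == "Playlist" and entity not in playlists_with_songs:
            errors.append(f"{entity} must contain at least one Song")
    return errors
```

An empty list from `validate` means the data respects the schema and the two rules. Unlike a taxonomy, the schema constrains which kinds of things may be related and how, which is what lets different systems interpret the same triples consistently.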

👉 By defining these concepts, attributes, relationships, and rules, we have created an ontology that can be used by different systems to understand and communicate with each other. For example, a music streaming service might use this ontology to allow users to create and share playlists, recommend similar songs or artists, or provide personalized music suggestions based on a user’s listening habits.

Conclusion

In conclusion, effective data governance and management are crucial for organizations to ensure accurate, reliable, and secure data, comply with legal and regulatory requirements, and minimize the risk of data breaches. Investing time and effort in anticipating future requirements, continually re-evaluating goals, and having the necessary people, processes, and tools to address issues as they arise is essential for “good” data governance.

Understanding the differences between critical data governance and management terms such as a data dictionary and a business glossary can help data teams manage data assets effectively, develop data-driven insights, and support the decision-making process.

To incorporate these concepts into their data catalogs, companies should first establish their business glossary, data dictionary, taxonomy, and ontology. This process entails partnering with various departments and stakeholders to ensure the accuracy and relevance of the definitions and classifications. Companies should then put in place a system to manage and update these definitions and classifications, such as a data catalog or metadata repository. Finally, employees should be trained to use these concepts and receive continuous support and guidance to ensure uniform and effective implementation across the organization.

Ravi Dawar

Sales Engineer - Atlan | Curious about Life, Technology, Finance & Data!