Demystifying Data Governance

Data Governance: Harder, Better, Faster, Stronger — Part I

Vincent Rejany
Aug 26, 2019

Harder, Better, Faster, Stronger! More than 15 years after its release, this song from Daft Punk still sounds avant-garde. Not only has the French electronic music duo, formed in 1993, always presented itself as robots, but the tracks from its album Discovery, released in 2001, still sound current and could easily rank in the top 10 of the charts. Why an analogy with Data Governance? Because Data Governance has never been “Harder” to execute, considering the advent of regulations requiring data management excellence and data protection assurance, the explosion of data volumes, applications, and security breaches, and, moreover, the move to the cloud. Essentially, when it comes to data, organizations don’t know what they know, and if they do, they don’t know where to find it. Therefore, to face this growing challenge, organizations need to move Better, Faster, Stronger. But how?

In this first article, we will demystify the concept of Data Governance and the challenges it faces today.

Most companies would agree that today data is the very lifeblood of their business, and that digital transformation is the holy grail for avoiding disruption. We often hear that “data” is a corporate asset like money, employees, buildings, and machines. However, “Accounting”, “Human Resources”, and “Procurement” do not require any definition. They speak for themselves, and so should the poor ugly duckling, “Data Governance”. For years, it has been a strategy struggling for recognition and acceptance. Most of the time, Data Governance has been treated as a technology project rather than a business transformation imperative. At the end of the day, business users just don’t understand the value of data governance and what it really involves:

“a cultural change”.

Wikipedia gives a very formal definition of Data Governance as

“a data management concept concerning the capability that enables an organization to ensure that high data quality exists throughout the complete life cycle of the data. The key focus areas of data governance include availability, usability, consistency, data integrity, and data security and includes establishing processes to ensure effective data management throughout the enterprise such as accountability for the adverse effects of poor data quality and ensuring that the data that an enterprise has can be used by the entire organization.”

This definition makes data governance sound like an esoteric concept. This is probably a first mistake, and one reason why it has seen little adoption among business people. As a quote often attributed to Albert Einstein goes, “if you can’t explain it simply, you don’t understand it well enough”. So, let’s try to simplify and bring some clarity to the confusion.

People, Process, and Technology?

In many articles and papers around Data Governance you will read about the famous data governance recipe: People, Process, and Technology.

Data Governance Triumvirate

It is true: Data Governance does rely on people, processes, and technologies. However, are these elements not today the three pillars of almost all business activities and operations in organizations that claim to be “data driven”? It is not a stretch to see a nice syllogism, or transitive logic, here:

“If Data Governance encompasses the people, process, and technology that are required to ensure that data is fit for its intended purpose, and business activities are driven by data, then Data Governance drives business activities.”

This definition puts data governance where it should reside, within the business, and highlights that it is now as critical a component as people and money are.

Trust and Data democratization

Therefore, if we follow this line of reasoning, “Data Governance” is about managing data efficiently to drive business activities. But what does “managing data efficiently” mean? When can we say that a data management activity is carried out in a quick and organized way and delivers the expected results? When it increases revenue, reduces costs, or minimizes risk? When metrics and targets have been defined? That helps, but it depends on who defines them and how they are measured.

By challenging this last question, we can already sense the concern. Above all, Data Governance is about instilling “Trust” to create the conditions for efficiency and value generation. Trust is essential to the definition of Data Governance, and it concerns not only internal stakeholders but also the individuals whose personal data is being processed. We will come back to this notion later.

“Data Governance aims at showing that data and data management processes are trustworthy and credible.”

Other key business activities like Accounting or Human Resources inspire trust because of the regulations, policies, processes, and certifications they rely on. They involve frameworks, laws, binding corporate rules, and methodologies. Moreover, managing a budget or resources is not owned by one single team; it is shared, and each manager is accountable for their team, their expenses, or the P&L they oversee.

The same transformation should apply to data and data governance. For several decades now, business users have lived under the misapprehension that their IT department owns the organization’s data. Data Governance can no longer be measured from an IT perspective alone; it must also be measured from a business one. IT owns and is responsible for the infrastructure. The business is responsible for what the data is, and how and why it is held. The demand for and consumption of data are increasing: everybody wants to access and analyze data without requiring outside help. Business people ask for transparency and want “Data Democratization” instead of an IT dictatorship, a data scientists’ aristocracy, or even, in most cases, complete anarchy. This call for democracy requires changes in the way data governance is done.

Data Governance in practice

In practice, how is data governance executed? There are certainly multiple approaches and strategies, from the most complex to the most pragmatic, combining the definition of an organization, roles, and processes. From a methodology perspective, we could summarize it in four steps:

Data Governance in Practice (PDCA)

In fact, it resembles the classic PDCA (Plan, Do, Check, Act) cycle for the control and continuous improvement of processes, which is also often used in data quality management. From a technology perspective, data governance must rely on a variety of products that support these macro steps. Excel spreadsheets could work for small projects, but they are not sustainable at an enterprise level.

Collect data assets (metadata) into a Data Catalog

A data catalog is a repository of metadata that centralizes information about data sources, schemas, tables, and columns. It includes technical attributes (name, description, format, length …) and generated knowledge such as data profiling metrics and privacy information (descriptive measures, frequency and pattern distributions, content identification …).
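
To make the idea of "generated knowledge" more concrete, here is a minimal Python sketch of how a catalog entry for a single column could be built from sample values. The function names (pattern_of, profile_column) and the shape of the entry are assumptions made for illustration only; they do not come from any particular data catalog product.

```python
from collections import Counter
import re


def pattern_of(value: str) -> str:
    """Map every letter to 'A' and every digit to '9'; keep other characters."""
    return re.sub(r"[0-9]", "9", re.sub(r"[A-Za-z]", "A", value))


def profile_column(name: str, values: list) -> dict:
    """Build a hypothetical catalog entry: technical attributes plus profiling metrics."""
    non_null = [str(v) for v in values if v is not None]
    return {
        "name": name,
        "max_length": max((len(v) for v in non_null), default=0),
        "null_count": len(values) - len(non_null),
        "distinct_count": len(set(non_null)),
        # Frequency distribution: the most common values
        "top_values": Counter(non_null).most_common(3),
        # Pattern distribution: the most common value shapes
        "patterns": Counter(pattern_of(v) for v in non_null).most_common(3),
    }


entry = profile_column("postal_code", ["75001", "75002", None, "7500A", "75001"])
print(entry["patterns"])  # [('99999', 3), ('9999A', 1)]
```

Here the pattern distribution immediately surfaces the one value that does not look like the others, which is the kind of knowledge a catalog can expose next to the technical attributes.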

Describe business assets in Business Glossaries

Business Glossaries help organizations reach agreement among all stakeholders on their business assets (for example, terms) and on how they relate to data assets (for example, database tables) and technology assets (for example, ETL mappings), collectively known as technical assets. A business glossary can be used as a single entry point for all data consumers to better understand and govern their data assets through the definition and maintenance of business terms. Business terms can be organized through hierarchies and relationships and can be linked to different roles, such as data owner, business owner, or data steward. Different types of terms can be defined according to the information that needs to be documented. The glossary therefore contains the language of the business, independent of technology, and is used to:

· Define authoritative meaning

· Increase and share understanding throughout the enterprise

· Establish responsibility, accountability, and traceability

· Represent business hierarchies

· Document business descriptions, examples, requirements, and valid values

· Find relevant information assets
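
As a rough illustration of these points, a business term can be modeled as a small record that carries its definition, its accountable roles, its place in a hierarchy, and its links to data assets. The sketch below is in Python; the class BusinessTerm, the helper find_terms, and every term, person, and table name are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class BusinessTerm:
    name: str
    definition: str                 # authoritative meaning
    owner: str                      # accountable business owner
    steward: str                    # data steward maintaining the term
    parent: Optional["BusinessTerm"] = None          # business hierarchy
    data_assets: list = field(default_factory=list)  # linked tables or columns

    def path(self) -> str:
        """Return the term's position in the business hierarchy."""
        if self.parent is None:
            return self.name
        return f"{self.parent.path()} > {self.name}"


def find_terms(terms, asset):
    """Return the names of the business terms linked to a given data asset."""
    return [t.name for t in terms if asset in t.data_assets]


party = BusinessTerm("Party", "Any person or organization we deal with.",
                     owner="Sales", steward="alice")
customer = BusinessTerm("Customer", "A party that has bought at least one product.",
                        owner="Sales", steward="bob", parent=party,
                        data_assets=["crm.customers", "dwh.dim_customer"])

print(customer.path())                                 # Party > Customer
print(find_terms([party, customer], "crm.customers"))  # ['Customer']
```

The link between a term and its data assets works in both directions: a business user can start from "Customer" and find the tables, while a developer can start from a table and find who is accountable for its meaning.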

Centralize Reference Data

Every organization has a common set of data used by many different business processes to provide a standard “library” of terms within various applications. This reference data usually comes from outside the organization (though this is not always the case) and changes infrequently. A few good examples of reference data are a list of all countries and their ISO country codes, the “official” list of sellable products, organizational hierarchies, store locations by city and state, and approved abbreviations for medical terms. Reference data describes the acceptable values for business terms.
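
Since reference data describes acceptable values, the simplest way to use it is as a lookup against which incoming values are checked. The Python sketch below assumes a tiny extract of ISO 3166-1 alpha-2 country codes held in a dictionary; the function name check_against_reference and the normalization applied (trimming and upper-casing) are illustrative assumptions.

```python
# A small extract of ISO 3166-1 alpha-2 codes, used here as central reference data.
COUNTRY_CODES = {
    "FR": "France",
    "DE": "Germany",
    "US": "United States of America",
}


def check_against_reference(values, reference):
    """Split values into those found in the reference data and those that are not."""
    valid, invalid = [], []
    for value in values:
        # Assumed normalization: ignore surrounding spaces and letter case.
        code = value.strip().upper() if isinstance(value, str) else value
        (valid if code in reference else invalid).append(value)
    return valid, invalid


valid, invalid = check_against_reference(["FR", "de ", "UK", None], COUNTRY_CODES)
print(valid)    # ['FR', 'de ']
print(invalid)  # ['UK', None]
```

When every application checks against the same central list, a value such as "UK" is rejected everywhere in the same way instead of being accepted by one system and refused by another.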

Search Asset Catalog and Browse Lineage

An Asset Catalog embeds the data catalog, business glossaries, and reference data, and adds all the other metadata, such as BI reports, ETL processes, analytical models, and data preparation jobs, as well as metadata from third-party vendor platforms. Lineage supports the management and analysis of object and metadata relationships, including dependencies and life cycle. This management and analysis process reveals where data comes from, how it is transformed, where it is going, and all the steps in between.
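
Under the hood, lineage can be thought of as a directed graph of assets, where answering "where does this come from?" or "what is impacted if this changes?" is a graph traversal. The following Python sketch shows that idea under simple assumptions; the Lineage class and all asset names are hypothetical and far simpler than what a real product stores.

```python
from collections import defaultdict, deque


class Lineage:
    """A toy lineage graph: edges go from a source asset to a target asset."""

    def __init__(self):
        self.downstream = defaultdict(set)
        self.upstream = defaultdict(set)

    def add_flow(self, source, target):
        self.downstream[source].add(target)
        self.upstream[target].add(source)

    def _walk(self, start, edges):
        """Breadth-first traversal collecting every asset reachable from start."""
        seen, queue = set(), deque([start])
        while queue:
            node = queue.popleft()
            for nxt in edges.get(node, ()):
                if nxt not in seen:
                    seen.add(nxt)
                    queue.append(nxt)
        return seen

    def origins(self, asset):
        """Where does the data come from?"""
        return self._walk(asset, self.upstream)

    def impact(self, asset):
        """Where is the data going?"""
        return self._walk(asset, self.downstream)


lineage = Lineage()
lineage.add_flow("crm.customers", "etl.load_customers")
lineage.add_flow("etl.load_customers", "dwh.dim_customer")
lineage.add_flow("erp.orders", "etl.load_orders")
lineage.add_flow("etl.load_orders", "dwh.fact_orders")
lineage.add_flow("dwh.dim_customer", "bi.sales_report")
lineage.add_flow("dwh.fact_orders", "bi.sales_report")

print(sorted(lineage.impact("crm.customers")))
# ['bi.sales_report', 'dwh.dim_customer', 'etl.load_customers']
print(sorted(lineage.origins("dwh.fact_orders")))
# ['erp.orders', 'etl.load_orders']
```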

Enforce Data Quality & Business Rules and Remediate issues

Data quality and business rules need to be designed and applied to data assets. Issues identified by these rules are sent to a remediation process that provides the means to identify, review, and correct the problem data before it reaches downstream systems. Data monitoring results are presented in data governance dashboards that show, system by system and dimension by dimension, how much data quality and governance are improving. They can also be added to the data catalog layer as additional knowledge about the data.
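
The flow described above (apply rules, route failures to remediation, report pass rates) can be sketched in a few lines of Python. The two rules, the record layout, and the function names apply_rules and pass_rates are assumptions chosen for illustration; the email check in particular is deliberately simplistic.

```python
import re

# Hypothetical rules: each one takes a record and returns True when it passes.
RULES = {
    "email_format": lambda r: bool(
        re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", r.get("email") or "")
    ),
    "country_present": lambda r: bool(r.get("country")),
}


def apply_rules(records, rules):
    """Separate clean records from those that must go to remediation."""
    clean, remediation = [], []
    for record in records:
        failed = [name for name, rule in rules.items() if not rule(record)]
        if failed:
            remediation.append({"record": record, "failed_rules": failed})
        else:
            clean.append(record)
    return clean, remediation


def pass_rates(records, rules):
    """Share of records passing each rule, as a dashboard might display it."""
    if not records:
        return {}
    return {
        name: sum(rule(r) for r in records) / len(records)
        for name, rule in rules.items()
    }


records = [
    {"id": 1, "email": "ann@example.com", "country": "FR"},
    {"id": 2, "email": "bob_at_example.com", "country": "DE"},
    {"id": 3, "email": None, "country": ""},
    {"id": 4, "email": "eve@example.org", "country": "US"},
]

clean, remediation = apply_rules(records, RULES)
print([r["id"] for r in clean])            # [1, 4]
print(remediation[1]["failed_rules"])      # ['email_format', 'country_present']
print(pass_rates(records, RULES))          # {'email_format': 0.5, 'country_present': 0.75}
```

Only the clean records would move on to downstream systems, while the remediation list keeps track of which rule each problem record failed so that a data steward can review and correct it.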

In my next article, we will look at how Data Governance can be executed more efficiently with a little help from artificial intelligence, so that it can inspire trust and become a real concern for everyone, not only IT people.
