What is a modern data dictionary? How is it different from the good ol’ database dictionary?

Ayswarrya · Published in Humans of Data · Feb 26, 2020
Photo credits: Unsplash

Raise your hand if this has happened to you too. ✋

The time when the numbers in your data set didn’t make any sense

Or this one.

The time when you didn’t know what a column name stood for

Often, the humans of data (aka folks like you) spend an insane amount of time figuring out what data means and whether or not it’s credible.

80% of a data scientist’s valuable time is spent simply finding, cleaning, and organizing data, leaving only 20% to actually perform analysis. (Harvard Business Review)

Also from Harvard Business Review:

Studies show that knowledge workers waste up to 50% of time hunting for data, identifying and correcting errors, and seeking confirmatory sources for data they do not trust.

Here’s what some humans of data said on Reddit when discussing nightmarish situations at work:

What if you could create a central repository for all your data, one that lets you verify your data’s credibility (for example, with queries such as minimum, maximum and frequency)?

Even better — what if this repository already ran the standard data check queries for you?

No more depending on IT for a data set health report! Think of all the time and energy you would save!
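To make "standard data check queries" a little more concrete, here is a minimal Python sketch of the kind of per-column checks such a repository might run for you: row count, missing values, minimum, maximum and value frequencies. The `profile_column` function, the list-of-dicts input and the sample `orders` data are illustrative assumptions, not any particular product's API.

```python
from collections import Counter
from typing import Any


def profile_column(rows: list[dict[str, Any]], column: str, top_n: int = 3) -> dict[str, Any]:
    """Run basic credibility checks on one column of a tabular data set.

    Hypothetical helper for illustration: `rows` holds one dict per record,
    and a missing value is represented as None.
    """
    values = [row.get(column) for row in rows]
    present = [v for v in values if v is not None]
    return {
        "column": column,
        "row_count": len(values),
        "missing": len(values) - len(present),
        # min/max are only meaningful when at least one value is present
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        # most frequent values as (value, count) pairs
        "top_values": Counter(present).most_common(top_n),
    }


# Hypothetical sample data with a couple of suspicious values
orders = [
    {"order_id": 1, "amount": 120.0, "country": "IN"},
    {"order_id": 2, "amount": -5.0, "country": "US"},
    {"order_id": 3, "amount": None, "country": "IN"},
    {"order_id": 4, "amount": 98000.0, "country": "IN"},
]

print(profile_column(orders, "amount"))
print(profile_column(orders, "country"))
```

A negative minimum, an unusually large maximum or a missing amount are exactly the things a "health report" like this surfaces at a glance, before anyone builds an analysis on top of them.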

Well, your prayers have been answered — enter the modern data dictionary.

I know, sounds too good to be true. It’s not, though. Let’s see how a modern data dictionary can help you resolve your trust issues with data.

But hey, first things first… let’s understand the basics of a data dictionary.

What is a modern data dictionary?

The short of it: it’s not just a database dictionary. Nor is it a business glossary.

And now, for the long of it, buckle up!

Here’s a question for you: what do you do when you come across a word you don’t know? You look it up in a dictionary.

A modern data dictionary is just like that. It’s the go-to tool for the humans of data (i.e. you) to understand everything about their data sets and verify data credibility at a glance.

A good example of a data dictionary would be Atlan’s auto-generated data dictionary, which provides you with information such as variable name, description, type and frequency, among others.
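As a rough illustration of what "auto-generated" can mean, the sketch below builds one entry per column (name, description, type and frequency), computing type and frequency from the data while descriptions come from people. The `DictionaryEntry` class, `build_dictionary` function and sample columns are hypothetical; this is not how Atlan generates its dictionary, only the general idea.

```python
from collections import Counter
from dataclasses import dataclass, field
from typing import Any


@dataclass
class DictionaryEntry:
    """One row of a hypothetical auto-generated data dictionary."""
    name: str
    description: str
    data_type: str
    frequency: Counter = field(default_factory=Counter)


def infer_type(values: list[Any]) -> str:
    """Name the Python type(s) seen in a column, ignoring missing values."""
    type_names = sorted({type(v).__name__ for v in values if v is not None})
    if not type_names:
        return "unknown"
    if len(type_names) == 1:
        return type_names[0]
    return "mixed(" + ", ".join(type_names) + ")"


def build_dictionary(rows: list[dict[str, Any]], descriptions: dict[str, str]) -> list[DictionaryEntry]:
    """Generate one entry per column: descriptions from humans, the rest from the data."""
    columns = sorted({key for row in rows for key in row})
    entries = []
    for col in columns:
        values = [row.get(col) for row in rows]
        entries.append(
            DictionaryEntry(
                name=col,
                description=descriptions.get(col, "(no description yet)"),
                data_type=infer_type(values),
                frequency=Counter(v for v in values if v is not None),
            )
        )
    return entries


# Hypothetical sample data set with cryptic column names
customers = [
    {"cust_id": 101, "seg": "SMB", "signup": "2019-11-02"},
    {"cust_id": 102, "seg": "ENT", "signup": "2020-01-15"},
    {"cust_id": 103, "seg": "SMB", "signup": None},
]

for entry in build_dictionary(customers, {"seg": "Customer segment: SMB or enterprise (ENT)"}):
    print(entry.name, "|", entry.data_type, "|", entry.description, "|", entry.frequency.most_common(2))
```

The split is deliberate: type and frequency can be computed from the data itself, but what `seg` actually stands for still has to come from someone who knows the business.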

Keep in mind that the modern data dictionary goes beyond traditional database dictionaries (also known as metadata repositories), which just store metadata.
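For contrast, here is roughly what a traditional database dictionary gives you, using SQLite's built-in `PRAGMA table_info` from Python's standard `sqlite3` module. The `orders` table and its columns are hypothetical. You get column names, declared types, nullability and keys, but nothing that says what `amt` means or whether its values can be trusted.

```python
import sqlite3

# In-memory database with a hypothetical table
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, amt REAL, cust TEXT NOT NULL)")

# SQLite's catalog returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = conn.execute("PRAGMA table_info(orders)").fetchall()
for cid, name, col_type, notnull, default, pk in columns:
    print(name, col_type, "NOT NULL" if notnull else "NULL", "PK" if pk else "")

conn.close()
```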

After all, data isn’t just rows × columns. It has a lot of attributes that make up its complete profile. And a modern data dictionary plays a big part in building that profile.

Psst… wondering what else plays a role in building data profiles? The answer: data catalogs. Read this article on catalogs below.

Ummm… how is this modern data dictionary any different from a business glossary? Or is it a data glossary? 🤔

Hold your horses! We can see how all of this might get confusing.

Fret not, here’s how a business glossary (also known as a data glossary) is different from a data dictionary.

Traditionally, data dictionaries referred to database dictionaries, which covered variable names, types, descriptions, frequencies and other such information on data sets.

In that setting, a data dictionary alone wasn’t enough, as it only made sense to engineering, operations or IT, not to the business.

Enter the business glossary (or enterprise business glossary), which defines the business terms used within an organization.

For instance, a variable like date and its specifications would be an example of an entry in the data dictionary.

Whereas a term like Customer (how the organization defines a customer, and how that relates to other terms such as Sales Qualified Lead or Marketing Qualified Lead) would be an example of an entry in the business glossary, also called a data glossary (or enterprise data glossary).

What’s the key difference between a data dictionary and a business glossary, other than the former being owned by IT and the latter by the business?

Well, the same variable might occur in different data sets with different meanings. A variable like name could stand for the first name in one data set but the last name in another.

However, the terms in a business glossary have the same definition (and interpretation) across the organization. So, how Marketing defines Customer is the same as how Sales or Support define it.
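One way to picture this difference is in how entries are keyed. In the Python sketch below (all data set names, variables and definitions are made up), a data dictionary entry is keyed by data set and variable, so the same variable name can mean different things. A glossary entry is keyed by the term alone, so every team gets the same answer.

```python
# Data dictionary: entries are scoped to a data set, so the same variable
# name can carry a different meaning in each one (hypothetical examples).
data_dictionary: dict[tuple[str, str], str] = {
    ("crm_contacts", "name"): "Contact's first name",
    ("billing_accounts", "name"): "Account holder's last name",
}

# Business glossary: one organization-wide definition per term
# (hypothetical definition), shared by Marketing, Sales and Support.
business_glossary: dict[str, str] = {
    "Customer": "An account with at least one paid invoice",
}


def describe_variable(dataset: str, variable: str) -> str:
    """Look up a variable's meaning; the answer depends on the data set."""
    return data_dictionary.get((dataset, variable), "undocumented")


def define_term(term: str) -> str:
    """Look up a business term; there is exactly one definition per term."""
    return business_glossary.get(term, "undefined")


print(describe_variable("crm_contacts", "name"))
print(describe_variable("billing_accounts", "name"))
for team in ("Marketing", "Sales", "Support"):
    print(team, "->", define_term("Customer"))
```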

Hope that clears up the confusion. I know, that was quite the information overload. 🤯

Let’s take a minute and recap. So far we’ve understood:

  1. What a modern data dictionary is
  2. How it’s different from the traditional data dictionary (aka database dictionary aka metadata repository)
  3. What a business glossary (or enterprise business glossary) and a data glossary really mean

Next up: what an ideal data dictionary looks like and how you can create one for your organization. Curious to know more? Then check out the rest of this article on the Humans of Data blog here.

And that’s all for now folks! Thanks for reading, and leave any questions or suggestions you might have in the comments below.
