What is a Data Catalog and Why Should You Even Care?

Kushal Saini Kakkar
Published in Humans of Data
2 min read · Dec 11, 2019

Here’s why data catalogs could be just the thing you need to meet the challenges of data and metadata management and collaboration.

This is only an excerpt; to read the complete article, please go to the Humans of Data blog.

Two data scientists walk into a library at the end of a long day…

Data scientist #1 to the librarian: “Can I get a copy of this book on statistical methods?” Goes on to share the name of the obscure book.

Data scientist #2 to Data scientist #1: “They’ll never be able to find that book.”

The librarian clacks away on the keyboard for a couple of seconds before replying:

“Found it! Here are the details of its author, publishing house and borrowing history. Oh, and someone left a comment saying they found it super useful for understanding logistic regressions. I can grab it for you in a jiffy.” 🤓

Data scientist #1 to Data scientist #2: “Ummmm… why can’t the same thing happen with our data?” 🤔

But what if it could? Enter data catalogs: the missing link in your data lake. Now get the data you need with the context you need! 💡

First… what is a data catalog?

As seen in the chat above, a data catalog is a library or inventory of all your data assets — a place where all your data is neatly indexed, organized and kept ready for use.

(If Monica from Friends made a data catalog, this would be it: neat to a T!)
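
To make the library analogy a little more concrete, here is a minimal Python sketch of what "an inventory of data assets with context" could look like. Everything in it is hypothetical and for illustration only: the `CatalogEntry` and `DataCatalog` classes, the field names, and the sample datasets are assumptions, not the API of any real data catalog product.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    """One data asset plus the context (metadata) recorded about it."""
    name: str
    owner: str
    description: str
    tags: list[str] = field(default_factory=list)
    comments: list[str] = field(default_factory=list)


class DataCatalog:
    """A toy in-memory inventory of data assets, searchable by keyword."""

    def __init__(self) -> None:
        self._entries: dict[str, CatalogEntry] = {}

    def register(self, entry: CatalogEntry) -> None:
        # Index each asset by its name, like a library card.
        self._entries[entry.name] = entry

    def search(self, keyword: str) -> list[CatalogEntry]:
        # Case-insensitive match against name, description and tags.
        needle = keyword.lower()
        return [
            e for e in self._entries.values()
            if needle in e.name.lower()
            or needle in e.description.lower()
            or any(needle in t.lower() for t in e.tags)
        ]


# Hypothetical sample assets, invented for this sketch.
catalog = DataCatalog()
catalog.register(CatalogEntry(
    name="customer_churn_2019",
    owner="analytics-team",
    description="Monthly churn labels per customer",
    tags=["churn", "logistic regression"],
    comments=["Super useful for understanding logistic regressions."],
))
catalog.register(CatalogEntry(
    name="web_clickstream_raw",
    owner="platform-team",
    description="Unprocessed click events",
    tags=["events"],
))

for hit in catalog.search("regression"):
    print(hit.name, hit.owner, hit.comments)
```

Like the librarian's lookup, the search returns not just the asset but who owns it and what other people said about it. A real catalog would add automated metadata discovery, lineage and access controls on top of this basic idea.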

According to leading research firm Gartner:

A data catalog creates and maintains an inventory of data assets through the discovery, description and organization of distributed datasets. The data catalog provides context to enable data stewards, data/business analysts, data engineers, data scientists and other line of business (LOB) data consumers to find and understand relevant datasets for the purpose of extracting business value.

But more importantly, Gartner goes on to say:

Modern machine-learning-augmented data catalogs automate various tedious tasks involved in data cataloging, including metadata discovery, ingestion, translation, enrichment and the creation of semantic relationships between metadata. These next-generation data catalogs can therefore propel enterprise metadata management projects by allowing business users to participate in understanding, enriching and using metadata to inform and further their data and analytics initiatives.

Thus, modern data catalogs can help you manage your metadata (aka metadata management) so that you can easily curate and access important business context around your data, along with the data itself.

Sounds like a dream? Well, it’s possible! To live the dream, read the complete article on the Humans of Data blog.

Kushal Saini Kakkar
Humans of Data

Believes that everyone has a story. MA in lit by pedigree and bibliophile by birth. Loves cheese, coffee and canines.