The Growing Importance of Data Catalog Software
Once upon a time, data was used only by data scientists sitting in ivory towers.
Data science was viewed as a cross between rocket science and alchemy: no ordinary person could understand it, but those in the know could use it to turn immaterial numbers on a computer into gold in the company’s bank account.
The world has changed.
Today, businesses are data-first, data-driven and data-led.
Even business users are expected to have a little of the data scientist inside them. And there’s lots, and lots, and LOTS of data.
Some things, however, haven’t changed an iota.
We still need the data we use to conform to two critical materials standards demanded by scientists (and probably alchemists, back in the day), namely:
- The materials are trusted
- The materials are accessible
Let’s take a closer look at why having data meet these two criteria is critical to data-driven business ROI, and the key role of data catalog software in the process.
Wanted: trusted data
In order to use a material successfully in a specialized process with specific intent, the material needs to be what you think it is. If you craft jewelry out of what you thought was stainless steel, but it turns out to be a silver-copper alloy, you will have an unpleasant — albeit harmless — surprise when your skin takes on a greenish hue.
If you plan to do a little science experiment and reach for a container labeled “sodium,” but it actually contains cesium, you will have an unpleasant, and potentially very harmful, surprise when your experiment reacts far more violently than you planned for.
Similarly, if you are basing business decisions on inaccurate, mislabeled, or otherwise dirty data, you’re liable to target the wrong market, offer the wrong product or service, or waste resources on ineffective marketing campaigns.
Needless to say, you don’t want your business decisions to blow up in your face.
Data catalog software = trusted data
A data catalog organizes all the data assets in your company’s data landscape and records, for each one, its definition, description, ratings, responsible individuals, and more. With a centralized tool for all aspects of data management, data governance becomes far easier. Silos are broken down; contradictions and questions are revealed and resolved.
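To make the idea of a catalog entry concrete, here is a minimal sketch in Python. The `CatalogEntry` fields, the `register` helper and the in-memory `catalog` dictionary are all hypothetical names chosen for illustration; a real data catalog product will have its own, much richer schema.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class CatalogEntry:
    """One data asset as a catalog might record it (hypothetical schema)."""
    name: str
    definition: str
    description: str
    owner: str  # the responsible individual
    ratings: list = field(default_factory=list)  # e.g. 1-5 stars from users
    tags: list = field(default_factory=list)

    def average_rating(self) -> Optional[float]:
        """Return the mean user rating, or None if nobody has rated the asset."""
        if not self.ratings:
            return None
        return sum(self.ratings) / len(self.ratings)


# A single shared dictionary stands in for the "single source of truth".
catalog = {}


def register(entry: CatalogEntry) -> None:
    """Add an entry, refusing to silently overwrite a conflicting definition."""
    existing = catalog.get(entry.name)
    if existing is not None and existing.definition != entry.definition:
        raise ValueError(f"conflicting definitions for {entry.name!r}")
    catalog[entry.name] = entry
```

Because every team registers assets in the same place, two departments that define "revenue" differently collide at registration time, which is one simple way contradictions get revealed instead of lingering in separate silos.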
Having a single source of truth would, in and of itself, considerably raise the accuracy bar and trust level of your data. But the best data catalog software includes additional features that make your data far more trustworthy.
The ability to automatically self-update
Details of your data ecosystem change constantly as your users create, manipulate or alter data assets. If you had to rely on your users to remember to update the catalog every time they made a change, your catalog would be perpetually out of date.
Out-of-date data loses reliability. It might be okay, just as cheese that is a week past its expiration date might be okay. But do you want to risk your stomach, or your business, on it?
Enterprise data catalog tools that utilize automation not only simplify building a data catalog, but also routinely review all metadata within your BI landscape and update your data catalog accordingly. An automated data catalog solution creates an always-up-to-date inventory of assets that your users can responsibly rely on.
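The core of such an automated update can be pictured as a comparison between what the catalog currently says and what a fresh metadata scan finds. The sketch below is only an illustration under simplifying assumptions: `sync_catalog` is a hypothetical function, both inputs are plain dictionaries mapping an asset name to its technical metadata, and human-curated details such as ratings are left out.

```python
def sync_catalog(catalog, scanned):
    """Bring `catalog` in line with freshly scanned metadata.

    Both arguments map asset name -> metadata dict. Returns a summary of
    what changed so the update can be audited.
    """
    changes = {"added": [], "updated": [], "removed": []}

    for name, metadata in scanned.items():
        if name not in catalog:
            changes["added"].append(name)
        elif catalog[name] != metadata:
            changes["updated"].append(name)
        else:
            continue  # entry is already up to date
        catalog[name] = dict(metadata)  # shallow copy of the scanned metadata

    # Anything the scan no longer finds has been dropped from the landscape.
    for name in list(catalog):
        if name not in scanned:
            del catalog[name]
            changes["removed"].append(name)

    return changes
```

Run on a schedule, a routine like this keeps the inventory current without relying on anyone remembering to update it by hand.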
Built-in collaboration
Built-in features that let you communicate and collaborate with other users right there in the catalog entries go a long way toward helping users trust the data.
The ability to ask questions of subject matter experts, data owners and data stewards gives you clarity on the nature of the data asset from authoritative sources. “Crowdsourced” information, such as ratings, reviews and usage information, helps you to actively evaluate the data’s quality and relevance for your prospective use.
Integrated data lineage
With an integrated data lineage tool, your users can look into the journey of any data asset in the catalog. Where did it come from? What transformations happened to it along the way? What other assets or reports does it affect?
Transparency leads to trust.
The ability to see the past, present and future of a data asset enables users to feel confident in their understanding of the data asset and its level of reliability, its advantages and its shortcomings.
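Under the hood, lineage questions like "where did it come from?" and "what does it affect?" amount to walking a graph of assets. Here is a minimal sketch; the asset names in `DOWNSTREAM` are invented for illustration, and `origins` and `impact` are hypothetical helpers rather than any particular tool's API.

```python
from collections import deque

# Hypothetical lineage edges: each asset maps to the assets built directly from it.
DOWNSTREAM = {
    "crm_export": ["customers_clean"],
    "customers_clean": ["customer_360"],
    "orders_raw": ["customer_360"],
    "customer_360": ["churn_report", "revenue_dashboard"],
}


def invert(edges):
    """Flip the edges so each asset maps to the assets it was built from."""
    upstream = {}
    for source, targets in edges.items():
        for target in targets:
            upstream.setdefault(target, []).append(source)
    return upstream


def reachable(start, edges):
    """Breadth-first walk returning every asset reachable from `start`."""
    seen, queue = set(), deque([start])
    while queue:
        for neighbor in edges.get(queue.popleft(), []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


def origins(asset):
    """The asset's past: everything it was derived from."""
    return reachable(asset, invert(DOWNSTREAM))


def impact(asset):
    """The asset's future: every asset or report it feeds into."""
    return reachable(asset, DOWNSTREAM)
```

In this toy graph, `origins("customer_360")` traces back to both raw sources, while `impact("customers_clean")` shows which reports would be affected if that table changed.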
Now that’s data you can trust!
Wanted: accessible data
It doesn’t matter if the best, purest, most perfect materials exist somewhere out there. If you can’t get your hands on them, they’re useless.
Many a medieval alchemist must have used this to his advantage. When asked by the king who was funding his lifestyle, “So? You’ve been working on this for 5 years! Why haven’t you produced gold already!?”, he could reply, “Your Majesty, I just heard about the key ingredient that I was missing! It’s the silk cocoon of a special moth that is only found in the snowy peaks of the dark mountains. Would Your Majesty send scouts to try and track it down?”
He’d probably be good for another 5 years.
But forget the snowy peaks of the dark mountains. Even if the perfect materials are right in your backyard, workshop or laboratory, but they’re so badly organized that it takes you several days to find what you’re looking for (or to give up), they’re close to inaccessible.
Additionally, the speed at which you can access the right materials is becoming more and more important.
If an alchemist wanted to be the first to turn lead into gold, his projected timeline could stretch decades, making years spent looking for the right materials not a big deal at all.
The space race between the USA and the USSR lasted years. If NASA needed to source the best metal for their rockets, they could afford a few weeks. No big deal.
In contrast, if your business wants to be the first to launch a certain product, or corner a specific market, you may need to make data-based, mission-critical decisions within days. If it takes you days just to locate the right data, you may already be too late. That’s a big deal.
Data catalog software = accessible data, fast!
With a data catalog, you can see what data you have and find what data you need. You can rapidly locate decision-critical data and put it to use, speeding up time to insight and increasing business agility.
The following data catalog features are important for making your data more accessible:
Powerful, intuitive search
Powerful search and filtering capabilities are essential to making your data catalog usable. The search function is how your self-service BI and business users will access your myriad catalog assets. The search filters are how they will pinpoint the assets they are interested in.
Make sure your catalog’s search functionality is intuitive. The more your data catalog’s search and filter functions resemble the search and filter experience of the average online retail site, the faster and more efficient your users will be in their data discovery.
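As a rough picture of retail-style search, the sketch below combines a free-text query with exact-match filters and sorts by rating. The `search` function, the `ENTRIES` sample data and the field names (`sensitivity`, `rating`) are assumptions made for illustration, not the behavior of any specific catalog product.

```python
# Hypothetical catalog entries, reduced to a few searchable fields.
ENTRIES = [
    {"name": "sales_2023", "description": "Monthly sales by region",
     "sensitivity": "internal", "rating": 4.5},
    {"name": "sales_leads", "description": "Open sales leads with contact details",
     "sensitivity": "restricted", "rating": 3.8},
    {"name": "web_traffic", "description": "Daily site visits",
     "sensitivity": "internal", "rating": 4.1},
]


def search(entries, query="", **filters):
    """Return entries whose name or description contains `query`
    (case-insensitive) and whose fields equal every keyword filter."""
    needle = query.lower()
    results = []
    for entry in entries:
        text = f"{entry['name']} {entry['description']}".lower()
        if needle not in text:
            continue
        if all(entry.get(key) == value for key, value in filters.items()):
            results.append(entry)
    # Highest-rated assets first, like "sort by rating" on a retail site.
    return sorted(results, key=lambda e: e.get("rating", 0), reverse=True)
```

A user could call `search(ENTRIES, "sales")` to browse everything related to sales, then narrow the results with `search(ENTRIES, "sales", sensitivity="internal")`, much like ticking a filter box while shopping online.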
Informative entries and previews
Your data catalog should have enough information in each entry (e.g. user ratings, managed annotations, tags, sensitivity of data, and responsibilities) to enable a user to get a good idea of whether or not a given data asset will meet their needs.
It should also have a way to preview data assets, so that the user doesn’t have to go through the process of downloading and reviewing the entire asset only to find that it’s not what they thought it was.
3, 2, 1, blastoff!
The alchemists — and even rocket scientists — of generations past would have their mouths hanging open at the speed at which enterprises move today.
The pace at which businesses need to plan, implement, compete and produce only seems to be getting faster.
But what you produce can only be as good as the materials you have to work with.
If you supply your data scientists, BI teams and business users with a business data catalog that gives them trusted, accessible data, their efforts are much more likely to bear fruit and lead to business success.
If not, their efforts may go up in smoke.
Although, if you’re a rocket scientist, you might consider that an achievement.