The Ultimate Guide to Evaluating a Data Catalog

BONUS: A Request for Information (RFI) template to get the best out of Data Catalog demos

Louise de Leyritz
CastorDoc
Jul 28, 2021

The Data Catalog Check List

Data catalogs were introduced to help data people find and understand data. Before data catalogs existed, data engineers, data analysts, and data scientists worked blind, deprived of visibility into data sets, their content, their quality, or their usefulness. Consequently, they spent most of their time trying to locate and understand data, often recreating data sets that already existed. This is the kind of issue that data catalogs seek to address.

Data catalogs began with the modest aim of managing data inventory and improving data discovery. Soon enough, they grew in functionality, popularity, and importance. Modern data catalogs have considerably expanded their reach, and are now central to data stewardship and data governance. Data team leaders view data catalogs as strategically important and key drivers of analytic quality and data teams’ productivity.

The selection of data cataloging tools has grown rapidly in recent years, and there are now many to choose from. Which one is right for you? That’s what we help you work out today.

What is a Data Catalog?

Gartner, a research and advisory firm, defines a data catalog as follows:

“A data catalog creates and maintains an inventory of data assets through the discovery, description, and organization of distributed datasets. The data catalog provides context to enable data stewards, data/business analysts, data engineers, data scientists and other data consumers to find and understand relevant datasets for the purpose of extracting business value”

Gartner, Augmented Data Catalogs 2019

1. Which features do you need?

The first step in choosing a data catalog is to understand exactly why you need one. As we mentioned already, data catalog vendors have multiplied in the past few years, and they cater to different needs. Are you looking for a data governance tool? A pure data discovery tool? You need to define exactly what you’re looking for before going on a data catalog quest.

To this end, start by identifying the top challenges that affect your productivity, then map them to data catalog features. To make this easier, we’ve done the mapping: tell us what bothers you, and we’ll tell you which data catalog features you’re interested in.

In this exercise, it’s important that you get your team to speak. If you’re leading a data team, make sure you understand what bothers team members, as they might have different pain points affecting their productivity. You want to pick a catalog that alleviates their frustrations and allows them to fulfill their mission.

What are the pain points solved with which data catalog features?

Now that you have a clearer idea of the features you’re interested in, rank them in order of preference.
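One way to put such a ranking to work is to turn it into weights and check how much of what you care about each catalog covers. The snippet below is only a minimal sketch under our own assumptions: the function names, the linear weighting (top feature gets the largest weight), and the example features are hypothetical and not part of any vendor's tooling.

```python
def rank_weights(ranked_features):
    """Map a ranked list to weights n, n-1, ..., 1 (top-ranked feature weighs most).

    The linear scheme is an assumption made for illustration.
    """
    n = len(ranked_features)
    return {feature: n - i for i, feature in enumerate(ranked_features)}


def coverage_score(ranked_features, catalog_features):
    """Return the share of total weight (0.0 to 1.0) covered by a catalog's features."""
    weights = rank_weights(ranked_features)
    if not weights:
        return 0.0
    covered = sum(w for f, w in weights.items() if f in catalog_features)
    return covered / sum(weights.values())


# Hypothetical ranking and catalog feature set, for illustration only.
ranked = ["search", "lineage", "access control"]
print(coverage_score(ranked, {"search", "access control"}))
```

A score close to 1.0 means the catalog covers most of the features you ranked highly. Treat it as a conversation starter for the team, not as a verdict.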

2. Is your team going to use it?

You’ve now established which features you need in a data catalog, and you’re ready to scan the market to find your ideal catalog. Wait a second, we’re not done yet. There are other considerations you should take into account. Namely, think about what would make your team use the data catalog. The whole value of a data catalog resides in its usage. When people use the data catalog, documentation levels increase, the quality of data assets improves, and more people use the data catalog. Conversely, low usage can easily turn into a vicious circle where no one uses the catalog. In this case, not only do you have poor quality data assets, but you’ve also wasted your money on a data catalog. So when you contract with a data catalog vendor, you want to make sure your team actually likes the tool and plans to use it. We thus propose to look at the following four variables when evaluating a data catalog.

What are the features that optimize your data catalog’s adoption?

3. Understand the data catalog ecosystem

Once you have clearly defined what you’re looking for in a data catalog, it’s time to find your perfect match. This is no easy task, as there is a plethora of options to choose from. We’ve attempted to untangle the data catalog ecosystem to help you find the perfect fit. We found that data catalogs can be divided into three generations:

  • 1st generation: basic software, similar to Excel, that syncs with your data warehouse.
  • 2nd generation: software designed to help data stewards maintain data documentation (metadata), lineage, and treatments.
  • 3rd generation: software designed to deliver business value to end users automatically, within hours of deployment. It then guides users to document data in a collaborative, painless way.

Here is a brief listing of the pros and cons of each option.

Each generation has its own characteristics. Go for the one that fits your data stack.

Data catalog landscape

Below, you will find a data catalog landscape, which can hopefully help you choose a metadata management tool adapted to your needs.

*This is a brief attempt at classifying the tools on the market. If anything seems wrong, or if you don’t see your data catalog and want it added, feel free to reach out.

Data Catalog Landscape

If you want to know more about vendors, their offerings, and the data catalog ecosystem, you will find our data catalog benchmark here.

4. Take demos from selected vendors

You have now selected a few catalogs that seem to match your pre-defined criteria and answer your business needs. It’s time for the next step: take a demo.

If you sit as a passive viewer during the demo, you’re unlikely to get much value out of it. You should participate actively and leave with a clear idea of how the data catalog software will help address your specific needs. That’s why we have pulled together a detailed list of what to check before and during a data catalog demo.

We encourage you to plan the key topics you want to cover and to share the features that matter most to you with the vendors in advance. This will ensure a much more tailored experience.

We thus propose setting an agenda beforehand that covers the following topics:

Cost of ownership

Price is obviously a concern when choosing data catalog software. However, the true cost is often higher than the price quoted by the vendor. The total cost of ownership covers how much the software costs to purchase, implement, and maintain.

Purchasing: Ensure you have understood what’s included in every pricing tier. Enquire about potential additional charges, such as fees for extra users.

Implementation: Enquire about implementation costs, as they can make a significant difference. For example, choosing an open-source data cataloging solution will save you the purchasing costs, but will lead to significant implementation costs.

Maintenance: Make sure you clearly understand what the vendor charges after purchase, such as for updates. Even without updates, the software might be expensive to maintain. For example, legacy data catalogs (1st generation) often require a full-time engineering team to maintain the tool. Ensure that you factor these additional costs into the total cost of ownership.
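To make the three components concrete, here is a minimal sketch of a total cost of ownership calculation. It assumes a simple model of our own: yearly purchasing and maintenance costs, plus a one-off implementation cost. The class, the field names, and every figure are hypothetical placeholders, not real vendor prices.

```python
from dataclasses import dataclass


@dataclass
class CatalogCosts:
    """Hypothetical cost inputs for one option; all figures are placeholders."""
    license_per_year: float        # purchasing: base price of the pricing tier
    extra_user_fee: float          # purchasing: yearly fee per additional user
    extra_users: int               # purchasing: number of additional users
    implementation_one_off: float  # implementation: setup and integration work
    maintenance_per_year: float    # maintenance: updates, engineering time


def total_cost_of_ownership(costs: CatalogCosts, years: int) -> float:
    """Sum purchasing, implementation, and maintenance over the given horizon."""
    purchasing = (costs.license_per_year
                  + costs.extra_user_fee * costs.extra_users) * years
    maintenance = costs.maintenance_per_year * years
    return purchasing + costs.implementation_one_off + maintenance


# Made-up numbers: a paid vendor versus an open-source tool with no license fee.
paid_vendor = CatalogCosts(20_000, 100, 10, 5_000, 2_000)
open_source = CatalogCosts(0, 0, 0, 60_000, 40_000)

for name, costs in [("paid vendor", paid_vendor), ("open source", open_source)]:
    print(name, total_cost_of_ownership(costs, years=3))
```

With these placeholder inputs the option with no license fee ends up costing more over three years, which is the kind of effect worth checking with your own numbers before deciding.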

Vendor support

What relationship will you have with the vendor after completing the purchase? Will you be on your own? If so, does that work for you? This is not a negligible question. A lot of Tesla owners love their cars but have encountered such frustration due to a bad customer service experience that they bitterly regret their purchase choice.

For this reason, ensure you have understood the following:

  1. Training conditions: How is your team going to learn how to use the catalog? Is training included for all users? If not, does it entail additional costs? Make sure you have clarified how onboarding will work.
  2. Support: Ensure that you’ve understood the different levels of customer service (phone, email) and their costs. Be sure to leave with a sense of the service logistics, such as whether customer service is available 24/7 or only during certain hours.

Data and privacy

Companies can lose a serious amount of money and customer trust following data security breaches. Be sure to understand exactly what data the vendor has access to, the kind of security the vendor uses for its databases, and what processes it has in place to keep your information safe.

We also advise you to attend the demo with stakeholders from different teams. This will allow you to gather the most comprehensive feedback, and thus choose the right tool that suits all kinds of users. Finally, ensure that the data catalog is compatible with your current data infrastructure as well as with your vision and roadmap for the next 1–5 years.

BONUS: free RFI template for Data Catalogs

If you are interested, you can find a more detailed version of “what to check before/during a data catalog demo” here.

Originally published at https://www.castordoc.com.
