Data Cataloging With Alation

Matt Weingarten
3 min read · Apr 22, 2022


Drawers for all our table information

Introduction

You maintain various tables as part of your team’s processing. How do you make sure consumers have the full details on those tables? Sure, you could keep extensive documentation in whatever tool your team uses for that purpose, but this can be clunky (plus, you have to remember to update it every time one of the tables changes). Tools like Alation can handle this for you in a much cleaner fashion.

What is Alation?

Alation is first and foremost a data catalog tool. It can better represent all of those pages you were maintaining before by connecting to your data sources and pulling each table’s schema onto its own page. One of those pages will look like this:

An example of a table in Alation

You can give descriptions to all the columns, have sample data and sample queries for those who are just using the data for the first time, and even track the data lineage if the data source can pull that off. Pretty clean, I must say.

Working with Alation

The page above is the default page Alation gives you after it adds the table to its catalog. It can definitely be spruced up a bit, and that’s where the concept of the data steward comes in. Data stewards are the subject matter experts (SMEs) of a dataset and should be maintaining that information.

The first thing that page lacks is descriptions. Alation gives you the ability to download your data dictionaries as a CSV file, with a nested format for all of the individual columns. You can take this file, add the missing pieces (a table description, missing column names and descriptions that will show when they’re highlighted, etc.), and then re-upload it to Alation so that the page updates accordingly. I recommend keeping those dictionaries in version control, so that all team members can make the appropriate changes and review them as needed. You could even go one step further and automate the deployment by uploading the dictionaries through Alation’s API (I have yet to try this because even I have found that to be overkill, and I’m always all for automation).
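If the dictionaries live in version control, a small check during review can flag entries that still lack a description before anyone re-uploads the file. Here is a minimal sketch in Python. The header names `key` and `description` and the sample rows are assumptions for illustration only, not Alation’s actual export format, so adjust them to match the file you download.

```python
import csv
import io

# Hypothetical column headers; check them against the file Alation actually exports.
KEY_FIELD = "key"
DESCRIPTION_FIELD = "description"


def find_missing_descriptions(csv_text: str) -> list[str]:
    """Return the keys of rows whose description is empty or only whitespace."""
    reader = csv.DictReader(io.StringIO(csv_text))
    missing = []
    for row in reader:
        description = (row.get(DESCRIPTION_FIELD) or "").strip()
        if not description:
            missing.append(row[KEY_FIELD])
    return missing


# Made-up dictionary contents, standing in for a downloaded CSV file.
sample = """key,title,description
sales.orders,Orders,One row per customer order
sales.orders.order_id,Order ID,
sales.orders.amount,Amount,Order total in USD
"""

print(find_missing_descriptions(sample))
```

With the made-up sample above, only `sales.orders.order_id` is reported. In practice you would read the CSV from the repository and fail the review if the returned list is not empty.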

It might also be worth uploading sample queries and data to these pages, just so potential consumers can get an idea of what to expect from your dataset. From there, they can explore in more depth in the data source itself.

Other Alation Features

Teams want to know who’s actively using their data, and Alation gives you some insight into that. You’ll be able to see the top users on the right-hand side, as well as the popularity of individual tables and even columns.

You can also add the stewards for that particular table, so that consumers know who to reach out to with any questions. If there’s some particular logic in the table that deserves its own writeup, you can compose an article in Alation and link it to that table so it’s all in one place.

Conclusion

Other tools exist out there for data cataloging, but I’ve found Alation to be relatively straightforward and a massive improvement on how we were trying to maintain everything before. Data stewardship doesn’t get the spotlight it deserves in the midst of every other “data” term that the DE world has, but it definitely carries its weight when promoting a datamart culture.


Matt Weingarten

Currently a Data Engineer at Samsara. Previously at Disney, Meta, and Nielsen. Bridge player and sports fan. Thoughts are my own.