Tableau Data Catalog: How the need came

And why we decided to build it ourselves

Caroline BURIDANT
iAdvize Engineering
5 min read · Feb 7, 2022


Some things are better homemade (photo by GreenArt on Shutterstock)

This article is the first in a series of five explaining how to take advantage of the Tableau Metadata API to create a Data Catalog. Before we explore the technical steps of the project, let's first explain its origin.

iAdvize x Tableau

iAdvize is a conversational platform that offers solutions for agents to engage in conversations with a website's visitors.

Tableau crossed iAdvize's path in 2016, when the need to make data accessible internally at scale emerged. Before that, each analysis request required the help of a Data Scientist or Analyst who had to query the production databases directly. We then decided to implement a data warehouse supplying Tableau, accessible to a handful of data-driven employees.

Since then, the use of this data-visualization tool expanded to all iAdvize departments until the number of Tableau users exceeded 200. Almost 70 certified data sources are available to our community, and most of them are directly connected to the aggregated tables of our data warehouse hosted by Google BigQuery and orchestrated by Airflow. The number of workbooks published on our Tableau Server (2021.3) also shot up, reaching 800 in December ’21.

Looking for the best use of Tableau

At first, the idea of the Data team at iAdvize was to give each employee the power to become an analyst by giving them access to Tableau: at the time, everyone had access to our Tableau Server with an Explorer licence. Employees created their dashboards online by digging into our certified data sources, finding answers to any use case linked to their field of expertise.

But this turned out to be a utopia: the required analytics skills and knowledge (such as having an analytical mind, understanding aggregation granularities, or mastering the internal data) were not something we could expect from everyone at iAdvize, leading to misinterpretations and wrong conclusions.

To overcome this issue, we rethought the distribution of site roles among our users: today, 35% of them are Creators and the rest are Viewers. We also allocated Analyst positions across the different teams to spread analytical basics, and upgraded the analytics documentation for end users. The goal of this new strategy was straightforward: increase the autonomy of both Viewers and Creators while reducing analytical mistakes.

Concretely, we decided to:

  • improve the structure of our Tableau Server so that users no longer feel lost in an ocean of content;
  • tidy our certified data sources by sorting fields into dedicated folders and adding a description to each of them;
  • rename our data sources to make them more comprehensible (a feature available since version 2021.3);
  • put multiple communication points in place between the Data team and Tableau users. For example, a chat room (Tableau User Group) gathers all employees so they can get answers to their questions from the community. Also, once a week, the Data team dedicates an afternoon to giving users 30-minute slots to solve technical challenges, master the tool, and improve their dashboards (Tableau Doctor Sessions);
  • create centralized documentation hosted in a common library, containing KPI definitions for our data sources, explanations of certified workbook content, and common traps users could fall into while manipulating our data.

Why wasn’t it enough?

These answers to our lack of documentation were still not convenient enough for users. First, the centralized documentation wasn't interactive and was hosted in a different tool (our internal Knowledge Library on Facebook Workplace), which made adoption quite complex. Then, Tableau users remained highly dependent on the Data team for the manual maintenance of this resource. Lastly, users still lacked access to metadata and data modeling.

That's how we concluded that we needed to switch to an exhaustive, easy-to-use Data Catalog built on the best enablement platform possible: the metadata inside the data-visualization tool itself, Tableau.

What we expected from this Data Catalog

  1. Our first expectation from this tool was to help our users become more autonomous while looking for the right data source or field when performing an analysis.
  2. We also hoped this tool would increase the quality of the internal analyses, thanks to a better understanding of the data.
  3. The Data Catalog should also better inform the users about the existing content, preventing them from spending time working on dashboards that already exist.
  4. Lastly, an automated process could drastically reduce the time the Data team spent on manual documentation.

Buying an external solution vs. building our own tool

There were two ways to get the Data Catalog we were dreaming of, either buying an external tool or building it ourselves.

The first option had three main disadvantages: its cost, the low maturity of the market, and the security concerns raised by giving a provider access to our data. Indeed, in our context, no SaaS solution would fit our needs, as our Tableau Server is located on a private network: this kind of solution couldn't consume the metadata stored there to provide the catalog service we needed. We would have had to find a workaround, but the solutions available on the market didn't seem worth the effort.

Concerning the second option, it required three main components:

  • a source from which we could gather this metadata,
  • the ability to extract and store it,
  • and an interactive tool to display information to our users.

Tableau offers the first component through its Metadata API, which gives access to Tableau metadata, its content, and its lineage. The Data team had the engineering skills and resources required to extract, transform, and load this data into a dedicated database. Finally, Tableau itself offers the possibility to create a workbook as an interface between the metadata and the user, as well as a homemade search engine.
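To give a feel for the first two components, here is a minimal sketch of querying the Tableau Metadata API, which exposes a GraphQL endpoint at `/api/metadata/graphql` on the server. The server URL is a hypothetical placeholder, and the `auth_token` would come from a prior REST API sign-in (for instance with a personal access token); the exact fields you request would depend on the catalog you want to build.

```python
# Minimal sketch: fetch published data sources and their field
# descriptions from the Tableau Metadata API (GraphQL).
# SERVER_URL is a hypothetical placeholder; adapt it to your server.
import json
import urllib.request

SERVER_URL = "https://tableau.example.com"  # placeholder, not a real server

# GraphQL query listing published data sources with per-field descriptions.
DATASOURCE_QUERY = """
query DataCatalog {
  publishedDatasources {
    name
    description
    fields {
      name
      description
    }
  }
}
"""

def build_payload(query: str) -> bytes:
    """Wrap a GraphQL query in the JSON envelope the endpoint expects."""
    return json.dumps({"query": query}).encode("utf-8")

def fetch_metadata(auth_token: str) -> dict:
    """POST the query to the Metadata API GraphQL endpoint.

    `auth_token` is a credentials token obtained beforehand from the
    Tableau REST API sign-in (e.g. using a personal access token).
    """
    request = urllib.request.Request(
        f"{SERVER_URL}/api/metadata/graphql",
        data=build_payload(DATASOURCE_QUERY),
        headers={
            "Content-Type": "application/json",
            "X-Tableau-Auth": auth_token,
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

The JSON returned by such a query can then be flattened and loaded into a warehouse table, which is exactly the extract-and-store step described above.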

Let me end the suspense: we chose the second option.

In the next articles of this series, we will share with you the recipe of our homemade data catalog.
