Tableau Data Catalog: Diving into workbooks’ complexity

The icing on the cake

Caroline BURIDANT
iAdvize Engineering

--

So far, we are focusing on querying the Tableau Metadata API to get data from the upstream table to the calculated field by way of the data source field. This 4th article will end the deep dive on the API by addressing the workbook topic: can we get metadata to study the use of a specific field in a given workbook, or more precisely in a dashboard, and maybe even… in a sheet? (spoiler: yes)

As usual, let’s start with a few definitions to make sure you understand the queries we use to get this metadata.

A workbook is a nesting doll

Allegory of a workbook hosting a dashboard hosting a sheet (Photo by Sudowoodo on Shutterstock)

Please note again that the content related to an object in the Tableau Metadata API model will take an uppercase letter in this article.

A Tableau Workbook is a .twbx file that can be published on Tableau Server. It can be qualified with Tags and is owned by one user. A Workbook can have both Sheets and Embedded Data Sources: at iAdvize, we rarely use the latter and ignore it in our Data Catalog.

There are three types of Sheets:

  • A worksheet contains a single view. It’s the worksheet that’s related to a Published Data Source, depending on the fields used in it. Warning: this object is referred to as « Sheet » in the Metadata API,
  • A Dashboard compiles zero, one, or several worksheets.
  • A Story dynamically displays worksheets and/or Dashboards. At iAdvize, we never use this functionality, so we don’t include the Story object in our Data Catalog.

Thus, Tableau Workbooks are essentially nesting dolls: the Workbook contains the Dashboard which contains the worksheet. But, the tool allows these relationships to be more complex. A Workbook contains only worksheets or only dashboards, and a Dashboard can exist without being linked to a worksheet (for example, a text Dashboard).

In addition, keep in mind that a workbook Calculated Field will inevitably link to a Workbook in which it’s been created. An optional relationship with a worksheet can exist if the workbook Calculated Field is used in it. To know more about the workbook Calculated Fields, please refer to the third article of this series.

The diagram

Only one table was mandatory to link these objects: link_sheet_dashboard allows us to know which sheet is used with each Dashboard, as a given worksheet can be used by multiple Dashboards in the same Workbook.

However, as there can only be one Workbook related to a worksheet or a Dashboard, we respectively host this information in the dim_sheet and dim_dashboard.

Now be ready, and watch how this completes the global diagram:

Playing with Tableau Metadata API

The queries used to gather metadata about the three objects (Workbook, Dashboard and worksheet) are quite simple:

As the query to get the link between a worksheet and a Dashboard:

In the first article of this series, you can find the queries that will help you get the link between a Workbook and a User or a Tag.

This query allows you to get the relationship between a worksheet and a Published Data Source (indeed, when using the blending functionality, a worksheet can be linked to multiple Published Data Sources):

Our hack

The main difficulty here was to link a Field to a worksheet using it. We managed to get the relationship with this query, using the sheetFieldInstances field from the Sheets object:

But for an unknown reason, the Field ids given by sheetFieldInstances don’t match the ones from dim_datasource_field. Even weirder, they match the Field ids from dim_calculated_fields. Anyway, we had to find a trick that links dim_datasource_field with this new table called link_field_worksheet.

What saved the day is that inside a Published Data Source, there can’t be two Fields with the same name, and therefore the Field name and Published Data Source name pair are considered a primary key for dim_datasource_field. So for lack of Field id, we have to use this option. The query above is enriched with the Published Data Source id information:

Let the show begin!

With this last piece of the Data Catalog project, we can now provide answer to the following use cases:

  • a member of the Data team would like to check the Published Data Sources usage by counting how many Workbooks connect to them
  • a member of the Data team would like to know how many Workbooks contain a view called « User Guide » that helps the user with this content
  • a Tableau user would like to find a Dashboard using the Field « Client Name ».

To show you how this last use case looks like, here is a homemade search engine created with Tableau Desktop:

This is the last article dedicated to the metadata collection for our Data Catalog. In the next (and final) article, we will expose how we aggregate this data to create two Tableau data sources, how we use it to build the final tools available for our users, and the future improvements we identified.

--

--