Tableau Data Catalog: Harvesting Tableau fields

Our Tableau content holds a growing variety of field types

  • the Data team, which wants to track the impact of deleting (or editing) a field on the calculated fields that use it;
  • the Creators, to explore how a given calculated field is built.
Our Data Catalog harvesting Tableau fields, seen from the sky (Photo by Raland on Shutterstock)

Tableau fields are diverse

Understanding Tableau fields
  • a non-calculated data source field belongs to a PublishedDatasource. It maps directly to a column of an UpstreamTable linked to that PublishedDatasource, namely an UpstreamColumn. Example from the Sample - Superstore data source provided by Tableau Desktop: « Customer Name » is a non-calculated data source field;
  • a calculated data source field is computed from one or more non-calculated data source fields and/or other calculated data source fields. It is useful to the data source’s creator, who can offer users a shared calculated field hosted directly in the PublishedDatasource. The non-calculated fields used by calculated fields are named « upstream data source fields ». Example from the Sample - Superstore data source: « Profit Ratio » is a calculated data source field, computed from « Profit » and « Sales » (the upstream data source fields);
  • by contrast, a workbook field doesn’t exist in any PublishedDatasource, as a Creator user creates it in a specific workbook. It is therefore always a calculated field. Example from the Sample - Superstore data source: « Ship Month » is a calculated field we created in a workbook by extracting the month name from the « Ship Date » non-calculated data source field.
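The three kinds of fields described above can be pictured as a single record type. This is only a minimal sketch: the `HarvestedField` class, its attribute names, the « Orders » table name and the two formulas are illustrative assumptions, not the structure our catalog or the Tableau API actually uses.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class HarvestedField:
    """Hypothetical record for one Tableau field (names are assumptions)."""
    name: str
    datasource: Optional[str] = None       # PublishedDatasource hosting the field
    workbook: Optional[str] = None         # set only for workbook fields
    upstream_column: Optional[str] = None  # UpstreamColumn behind a non-calculated field
    formula: Optional[str] = None          # set only for calculated fields
    upstream_fields: List[str] = field(default_factory=list)

    @property
    def kind(self) -> str:
        # A field created in a workbook is always a calculated field.
        if self.workbook is not None:
            return "workbook field"
        # Inside a data source, a formula marks a calculated field.
        if self.formula is not None:
            return "calculated data source field"
        return "non-calculated data source field"


customer_name = HarvestedField(
    name="Customer Name",
    datasource="Sample - Superstore",
    upstream_column="Orders.Customer Name",  # assumed table and column names
)
profit_ratio = HarvestedField(
    name="Profit Ratio",
    datasource="Sample - Superstore",
    formula="SUM([Profit]) / SUM([Sales])",  # illustrative formula
    upstream_fields=["Profit", "Sales"],
)
ship_month = HarvestedField(
    name="Ship Month",
    workbook="My workbook",                  # hypothetical workbook name
    formula="DATENAME('month', [Ship Date])",  # illustrative formula
    upstream_fields=["Ship Date"],
)

for f in (customer_name, profit_ratio, ship_month):
    print(f"{f.name}: {f.kind}")
```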

Our hack

  • the “fields” field of the PublishedDatasource object lets us collect information about all data source fields (calculated and non-calculated). We can tell the two types apart thanks to the “upstreamFields” field: if it is empty, the field is necessarily a non-calculated one. Note that the “datasourceField” object from the API doesn’t have the same definition as our data source field (more info here);
  • calculatedField provides details about all CalculatedFields (from a data source or a workbook). We didn’t find a way to separate the two kinds of CalculatedFields.
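The “upstreamFields” rule above can be sketched in a few lines. The GraphQL text below is an assumption assembled from the object and field names mentioned in this section, and the response is a hand-written sample shaped like a typical GraphQL answer rather than real output; `split_datasource_fields` is a hypothetical helper name.

```python
# Assumed query shape: published data sources, their fields, and each
# field's upstream fields. Check it against your Metadata API schema.
FIELDS_QUERY = """
{
  publishedDatasources {
    name
    fields {
      name
      upstreamFields {
        name
      }
    }
  }
}
"""

# Hand-written sample standing in for the API response (not real output).
SAMPLE_RESPONSE = {
    "data": {
        "publishedDatasources": [
            {
                "name": "Sample - Superstore",
                "fields": [
                    {"name": "Customer Name", "upstreamFields": []},
                    {"name": "Profit", "upstreamFields": []},
                    {"name": "Sales", "upstreamFields": []},
                    {
                        "name": "Profit Ratio",
                        "upstreamFields": [{"name": "Profit"}, {"name": "Sales"}],
                    },
                ],
            }
        ]
    }
}


def split_datasource_fields(response: dict) -> dict:
    """Group field names per data source: empty upstreamFields = non-calculated."""
    result = {}
    for datasource in response["data"]["publishedDatasources"]:
        groups = {"calculated": [], "non_calculated": []}
        for fld in datasource["fields"]:
            key = "calculated" if fld.get("upstreamFields") else "non_calculated"
            groups[key].append(fld["name"])
        result[datasource["name"]] = groups
    return result


split = split_datasource_fields(SAMPLE_RESPONSE)
print(split["Sample - Superstore"])
```

Keeping the classification separate from the HTTP call means the rule can be checked against a saved response before it is pointed at a live server.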

The diagram

Playing with Tableau Metadata API

Let the show begin!

  • a member of the Data team wants to check if the fields’ definitions are consistent among the data sources
  • a Creator user would like to know the formula of a data source calculated field without having to connect to the data source
  • a member of the Data team would like to remove the field « Client Country » from a data source and know which calculated fields would be impacted
  • a Creator user wants to know in which published data sources they can find a field called « Number of Conversations »
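Two of the questions above, the impact of removing a field and the search for a field by name, boil down to simple lookups once the fields are harvested. The sketch below runs on a made-up list of rows: the data source names, the calculated field names and the row layout are hypothetical, and it assumes each row lists only its direct upstream fields, hence the transitive walk. It also ignores workbook fields for brevity.

```python
from collections import deque

# Hypothetical harvested rows: one per data source field.
HARVESTED = [
    {"datasource": "Customers", "name": "Client Country", "upstream": []},
    {"datasource": "Customers", "name": "Is Domestic Client", "upstream": ["Client Country"]},
    {"datasource": "Customers", "name": "Domestic Share", "upstream": ["Is Domestic Client"]},
    {"datasource": "Customers", "name": "Number of Conversations", "upstream": []},
    {"datasource": "Conversations", "name": "Number of Conversations", "upstream": []},
]


def impacted_calculated_fields(rows, datasource, removed):
    """Return the calculated fields that depend, directly or not, on `removed`."""
    in_scope = [row for row in rows if row["datasource"] == datasource]
    impacted = []
    seen = {removed}
    queue = deque([removed])
    while queue:
        current = queue.popleft()
        for row in in_scope:
            if current in row["upstream"] and row["name"] not in seen:
                seen.add(row["name"])
                impacted.append(row["name"])
                queue.append(row["name"])  # its own dependants are impacted too
    return impacted


def datasources_with_field(rows, name):
    """Return the sorted data source names hosting a field called `name`."""
    wanted = name.lower()
    return sorted({row["datasource"] for row in rows if row["name"].lower() == wanted})


impacted = impacted_calculated_fields(HARVESTED, "Customers", "Client Country")
hosts = datasources_with_field(HARVESTED, "Number of Conversations")
print(impacted)
print(hosts)
```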

What’s next?

Publications from the iAdvize engineering team :)
