Tableau Metadata API: building a custom data source audit with Python

How to generate a comprehensive list of all underlying databases

Elliott Stam
Apr 11, 2020 · 8 min read
Searching for the sources of your data? (Photo by Houcine Ncib on Unsplash)

Suppose you need to track or identify all data feeding into your Tableau Server environment. How would you go about gathering the various databases and flat file types? You could comb through the hundreds or thousands of published data sources and workbooks, but that sounds… awful.

If you have Tableau Server version 2019.3 or higher, you’re in luck. You have free access to the core functionality of the Metadata API. Who doesn’t like free? If you’re a seasoned tech professional, ‘free’ might sound a whole lot like ‘I have to code it myself’. Well, you are absolutely right. But with the right tools, that’s no problem.

Let’s get to it.

This tutorial walks through using the Python tableau-api-lib package and is part of a series on how to tap Tableau Server like a keg, giving you control over Tableau Server’s REST API.

These tutorials assume you have Python 3 installed already. If you do not have Python 3 yet, this will get you started: guide to install Python.

This tutorial makes use of two free resources provided by Tableau: the REST API and the Metadata API.

Tableau has quite a few APIs, but don’t let that confuse you. As it happens, the REST API and the Metadata API use the same authentication method. The REST API gives you programmatic control over Tableau Server content and administrative tasks, while the Metadata API allows you to issue GraphQL queries against the newer Tableau Catalog.

If none of this is clicking yet, just hold tight. We’re about to get some hands-on experience using both APIs. It’s easier than you think.

Even if you’re a pro at these tutorials, do yourself a favor and pull the latest version of the library. It’s updated on the regular.

pip install --upgrade tableau-api-lib

New to this Python stuff? Don’t sweat it, you’ll catch on quick. Follow this getting started tutorial. That tutorial walks you through getting connected to Tableau Server using tableau-api-lib.

Use the code below as a template for getting connected to your server. In later steps, we will build upon this boilerplate with additional lines of code. At the end of the article, you’ll find a consolidated block of code you can copy and paste for your convenience.
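A minimal connection template looks like this. Every value in the config dict is a placeholder — swap in your own server URL, credentials, site name, and an `api_version` that matches your Tableau Server version:

```python
from tableau_api_lib import TableauServerConnection

# Placeholder values: replace each entry with your own environment's details.
tableau_server_config = {
    "tableau_prod": {
        "server": "https://your-tableau-server.com",
        "api_version": "3.8",
        "username": "your_username",
        "password": "your_password",
        "site_name": "your_site_name",
        "site_url": "your_site_content_url",
    }
}

# Build the connection object and sign in to Tableau Server.
conn = TableauServerConnection(tableau_server_config)
conn.sign_in()
```

The `conn` object is what we’ll reuse in every step that follows.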

Fun fact: you can also use personal access tokens, assuming you are on Tableau Server 2019.4 or newer. If you’re all about the access tokens, check out my article for details on how to use them.

You can actually skip all the API shenanigans if you just need to swoop in, fetch the database varieties, and make your escape. In Tableau Server versions 2019.3 and higher, you can navigate to a section of your server where you can fine-tune your Metadata API queries with almost zero technical skills required.

For example, the GraphiQL explorer built into Tableau Server lives at `https://<your-tableau-server>/metadata/graphiql`.

Once you authenticate, you land on the GraphiQL explorer, an interactive query editor.

You can treat this page like a sandbox to get familiar with the information you can pull using the Metadata API. Some advantages of this interface include rapid testing, autocomplete, and the fact that you don’t have to write anything other than the GraphQL query that will be executed.

Using that interface, I wrote a short query to pull the various database servers active on one of my test sites.

Note that if you use this GraphQL interface, you are only querying data for the site you are currently logged into. If you need this information across your entire Tableau Server (multiple sites), then you’ll be better served by the following steps. By combining the REST API and the Metadata API, we can loop through multiple sites and collect the relevant metadata for each one.

My goal in this tutorial is to discover all of the database connections I have on Tableau Server. To give this exercise a purpose, let’s use a real-life example from my experience: I’m helping a client build a custom data source audit for their environment and part of that audit is understanding where Tableau gets its data.

Here’s the query I’ll run to pull all of the database servers I’m connecting to:
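A sketch of that query, stored as a Python string so we can hand it to tableau-api-lib in the next step (the field names come from the Metadata API documentation for the ‘DatabaseServer’ object):

```python
# GraphQL query: list the database servers the current site connects to.
query = """
{
  databaseServers {
    name
    connectionType
    hostName
  }
}
"""
```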

I used the Metadata API reference to plan out this query based on the documentation for the ‘DatabaseServer’ object.

Running the query is a simple function call:
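Assuming `conn` is the signed-in connection from the setup step and `query` holds the GraphQL string above, the call is roughly the following (`metadata_graphql_query` is the tableau-api-lib method that posts a GraphQL query to the Metadata API):

```python
# 'conn' is the signed-in TableauServerConnection from the setup step;
# 'query' is the GraphQL string defined above.
response = conn.metadata_graphql_query(query)

# Inspect the raw JSON payload the Metadata API returns.
print(response.json())
```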

The server sends us an HTTP response containing our query results in JSON format. Let’s see what that looks like:

Here’s my output:

Now I have nothing against JSON, but in the Python world Pandas DataFrames are my preferred method of handling data. By storing our data in a Pandas DataFrame, we can pull additional queries and easily join the results. More on that later.

For now, let’s store the results of this query into a Pandas DataFrame:
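For illustration, here’s that flattening step run against a made-up payload shaped like the Metadata API’s response — your real `response.json()` values will differ:

```python
import pandas as pd

# Hypothetical payload shaped like the Metadata API's JSON response.
response_json = {
    "data": {
        "databaseServers": [
            {
                "name": "myserver.snowflakecomputing.com",
                "connectionType": "snowflake",
                "hostName": "myserver.snowflakecomputing.com",
            }
        ]
    }
}

# Flatten the list of database-server records into a DataFrame.
db_server_df = pd.DataFrame(response_json["data"]["databaseServers"])
```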

Here’s a look at where that lands us:

You’re doing great. Let’s keep going!

I’m 100% certain my test site connects to more than Snowflake. If I’m planning a custom content migration or building a custom data source audit, I want to know ALL of the sources of data, not just databases.

The Metadata API has us covered. Let’s pull another GraphQL query. This time we aren’t pulling database server info (so we won’t get ‘hostName’ values); instead we will pull all of the individual sources of data which Tableau considers ‘databases’, even if their source is a flat file like a CSV or a Microsoft Access database.

Here’s our additional query:
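A sketch of that query — note there’s no ‘hostName’ field this time, since flat files don’t have one:

```python
# GraphQL query: list everything Tableau treats as a 'database',
# including flat files like CSV and Microsoft Access sources.
databases_query = """
{
  databases {
    name
    connectionType
  }
}
"""
```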

Let’s run that query and store the results in another Pandas DataFrame.
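This mirrors the earlier steps; `conn` is still our signed-in connection, `databases_query` is the GraphQL string shown above, and the DataFrame name is just my own choice:

```python
import pandas as pd

# 'conn' is the signed-in TableauServerConnection from the setup step;
# 'databases_query' holds the GraphQL string shown above.
response = conn.metadata_graphql_query(databases_query)
data_connections_df = pd.DataFrame(response.json()["data"]["databases"])
```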

We now have a new DataFrame that looks like this:

Now, as a demonstration of how you can combine multiple GraphQL queries to your advantage, let’s join in the database server information where applicable. That will give us one master table to work with.

In this relatively simple tutorial, joining these tables may be a minor convenience. In complex workflows, the ability to join the results of different queries becomes invaluable.

Let’s use the ‘data_connections_df’ table as the left side of our join, as it has all of the connection varieties we want while the ‘db_server_df’ table has supplemental information (the database server host names).

We’ll join the two DataFrames wherever the ‘name’ and ‘connectionType’ columns match:
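With pandas, the join is a single `merge` call. The two small DataFrames below are made-up stand-ins for the real query results; `how="left"` keeps every connection and fills in `hostName` only where a database server matches:

```python
import pandas as pd

# Hypothetical rows standing in for the real query results.
data_connections_df = pd.DataFrame([
    {"name": "orders_extract", "connectionType": "hyper"},
    {"name": "analytics", "connectionType": "snowflake"},
])
db_server_df = pd.DataFrame([
    {"name": "analytics", "connectionType": "snowflake",
     "hostName": "myserver.snowflakecomputing.com"},
])

# Left join: keep every connection, pulling in hostName where available.
combined_df = data_connections_df.merge(
    db_server_df, how="left", on=["name", "connectionType"]
)
```

Connections with no matching database server (like the Hyper extract above) simply end up with a blank `hostName`.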

Here are my results:

Personally, I’d like to see the unique rows. That is, I want to remove any rows that do not present any new information. Since our data is stored in a Pandas DataFrame, that’s easy to do.

There are enough Pandas preachers out there already, so I’ll spare you the soap box sermon about how great the library is. It’s great, though.

Here’s how you can see the unique connections from the table we have built. Let’s not consider the ‘name’ column, since names aren’t necessary if all we care about is the type of connection our data is sourced from.
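Dropping the ‘name’ column before de-duplicating collapses rows that differ only by name — again using made-up rows for illustration:

```python
import pandas as pd

# Hypothetical combined table from the previous join step.
combined_df = pd.DataFrame([
    {"name": "orders_extract", "connectionType": "hyper", "hostName": None},
    {"name": "sales_extract", "connectionType": "hyper", "hostName": None},
    {"name": "analytics", "connectionType": "snowflake",
     "hostName": "myserver.snowflakecomputing.com"},
])

# Drop 'name' first, then de-duplicate: the two Hyper extracts collapse into one row.
unique_connections_df = combined_df.drop(columns=["name"]).drop_duplicates()
```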

Here are my results:

So, it looks like I only have three varieties of data connections on this test site: Hyper extracts, Snowflake databases, and Excel files.

So far in this tutorial, we’ve only executed queries against a single site (the one we authenticated into by default). Now let’s explore how you can extend what we’ve learned so that you can pull Metadata API data for your entire server (or a custom list of sites).

Using the REST API, we can switch sites. There’s a tutorial on switching sites that covers that topic in detail, so we’ll simply implement it here without much discussion.

Long sections of code look a bit ugly when pasted into Medium articles, so please see the consolidated code section at the end of the article for an example of how you loop through sites and collect metadata from each one.

The process boils down to:

  1. initialize a Pandas DataFrame to store the data for all sites
  2. initiate a for loop, iterating over all sites on the server (or whichever sites you define)
  3. switch to a site using a REST API call
  4. query the desired information using a GraphQL query against the Metadata API
  5. store the resulting data in a Pandas DataFrame
  6. (optional) add the site name as a column to the Pandas DataFrame
  7. append the site DataFrame to the DataFrame defined in (1)
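The steps above can be sketched as follows. This is an outline rather than a drop-in script: `conn` is assumed to be a signed-in TableauServerConnection, and the REST calls (`query_sites`, `switch_site`) require a live server:

```python
import pandas as pd

# GraphQL query to run against each site's Metadata API.
databases_query = """
{
  databases {
    name
    connectionType
  }
}
"""

all_sites_df = pd.DataFrame()  # (1) master DataFrame for all sites

# (2) iterate over every site on the server
for site in conn.query_sites().json()["sites"]["site"]:
    conn.switch_site(site["contentUrl"])                          # (3) REST API site switch
    response = conn.metadata_graphql_query(databases_query)       # (4) Metadata API query
    site_df = pd.DataFrame(response.json()["data"]["databases"])  # (5) store the results
    site_df["site"] = site["name"]                                # (6) tag rows with the site
    all_sites_df = pd.concat([all_sites_df, site_df], ignore_index=True)  # (7) append
```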

Here’s some sample output of a DataFrame I built listing the server connections found on all of my test sites:

Man… if variety is the spice of life, my database lineup is as spicy as a popsicle.

Check out the consolidated code in the GitHub gist at the end of the article for an example of how you can accomplish the looping.

We covered a lot of ground here, but the core principles are relatively simple. Using free tools available to you through Tableau and Python, you can harness a lot of powerful information from your Tableau Server!

I hope this tutorial helps pave the way for you to make some waves in your own Tableau Server environment. If you need help building REST API or Metadata API workflows, reach out.

Use this GitHub gist as a starting point when integrating the process outlined in this tutorial into your own workflows.


Elliott Stam

Written by

Data scientist and author. Sometimes seen falling down mountains with a snowboard strapped to my feet.



Delivering insights into Tableau Server automation and data visualization.