How We Implemented a Tableau Governance Strategy

Lily Chang · Policygenius · Jun 23, 2022 · 7 min read

Policygenius is America’s leading online insurance marketplace. Our mission is to help people get insurance right by making it easy for them to understand their options, compare quotes, and buy a policy, all in one place.

Background

In the last four years, the Policygenius data team and stack have evolved a lot with the growing business (read more about the existing architecture here). Looking back at the early years, we implemented a few tools in a short time frame without thinking holistically about governance; as a result, we started to feel pain as the volume of data assets grew tremendously (and out of hand). However, it is never too late to revisit a governance strategy, and such was our experience with Tableau.

The intention of this post is to share the learnings from a three-month-long journey in which our Tableau deployment evolved from a fully decentralized, no-governance state to a federated model with a proper governance strategy in place. Despite the focus on Tableau, the learnings and general principles should apply to any Business Intelligence and data tools. To any data admins who are in the same boat as we were a year ago, I hope this post can give you some confidence and insights if you plan to overhaul the governance strategy at your company.

Before the Overhaul

Tableau has been our main Business Intelligence tool since late 2017. We deployed the self-managed Server version on a single-node Windows virtual machine on Google Cloud Platform. Our user base grew from 15 in April 2017 to ~100 in June 2021, at which point there were 1,300+ workbooks and 160 data sources consuming roughly 700GB of disk space. The growing user base and job load put increasing strain on the VM over the years, and as a result our Tableau Server became the most critical and failure-prone component in our data architecture. These are the problems we encountered:

  • A server bloated with stale content. Due to a lack of review and audit processes, content kept piling up, and the Data Engineering team had to resize the virtual machine multiple times to keep up with the increasing disk usage. At the same time, Tableau views were often slow to render because I/O operations were impacted by the low disk space. We started seeing more frequent outages, almost once a week by June 2021.
  • Lack of quality and access control. Publishers could create content without any quality control in place and grant as much access as they wanted. Some analytics users referenced outdated or unverified reports, which caused confusion and a loss of trust among business teams. It also caused unnecessary firefighting on the data team, as we had to trace the sources of bad data.
  • A confusing experience for data product consumers. Many nested projects with inconsistent naming conventions were created, which confused analytics users because they didn't know where to look for the source of truth.

Overhaul

We had a few options, outlined below, and decided to go with option one after taking into account the scope of each and the extent of disruption to users' existing workflows.

Option 1: Overhaul the current server by cleaning it up and putting governance controls in place.

Benefits:

  • Minimal disruption to users' workflows, and no paperwork needed to revise the contract.

Risks:

  • Change management required to devise and roll out governance controls.

Option 2: Migrate to the managed version, i.e., Tableau Online

Benefits:

  • The experience should remain mostly the same for the Tableau users. The current workbooks/data sources can be moved as-is to the hosted version.
  • Future maintenance can be greatly reduced.

Risks:

  • Tableau Online has a hard limit of 100GB of storage, while our content was about four times that limit. We would need to find a way to drastically reduce the total content size, which would require a lot of refactoring and coordination with business users.
  • We would need to migrate the content and users from the self-hosted server to the hosted version.
  • We would need to go through Finance to rework our annual contract.

Option 3: Migrate to another BI solution

Benefits:

  • Some hosted BI solutions require even less maintenance than Tableau Online, so migrating could potentially reduce maintenance in the long run.

Risks:

  • This is the most disruptive option, as we would need to rework all the workbooks and data sources into a format compatible with the new tool and retrain users.
  • We would need to go through Finance and Legal to create a new contract.

Once committed to option one, we started the following two work streams in parallel — a one-time declutter and the implementation of the new governance strategy.

A Marie-Kondo-style declutter

Tableau logs every action a user performs on the server in the Server Repository, which enabled us to identify stale resources to delete. We found more than 300GB of content to remove, including 900+ stale workbooks with no views in the previous 90 days and 100+ data sources without connections from any active workbook. We then wrote a quick script to delete them en masse via the REST API; a sketch of that deletion step follows. The whole effort took about two sprints, from notifying content owners and waiting for confirmation to running the batch deletions. The process went smoothly, and users were happy to have a cleaner slate.
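
For illustration, here is a minimal sketch of what such a batch deletion could look like using the tableauserverclient Python library. The server URL, credentials, and the list of stale workbook IDs are hypothetical placeholders; in practice, the IDs came from usage queries against the Server Repository.

```python
# A minimal sketch of batch-deleting stale workbooks via the Tableau REST API,
# using the tableauserverclient library. Server URL, credentials, and workbook
# IDs below are hypothetical placeholders.
import tableauserverclient as TSC

SERVER_URL = "https://tableau.example.com"  # placeholder
STALE_WORKBOOK_IDS = [
    "2a33bfcd-0000-0000-0000-000000000000",  # placeholder LUIDs flagged as
    "7f1e9ab2-0000-0000-0000-000000000000",  # stale (no views in 90 days)
]

tableau_auth = TSC.TableauAuth("admin-user", "admin-password", site_id="")
server = TSC.Server(SERVER_URL, use_server_version=True)

with server.auth.sign_in(tableau_auth):
    for workbook_id in STALE_WORKBOOK_IDS:
        try:
            server.workbooks.delete(workbook_id)  # permanent deletion
            print(f"Deleted workbook {workbook_id}")
        except TSC.ServerResponseError as err:
            # e.g. already deleted, or insufficient permissions
            print(f"Skipping {workbook_id}: {err}")
```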

A new governance strategy

We proposed a new governance strategy and socialized it with all the department heads and leadership to get buy-in. There are three key components:

Federated governance model

Inspired by this post about how to choose a governance model, we adopted a federated steward model composed of data engineers and power users from business teams to oversee the day-to-day operations and discuss policies and processes:

  • The Data Engineering team: Responsible for server-side maintenance, such as monitoring jobs, setting up refresh schedules, permissioning, and project structuring.
  • Business stewards: Each department nominates a primary and a secondary steward. Some stewards are promoted to Project Leader, a role with elevated permissions for a given project. Since this permission set is not customizable and can include more access than we would like, we documented in writing what a Project Leader can and cannot do and have the steward committee hold each other accountable.
  • Rituals: Stewards meet monthly to discuss concerns and proposals for policy and process changes. These meetings have been very productive so far, and we have made multiple process changes as a result of the discussions.

A new content management strategy

We created a new content management model to clearly define the relationships between entities on the server and incorporated permission management through this model.

Users: A user is assigned to one or multiple groups.

Groups: Groups, rather than individual users, determine the access scope to a given project, following the Don't Repeat Yourself (DRY) principle. Each department has three groups (a sketch of scripting this convention follows the figure below):

  • {Business Team} — Creator: Can publish workbooks in the {Business Team} root project. "None" permission in other business teams' projects unless added to that team's group.
  • {Business Team} — Explorer: Can view and interact with workbooks in {Business Team} projects. "None" permission in other business teams' projects unless added to that team's group.
  • {Business Team} — Viewer: Can view workbooks in {Business Team} projects. "None" permission in other business teams' projects unless added to that team's group.

Projects: A project represents a grouping of assets.

  • Top-level project: Grouped by department, since departments remain static over time.
  • Second-level project: Each top-level project has a staging sub-project for development purposes. This applies to all projects except Operations, where we segregate Life and P&C content due to its volume.

Figure: Modeling Users <> Groups <> Projects
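
As a rough illustration of the group-per-department convention, the sketch below creates the three standard groups for each department with tableauserverclient. The department names and server details are hypothetical, and attaching the permission rules to each project is omitted.

```python
# A rough sketch of scripting the group-per-department convention with
# tableauserverclient. Department names and server details are placeholders;
# project permission rules were then attached to these groups (not shown).
import tableauserverclient as TSC

DEPARTMENTS = ["Marketing", "Operations", "Finance"]  # placeholder list
ROLES = ["Creator", "Explorer", "Viewer"]             # the three access tiers

tableau_auth = TSC.TableauAuth("admin-user", "admin-password", site_id="")
server = TSC.Server("https://tableau.example.com", use_server_version=True)

with server.auth.sign_in(tableau_auth):
    for dept in DEPARTMENTS:
        for role in ROLES:
            # e.g. "Marketing - Creator", "Marketing - Explorer", ...
            group = TSC.GroupItem(f"{dept} - {role}")
            server.groups.create(group)
```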

A regular content audit process

Using both automated and semi-automated processes, we created a regular cleanup routine:

  • Remove stale content: To keep our server clean and fresh with the most up-to-date content, we run a regular process that audits resource usage via the Server Repository mentioned above and deletes stale content if needed.
  • A Windows scheduled job that automatically cleans up unwanted server files: This was added before the overhaul. Tableau clears out some but not all temporary files, so we need to run the maintenance command ourselves (a sketch follows the figure below). The chart below illustrates the trend of free disk space before and after the automated cleanup.
Figure: How the free disk space changed after a one-time cleanup vs. daily automated cleanup
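
For completeness, here is a minimal sketch of what that scheduled job could look like, assuming a TSM-based Tableau Server installation (older, pre-TSM versions used tabadmin cleanup instead). Scheduling itself is handled by Windows Task Scheduler, and the flags shown are illustrative.

```python
# A minimal sketch of a daily cleanup script, assuming a TSM-based Tableau
# Server installation. We run it off-hours via Windows Task Scheduler.
import subprocess
import sys

def run_cleanup() -> None:
    # "tsm maintenance cleanup -a" removes log files, temp files, and other
    # unneeded files that Tableau Server does not clear out on its own.
    # shell=True lets Windows resolve the tsm command wrapper on PATH.
    result = subprocess.run(
        "tsm maintenance cleanup -a",
        shell=True,
        capture_output=True,
        text=True,
    )
    print(result.stdout)
    if result.returncode != 0:
        print(result.stderr, file=sys.stderr)
        sys.exit(result.returncode)

if __name__ == "__main__":
    run_cleanup()
```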

Conclusions

Since the overhaul, we have received a lot of positive feedback from Tableau users and stewards. We have also seen a significant reduction in outages: only one incident, caused by a transient network issue, has occurred since the overhaul. Looking ahead, we are expanding the governance strategy to cover upstream systems and continuing to compare Tableau with other solutions on the market to enhance our BI capabilities.


Lily Chang is currently a Data Engineering Manager at Justworks. She loves tinkering with data, software, and platforms, and building teams.