Novel TabPy Approach Connects the Power of Python to the Front-end Ease of Tableau

Davide Guatta
GAMMA — Part of BCG X
10 min read · Apr 7, 2019


If you are familiar with TabPy, then you know it is a powerful solution for those who want to use Tableau as a visualization tool but require a computational core beyond Tableau’s native capabilities, and who have a team of data scientists available to work on the back end. TabPy is also useful if you have developed a Python application and want to streamline the process of presenting its GUI in Tableau.

These use cases can be accomplished using basic TabPy capabilities. For more sophisticated uses, however, I have developed the novel approach I am about to describe. This approach is designed for a number of use cases, including:

  • Those who want a real-time user interface, one that minimizes the processing time and delay between a parameter change and the updated visualization.
  • Those who want to show (several different) aggregation levels on the same Tableau dashboards, but need to perform all the calculations using the most granular-level data.
  • Those whose backend calculations rely on more than a single data source or database.


The novel TabPy approach makes it possible for people without deep data-science skills to quickly build and deploy interactive dashboards, capturing the best of Tableau and Python

As a result, non-technical audiences can interact with data in a deeper, more meaningful way and experiment in real time with different scenarios. Built on a real-time Python backend, the approach lets users pair Tableau’s front-end ease with the wide availability of powerful data-science techniques.

The Origin of TabPy

Tableau provides a powerful and comprehensive platform for data exploration and visualization. The variety of its visualization methods, as well as its built-in geocoding that enables geo-analytics applications, have helped make Tableau a powerful tool for business intelligence and descriptive analytics. This wealth of capabilities is readily accessible via an extraordinarily easy-to-use interface, which has made Tableau a very popular tool as well.

What limits Tableau’s use to visual analytics is the fact that it is fundamentally a data visualization platform rather than a programming language. As such, it does not natively offer access to powerful, modern data-science techniques. Such techniques are typically distributed as open-source packages, written and maintained in commonly used scripting languages such as Python, and thus available to a broad audience.

To overcome this limitation, Tableau developed TabPy, a Python package that enables Tableau to execute Python code on the fly and visually display the results. TabPy runs as a Tornado-based server, and the two tools communicate over REST APIs, enabling the quick deployment of advanced-analytics applications. This combination of a backend written in plain Python and a frontend based on Tableau is a very powerful way to complement descriptive analytics with predictive and prescriptive analytics.
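To make this concrete, here is a minimal, hedged illustration of a Tableau calculated field backed by TabPy. The field names [Sales] and [Profit] are placeholders rather than anything from this article; the script follows the documented SCRIPT_REAL pattern, in which each argument reaches Python as a list named _arg1, _arg2, and so on:

```
// Minimal sketch of a TabPy-backed calculated field; [Sales] and
// [Profit] are placeholder fields, not from this article's example.
SCRIPT_REAL("
import numpy as np
return np.corrcoef(_arg1, _arg2)[0, 1]
", SUM([Sales]), SUM([Profit]))
```

Tableau ships the aggregated arguments to the TabPy server over REST, the embedded Python executes, and the result flows back into the view.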

The Unique, But Limited, Benefits of Basic TabPy

To some extent, TabPy’s split approach provides the best of two worlds: class-leading data-visualization capabilities backed by cutting-edge, always up-to-date data-science algorithms. By effectively splitting accountabilities, this basic approach provides important resourcing benefits as well: front-end designers can focus on the visualization portion, while scarce data-science resources can focus entirely on the science behind the data analyses. And given Tableau’s accessible learning curve, these front-end designers do not necessarily need software-engineering or data-science skills. As such, TabPy puts this combination of powerful data analysis and straightforward data visualization within the reach of the most diverse business organizations.

On its own, Tableau is capable of deriving insight from data. Basic TabPy can dramatically enhance the power of those analyses. Used together, Tableau and TabPy go well beyond the scope of data visualization and into the realm of interactive dashboards, in which end users can tune multiple parameters and evaluate their impact in real time.

TabPy can dramatically enhance the power of Tableau analyses, enabling interactive dashboards in which real-time scenarios can be evaluated

With TabPy, whenever a parameter is changed, the dashboards are immediately updated. To do so, TabPy leverages an input/output approach in which both the data aggregated according to the current visualization and the tuning parameters are transferred to Python. Python then processes the data and sends the results back to Tableau to update the current visualization. So far, so good.
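As a hedged sketch of this input/output loop, consider a calculated field that rescales an aggregated measure by a user-tunable parameter. [Sales] and [Growth Factor] are illustrative names, and I am assuming the standard behavior in which the parameter also arrives in Python as a list, hence _arg2[0]:

```
// Hedged sketch: data and a tuning parameter cross into Python together.
// Whenever [Growth Factor] changes, Tableau re-runs the script and redraws.
SCRIPT_REAL("
return [x * _arg2[0] for x in _arg1]
", SUM([Sales]), [Growth Factor])
```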

Where TabPy Falls Short

In many scenarios, basic TabPy provides an elegant solution. It falls short, however, when the input/output paradigm is broken. This can happen for a number of reasons, such as when the full underlying data set is needed for the calculation but only an aggregate measure is shown in the dashboard. The same problem can occur during more advanced interactions, when multiple aggregation levels are shown at the same time. It can also be difficult to leverage multiple data sources in a single calculation, since exchanging full databases increases the transfer overhead between Tableau and Python and can harm the responsiveness of the dashboard.

The novel approach overcomes many TabPy limitations, unleashing its full potential to create visually compelling, highly interactive Tableau dashboards

The approach that follows overcomes all such limitations, leveraging the power of Python so users can unleash TabPy’s full potential to create visually compelling, highly interactive Tableau dashboards.

The Use Case: Complexity Reduction

To explain this novel approach to TabPy, I have designed an example of complexity reduction through product-portfolio optimization. The subject of the optimization is a B2B retailer that has grown mainly through mergers and acquisitions (M&A). Because of this inorganic growth, the retailer faces a great deal of complexity: it operates in several markets with a portfolio of a thousand SKUs (stock keeping units) divided into several categories and subcategories. To further complicate matters, the SKUs are manufactured in different plants.

The company’s senior management wants to increase margins by removing the least profitable SKUs, but is willing to continue selling some lower performers to maintain a certain market share. Management also wants to keep each manufacturing plant running above a target asset utilization, knowing that decreasing utilization too far would negatively impact the plant’s fixed-cost base.

The data available in this example consists of an SKU-level database in which yearly volumes, costs and revenues are reported. The SKUs are organized in Category and Subcategory hierarchical levels:

Example of an SKU database
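For orientation, here is a minimal sketch of what such a database might look like once loaded into the Python backend; all SKU names, columns and figures are made up for illustration:

```python
# Illustrative sketch of the SKU-level database; every name and number
# below is invented for orientation, not taken from the article's data.
import pandas as pd

sku_db = pd.DataFrame({
    "SKU":         ["SKU-0001", "SKU-0002", "SKU-0003"],
    "Category":    ["Fasteners", "Fasteners", "Tools"],
    "Subcategory": ["Bolts",     "Nuts",      "Drills"],
    "Plant":       ["Plant A",   "Plant B",   "Plant A"],
    "Volume":      [12_000, 8_500, 3_200],         # yearly units sold
    "Revenue":     [48_000.0, 21_250.0, 64_000.0],
    "Cost":        [36_000.0, 18_700.0, 51_200.0],
})
sku_db["Margin"] = sku_db["Revenue"] - sku_db["Cost"]   # absolute gross margin
sku_db["GM"] = sku_db["Margin"] / sku_db["Revenue"]     # gross-margin ratio
```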

From a mathematical perspective, product-portfolio optimization is fairly straightforward. However, the optimization must also account for all the strategic nuances and involve a wide range of stakeholders, who must be given access to the information and tools they need to make informed decisions.

These business complications translate into four challenging functional requirements:

  • Real-time interaction: Portfolio-optimization choices must be discussed live with the various stakeholders, so real-time interaction with the data must be possible.
  • Multiple levels of aggregation: These discussions require impact evaluation at multiple levels, from the category level down to the more granular SKU level. The impact on multiple aggregation levels must be visible at a glance.
  • Various data sources: The impact on the production plants must be evaluated as well, with data visualizations that draw upon various data sources, such as SKU-level economics and plant utilization.
  • Data transferred between Tableau and Python: A significant amount of the overall data must be accessible at each optimization step, so large volumes of data must be transferable between the Tableau front end and the Python backend.

A Novel Approach to TabPy

All of the above requirements are addressed by the novel TabPy approach. Assuming both Python and TabPy have previously been installed, implementing this TabPy variation requires three steps:

  1. Prepare a draft Tableau dashboard
  2. Create the calculation routines backend in Python
  3. Design the Tableau front end

1. Prepare a draft Tableau dashboard

The first step entails the definition of a draft Tableau dashboard that contains all the building blocks required to define the interaction: the interactive parameters to be tuned, the aggregation levels to be shown, and the calculation hooks (callbacks) to Python. This step involves a number of smaller tasks:

  • Clearly state the problem to be solved: In this case, the simple optimization algorithm implemented will remove SKUs according to their gross margin, evaluated at the SKU level.
Definition of an interactive parameter to be changed
  • Design the interactive parameters to be changed: Notice that a second convenience parameter has also been defined to hold the directory in which the Python package with the optimization routines is stored. This parameter is very useful in defining the calculation hooks, as described below.
Convenience parameter to identify the Python package path
  • Define the views/levels of aggregation required for the storyline: Here we define two aggregation levels: SKU level and subcategory level. The definition of aggregation levels is key, as it dictates the signatures of the Python backend functions. One specific function must be defined for each combination of calculation and aggregation level.
  • Define the calculation hooks (callbacks) in Tableau: Having defined the input parameters, aggregation levels and required output calculations, it is now possible to define custom calculations; a sketch of such a hook follows this list. For convenience, I have structured all the optimization routines in a portfolio_optimization Python package, in which I have defined functions that return the selected quantities for the specific aggregation levels. Note that the Python package path is passed to the function and used in the script to signal where the portfolio_optimization package is stored. The current aggregation level’s indexer (e.g., for the subcategory aggregation level, the subcategory itself) is always passed to the Python backend to ensure that results are returned in the proper order. The input parameter, the SKU GM threshold, is passed as well.
Calculation of optimized margins at the subcategory aggregation level
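Putting these pieces together, here is a hedged reconstruction of what such a hook might look like as a Tableau calculated field. The function name optimized_margin_by_subcategory and the parameter names are illustrative assumptions; only the overall pattern (package path, aggregation-level indexer and tuning parameter passed together) follows the description above:

```
// Hedged reconstruction of a calculation hook; identifiers are illustrative.
SCRIPT_REAL("
import sys
sys.path.append(_arg1[0])  # directory holding the portfolio_optimization package
from portfolio_optimization import optimized_margin_by_subcategory
return optimized_margin_by_subcategory(_arg2, _arg3[0])
",
[Python Package Path],  // convenience parameter with the package directory
ATTR([Subcategory]),    // aggregation-level indexer, keeps results in order
[SKU GM Threshold])     // interactive tuning parameter
```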

2. Create the calculation routines backend in Python

The Python backend is divided into two classes of functions, grouped according to their execution context. Because the highest possible responsiveness of the interface is the goal, it is key to identify all the operations that need to be executed only once; database extract, transform and load operations, for example, belong to this class. Such functions are called “one-time operations.” Other functions, such as the Tableau callbacks, are executed multiple times.

  • One-time operations: In this example, the database is loaded only once, when the script is executed the first time. The database is then made available to all the other functions by storing it in a global variable. To detect whether the database is already loaded, the local namespace is checked for an existing copy of it. Without such a precaution, the database would be reloaded every time Tableau requests a calculation, negatively impacting execution speed.
  • Tableau callbacks: Every hook previously defined must have a function serving it. In the current example, this is obtained by providing separate calculations for revenues, volumes and margins, and by using the indexer passed as a function input to index the Pandas groupby function, which aggregates the optimization results. To improve execution speed, the callbacks implement a parameter-change detector: only if the parameter has changed is a new optimization run, with its result made available to all callbacks through a global variable (see the sketch after this list). The detection of a parameter change is implemented through a persistent variable that stores the parameter’s value from the previous execution. This approach keeps expensive operations to a minimum and further improves execution speed.
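To ground the two bullets above, here is a minimal sketch of such a backend module. All identifiers, the CSV file name and the column names are illustrative assumptions; the author’s actual code lives in the repository linked at the end of this article:

```python
# Hedged sketch of the backend pattern: a one-time load guarded by a
# namespace check, plus a parameter-change detector around the callbacks.
import pandas as pd

# One-time operation: load the database only on the first execution.
# The CSV is assumed to already carry Subcategory, GM (gross-margin
# ratio) and Margin (absolute gross margin) columns.
if "SKU_DB" not in globals():
    SKU_DB = pd.read_csv("sku_database.csv")

# Persistent state for the parameter-change detector.
if "_LAST_THRESHOLD" not in globals():
    _LAST_THRESHOLD = None
    _LAST_RESULT = None

def _optimize(threshold):
    """Re-run the (deliberately simple) optimization only when the
    threshold parameter has changed since the previous call."""
    global _LAST_THRESHOLD, _LAST_RESULT
    if threshold != _LAST_THRESHOLD:
        # Keep only SKUs at or above the gross-margin threshold.
        _LAST_RESULT = SKU_DB[SKU_DB["GM"] >= threshold]
        _LAST_THRESHOLD = threshold
    return _LAST_RESULT

def optimized_margin_by_subcategory(subcategories, threshold):
    """Tableau callback: optimized margins aggregated at subcategory level."""
    margins = _optimize(threshold).groupby("Subcategory")["Margin"].sum()
    # Reindex on the indexer Tableau passes in, so each value lands on
    # the row Tableau expects; pruned subcategories come back as 0.
    return margins.reindex(subcategories, fill_value=0.0).tolist()
```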

3. Design the Tableau front end

By this step, all the fundamental building blocks, including the aggregation levels, the parameters to be tuned, and the output columns returned by the calculation backend, have been defined.

The next task is to design the dashboards that will show the impact of the product-portfolio optimization. To make the optimization easier to discuss, define two separate worksheets showing the portfolio before and after the optimization process, and display them side by side.

Current vs optimized product portfolio

The availability of multiple data sources enriches the portfolio database, populating the visualization with the current plant-utilization rate and with the rate resulting from the optimized portfolio. Again, the two visualizations are shown side by side to better demonstrate the impact of the optimization on the production plants.

Plant utilization rate of current vs optimized portfolio

The Benefits of the Novel Approach to TabPy

In addition to the significant business value of enabling teams to interact in real time with powerful data-science techniques, this novel approach has substantial backend benefits as well. Many other data-visualization approaches require costly data-scientist participation throughout the process. In this approach, however, data-scientist resources are required only to prepare the draft Tableau dashboard and to create the backend Python calculation routines. Tableau’s ease of use makes it possible for a much wider range of resources to design the front end, test it with the end users and maintain it.

Front-end design is typically a lengthy and iterative process involving multiple discussions with the final users. By enabling managers to vary team composition during project execution, the novel TabPy approach can significantly improve cost efficiency by assigning the appropriate resources to the appropriate tasks. This approach also ensures high re-usability of the underlying backend, enabling a wide range of users to use Tableau to build their own custom dashboards and stories to suit specific contexts, audiences and situations. This reuse of calculation logic and the underlying building blocks is yet another way to reduce the overall cost of data visualization.

If you are interested in a deep dive into the example, both the Tableau dashboard and the Python backend are available at

https://github.com/davideguatta/tableau_portfolio_optimization

If you would like additional information on this approach, please see the official TabPy GitHub repo at https://github.com/tableau/TabPy.
