How to convert a table into long-form or tidy-form for seaborn visualizations

Presenting data to non-technical users is often a difficult task, and good visualizations help bridge the gap between a data scientist and the audience. Seaborn is an excellent Python visualization library built on top of matplotlib that creates beautiful plots.

Seaborn often requires the data in tidy form. If you want to draw a factor plot where multiple categories are plotted on the same axis, the data needs to be a long-form dataset. What does this mean?

A tidy (or long-form) datatable has the following characteristics:

  1. Each variable you measure should be in one column.
  2. Each different observation of that variable should be in a different row.

Let us see this in an example:

'''CREATING A COST/BENEFIT TABLE & PLOTTING IT ON A FACTOR PLOT'''
import pandas as pd

approaches_cost = [425500, 275250, 101500] ## COSTS OF A PROCESS
approaches_savings = [-500, 149750, 323500] ## SAVINGS OF A PROCESS
cols_approach = ["Aggressive", "Moderate", "Conservative"] # TYPES OF APPROACHES
## CREATING THE DATATABLE
wnv_approaches = pd.DataFrame({"Approach": cols_approach, "Cost": approaches_cost, "Savings": approaches_savings})

Here is how the table looks:

Starting datatable

Now, to see the cost and savings on the same axis with the approach on the X-axis, I need to create a factorplot (renamed catplot in seaborn 0.9). The factor plot documentation states the following:

data : DataFrame

Long-form (tidy) dataset for plotting. Each column should correspond to a variable, and each row should correspond to an observation.

So, to convert this table into long form, we use the pandas melt function.

approaches_plot = pd.melt(wnv_approaches, id_vars="Approach", var_name="Expense_Type", value_name="$ Amount")
Tidied up table — ready to be plotted in seaborn
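For reference, the reshaping step can be reproduced end to end with the snippet below. It is a minimal sketch that rebuilds the table from this post and inspects the melted result; the variable name `tidy` is introduced here only for illustration.

```python
import pandas as pd

# Same wide table as in the post: one column per expense type
wnv_approaches = pd.DataFrame({
    "Approach": ["Aggressive", "Moderate", "Conservative"],
    "Cost": [425500, 275250, 101500],
    "Savings": [-500, 149750, 323500],
})

# "Approach" is kept as an identifier column; every other column
# (Cost, Savings) is stacked into a name column and a value column.
tidy = pd.melt(
    wnv_approaches,
    id_vars="Approach",
    var_name="Expense_Type",
    value_name="$ Amount",
)

print(tidy.shape)  # 3 approaches x 2 expense types = 6 rows, 3 columns
print(tidy)
```

Each row of `tidy` is now a single observation: one approach, one expense type, one amount.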

This stacks the Cost and Savings values into a single column, which lets seaborn plot them against the same Y-axis. Here is how the plot looks.

Or, switching the plot kind to “bar”:
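The plotting calls themselves might look roughly like the sketch below. It assumes a recent seaborn, where `factorplot` is available under the name `catplot`, and it guesses at the arguments: `kind="point"` was the old `factorplot` default, and the `hue` mapping is an assumption about how the two expense types were separated in the original figures.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so this also runs without a display; drop in a notebook

import pandas as pd
import seaborn as sns

wnv_approaches = pd.DataFrame({
    "Approach": ["Aggressive", "Moderate", "Conservative"],
    "Cost": [425500, 275250, 101500],
    "Savings": [-500, 149750, 323500],
})
approaches_plot = pd.melt(
    wnv_approaches,
    id_vars="Approach",
    var_name="Expense_Type",
    value_name="$ Amount",
)

# Point plot: one line per expense type, approaches along the X-axis
point_grid = sns.catplot(
    data=approaches_plot,
    x="Approach",
    y="$ Amount",
    hue="Expense_Type",
    kind="point",
)

# Same data as grouped bars
bar_grid = sns.catplot(
    data=approaches_plot,
    x="Approach",
    y="$ Amount",
    hue="Expense_Type",
    kind="bar",
)
```

Because the data is in long form, switching between plot kinds only requires changing the `kind` argument.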

So, now the obvious question is: how do I go back to my original data table? Well, pandas has a pivot_table method that un-tidies the datatable. It works like this:

## SWITCHING BACK FROM THE TIDY DATA TABLE TO THE UNTIDY DATA TABLE
untidy_approaches = approaches_plot.pivot_table(index="Approach", values="$ Amount", columns="Expense_Type")
untidy_approaches.reset_index(drop=False, inplace=True)  # MOVE "Approach" BACK INTO A COLUMN; INDEX BECOMES 0/1/2
untidy_approaches.columns.name = None         # REMOVE THE LEFTOVER "Expense_Type" NAME FROM THE COLUMNS
untidy_approaches

Switching back from a tidy datatable to an untidy one takes a little bit of effort. Note that pivot_table sorts the rows by the index and averages duplicate entries by default, so playing around with the parameters of df.pivot_table will help you get the table that you want.
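To check that the round trip really recovers the starting table, one option is the sketch below. Note that it swaps in `DataFrame.pivot` instead of the `pivot_table` call used in this post: `pivot` reshapes without aggregating, so the integer amounts are not averaged. The names `tidy`, `wide` and `round_trip_ok` are illustrative only.

```python
import pandas as pd

wnv_approaches = pd.DataFrame({
    "Approach": ["Aggressive", "Moderate", "Conservative"],
    "Cost": [425500, 275250, 101500],
    "Savings": [-500, 149750, 323500],
})

tidy = pd.melt(
    wnv_approaches,
    id_vars="Approach",
    var_name="Expense_Type",
    value_name="$ Amount",
)

# pivot reshapes without aggregating; it raises an error if an
# (Approach, Expense_Type) pair appears more than once.
wide = tidy.pivot(index="Approach", columns="Expense_Type", values="$ Amount")

# The pivoted index comes back sorted alphabetically, so restore the
# original row order before moving "Approach" back into a column.
wide = wide.reindex(wnv_approaches["Approach"]).reset_index()
wide.columns.name = None  # drop the leftover "Expense_Type" label

round_trip_ok = wide.equals(wnv_approaches)
print(round_trip_ok)
```

If the tidy table could contain duplicate rows for the same approach and expense type, `pivot_table` with an explicit `aggfunc` is the safer choice.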

Back to the old form

There it is. We have now seen a way to convert a datatable into its tidy form and then convert it back to its original shape. This gives us the flexibility to plot several columns together in seaborn.

Here is the entire sample code: