Part1: Stack, Unstack

Praneel Nihar
6 min readNov 5, 2019

--

This page is the first part of the blog Reshaping data in pandas. click here to go back to main page.

Stack

Pandas Stack Image.

In pandas terminology stacking is referred as transforming horizontal column names to vertical column values . stack function converts the data from a wide to long format.Below mentioned are the input parameters to the function.

  • level: Prescribed level(s) to stack from column axis onto index axis (int,str, list, default -1).
  • dropna: Whether to drop rows with missing values in the resulting frame, defaults (bool, default True).

Let us start by anaylsing the sample dataset shown below.

# Sample data
df_sensors

We can see that the two year values 2017 and 2018 are present as column names, but ideally both of them depict a single variable which can named as Year. We can resturcture the two Year values as a single column using stack as shown below.

# Stacking the Year columns into a single column.
(df_sensors.set_index(['Sensor', 'Metric'])
.stack()
.reset_index()
.rename(columns={'level_2': 'Year', 0: 'Value'}))
Desired Output.

Let us now understand the above code snippet in a step wise manner

Step1:

# Applying stack on sample data
df_sensors.stack()

We can see that by default stacktakes in all the columns and converts them into a single vertical column at the inner most level. The above output is a series with hierarchical index, all the column names are added as an index at the innermost level and each row is repeated as values.

If there are any columns which we want to stay in the index we have to explicitly set them using set_index method before performing the stack.

Step 2:

Let us now apply stack by setting the Metric column as index and analyze the output.

df_sensors.set_index(['Metric']
df_sensors.set_index(['Metric']).stack()

The output of above step is a series with hierarchical index. In order to convert it to a data frame we have to use pandas helper method reset_index

# Processing the stacked output
(df_sensors
.set_index(['Metric'])
.stack()
.reset_index())

We can see that the output is still messy i.e in level_1 column there are multiple data types hence the data is still not in a desired form.

Step 3:

In order to convert the multiple year columns to a single year column we have to set the columns Sensor, Metric as index and apply stack .

# Stacking the Year Columns
df_sensors.set_index(['Sensor', 'Metric']).stack()

The above output is a hierarchical indexed series which is in the desired form. In order to convert it to a dataframe we have to use the helper method’s reset_index and rename as shown below:

# Processing the stacked output
(df_sensors.set_index(['Sensor', 'Metric'])
.stack()
.reset_index()
.rename(columns={'level_2': 'Year', 0: 'Value'}))

When more than one column header is present in the dataset we can stack the desired column header using the level parameter of the stack function as hown below:

Let us add a column header named Value on top of the Year columns .

# Add a column level `Value` on top of Year columns
df_sensors2 = df_sensors.set_index(['Sensor', 'Metric'])
df_sensors2.columns = pd.MultiIndex.from_product([['Value'], df_sensors2.columns])
df_sensors2

Level 0: [Value]

Level 1 : [ 2017, 2018]

Now we can stack the two year columns by specifying the level parameter while applying stack

# Using Level parameter to stack the year columns
df_sensors2.stack(level=1)
# Processing the output
(df_sensors2
.stack(level=1)
.reset_index()
.rename(columns={'level_2':'Year'}))

Points to Remember:

By default stack takes in all columns present at the inner most level in the column axis and stacks them row wise . If you want any column to stay in index you need to set your index column explicitly using pd.DataFrame.set_index method before performing stack and then use pd.DataFrame.reset_index() to convert the output to a data frame.

Unstack:

Pandas Unstack Image

unstack is an inverse operation of stack i.e it converts a data frame from long to wide format. By default it takes the inner most index values and returns a data frame by reshaping them as the columns. Below are the parameters for the unstack method.

  • level: level(s) of index to unstack, defaults to -1 (int or string or list of these)
  • fill_value: replace NAN with value specified if unstack produces missing values.

In order to understand the unstack method let us take the stacked output and see how we can invert the stack operation.

# Loadind stacked data into a dataframe
df_stacked = (df_sensors.set_index(['Sensor', 'Metric'])
.stack()
.reset_index()
.rename(columns={'level_2': 'Year', 0: 'Value'}))
df_stacked

We can now invert the stacked output by following the below three steps:

  • Set the columns [Sensor, Metric and Year] as index.
  • Apply unstack.
  • Process the output to a consumable form.
# Setting the index columns
df_stacked.set_index(['Sensor', 'Metric','Year'])
# Applying unstack
(df_stacked
.set_index(['Sensor', 'Metric','Year'])
.unstack())
# Processing the Output
dfx = (df_stacked
.set_index(['Sensor', 'Metric', 'Year'])
.unstack())
dfx.columns = dfx.columns.droplevel([0])
dfx.rename_axis([None], axis=1).reset_index()

Let us see some more examples of unstack for better understanding:

  1. Unstacking sensors metric values as columns : We can use unstack operation on the stacked data frame to obtain the desired result as shown below.
# Setting Metric as innermost index level
df_stacked.set_index(['Sensor', 'Year', 'Metric'])
# Unstacking the Metric column values.
df_stacked.set_index(['Sensor', 'Year', 'Metric']).unstack()
# Processing the output
dfy = (df_stacked
.set_index(['Sensor', 'Year', 'Metric'])
.unstack())

dfy.columns = dfy.columns.droplevel([0])
dfy.rename_axis([None], axis=1).reset_index()

2. Unstacking both Year and Metrics as columns : Using the level parameter we can unstack methods to reshape multistack function converts a data frame from wide to long format,ple column values as individual hierarchical columns as shown below.

# Unstacking at multiple index levels at a time.
(df_stacked
.set_index(['Sensor', 'Year', 'Metric'])
.unstack(level=[-3,-2]))
# Changing the order of the unstacked levels.
(df_stacked
.set_index(['Sensor', 'Year', 'Metric'])
.unstack(level=[-2,-1]))

--

--