Reshaping in Pandas with stack() and unstack() Functions

Ram Avni
The Startup
Published in
5 min readAug 14, 2019

Pandas is a software library written for the Python programming language for data manipulation and analysis. In particular, it offers data structures and operations for manipulating numerical tables and time series.

Pandas stands for “Python Data Analysis Library”. According to the Wikipedia page on Pandas, “the name is derived from the term “panel data”, an econometrics term for multidimensional structured data sets.” It is one of the most popular Python libraries, and provides interface with other popular libraries like Numpy and Matplotlib. It provides abstractions of dataframes and series, and in this blog I’ll focus on particular functions to reshape these objects.

The concept of stacking comes in handy when we have data with multi indices. Using the stack() function will reshape the dataframe by converting the data into a stacked form. Since we are having multiple indices, that means converting (also called rotating or pivoting) the innermost column index into the innermost row index. Unstacking, as the name implies, does exactly the inverse operation — it will convert the innermost row index back into the innermost column index.

Let’s take a look at the diagram below:

Reshaping with stack() and unstack()

As you can see, stacking means rearranging the data vertically (or stacking it on top of each other, hence the name “stacking”), making the shape of the dataframe as a taller and narrower stack (with fewer columns and more rows). And as you probably could figure out, unstacking will do just the opposite, spreading out the data and reshaping it into a shorter but wider dataframe (with fewer rows but more columns).

Note that if you stack and unstack the same dataframe, you will get back the original dataframe, but you have a choice to stack on different indices and get a different shape of dataframe.

Note that the concepts of multi indexing, hierarchical indexing, pivoting, stacking and unstacking are all related, but let’s start by looking at a sample dataframe, and let’s create a sample dataframe with a typical example we are all familiar with — a grading table.

Methods of multiIndex creation

There are various ways to create multiple indices. One of the simplest ways to create a multi index Series or DataFrame object is by passing a list of two or more arrays to the constructor:

List of arrays

Other options include passing a dictionary where the keys are spelled out as tuples:

Tuples

Finally, you can explicitly create a multi index by using the class method constructors available in the pd.MultiIndex:

Class method constructor

For this blog, I used the first method. Note that I had to convert the original data from list of lists into an np.array type:

An np.array

And here is how our multi index table looks like:

Original Dataframe — Data Science Grading

At first glance, it seems that there are quite a few students with an “Incomplete” grade in the Labs section, so let’s throw in a quick regex to clean up those values by accessing the innermost column index for labs, and now we have:

Cleaning data with regex

Before we even run some statistics, let’s think about how we would like to see the data. This view can give us some idea about the overall performance of the class, but as an individual student you might want to quickly see all your grades across every subject and activity since the beginning of the course — so let’s unstack the original dataframe to get the following:

First unstack

Note that we can go back to the original dataframe by running the inverse function, stack():

Original dataframe

Now, we can further unstack the new dataframe, using df2 = df1.unstack(), and let’s see what we get:

Dataframe 2x unstacked

Here, all the columns indices shifted to the row axis, and we can now see the overall performance of all students across every subject and activity since the beginning of the course.

When I tried to stack the last result back, I got a surprise: Series’ object has no attribute ‘stack.’

Error message — cannot stack

The reason is that once we “run out” of row indices, the dataframe becomes a series, and if we try to stack we’ll get an error. But, since it’s a series, we can unstack this series:

Unstacking series

Note how the indices now flipped from the original df! In other words, it is stacking the dataframe back but in a different order.

And if we unstack one more time, it will flip the innermost row index (activity type — Labs, Projects, etc.), so we completely flipped the indices of the original df:

Unstacking twice

Alright, enough flipping, let’s go back to the original dataframe. Let’s stack, replace the Nan, and see what happens:

Stacking the original dataframe
Cleaning NaN values

We got missing values since the subjects change every period (in this case every week). However, this view could be useful for an annual program (with 2 semesters etc.), where the subjects repeat throughout the year.

And lastly, let’s stack the last result, and we’ll get:

Stacking 2x

Note that it’s again a series type object, very similar to what we got by unstacking the original dataframe two times. The difference however is that in this case the order of the indices is the reverse of the double unstacking, as expected.

References:

https://en.wikipedia.org/wiki/Pandas_(software)

https://nikgrozev.com/2015/07/01/reshaping-in-pandas-pivot-pivot-table-stack-and-unstack-explained-with-pictures/

http://www.datasciencemadesimple.com/reshape-using-stack-unstack-function-pandas-python/

--

--