Reshape Pandas Data Frames

A walk-through example of how you can reshape pandas data frames

George Pipis
Nov 24, 2020 · 3 min read
Image for post
Image for post
Image on Unsplash

We will provide some examples of how we can reshape Pandas data frames based on our needs. We want to provide a concrete and reproducible example and for that reason, we assume that we are dealing with the following scenario.

We have a data frame of three columns such as:

  • ID: The UserID

The data are in a long format, where each case is one row. Let’s create the data frame:

Create the Pandas Data Frame

import pandas as pddf = pd.DataFrame({'ID':[1,1,1,1,2,2,3,3,3,4],
Image for post
Image for post

Aggregate the Data by ID

Let’s say that we want to aggregate the data by ID by concatenating the text variables Type and Value respectively. We will use the lambda function and the join where our separator will be the | but it can be whatever you want.

# Aggregate the data by IDdf_agg = df.groupby('ID', as_index=False)[['Type','Value']].agg(lambda x: '|'.join(x))
Image for post
Image for post

As we can see now we have 4 rows, one per each ID.

Reshape to a Long Format

Our goal is to convert the “ df_agg “ to the initial one. We will need some steps to achieve this.

Convert the Columns to Lists

We will need to split the Type and Value columns and to transform them into lists.

df_agg['Type'] = df_agg['Type'].apply(lambda x: x.split("|"))
df_agg['Value'] = df_agg['Value'].apply(lambda x: x.split("|"))
Image for post
Image for post

Create a list of tuples from the two-column lists

We know that the elements of each list appear in order. So, we need to do a mapping between the Type and the Value list element-wise. For that reason, we will use the zip function.

df_agg['Type_Value']= df_agg.apply(lambda x: list(zip(x.Type,x.Value)), axis=1)
Image for post
Image for post

Exlpode the list of tuples

Now, we will “explode” the Type_Value as follows:

df_agg = df_agg.explode('Type_Value') df_agg
Image for post
Image for post

Split the tuple into two different columns

Now we want to split the tuple into two different columns, where the first one is referred to the Type and the second one to the Value.

df_agg[['New_Type','New_Value']] = pd.DataFrame(df_agg['Type_Value'].tolist(), index=df_agg.index) df_agg
Image for post
Image for post

Now we will keep only the columns that we want and we will rename them.

df_agg = df_agg[['ID','New_Type', 'New_Value']].\
rename(columns={"New_Type": "Type", "New_Value": "Value"}).\
Image for post
Image for post

Check if the Data Frames are identical

As we can see we started with a long format, we reshaped the data, and then we converted it back to the initial format. Let’s verify if the initial data frame is the same as the last one that we created.


and we get True!

Originally published at

The Startup

Medium's largest active publication, followed by +755K people. Follow to join our community.

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store