Pandas For Beginners: Reshaping Dataframes (Part 2)

Converting your data from long-form to wide-form

Ujjwal Dalmia
Feb 27 · 4 min read
Photo by billow926 on Unsplash

In the last tutorial, we demonstrated the Pandas melt function to reshape the wide-data form into the long-form. This tutorial will focus on the opposite process. To bring everyone on the same page, presenting a visual of the problem we are trying to solve:

Long to Wide Form (Image by User)

Most of us have worked on scenarios like these when using Microsoft excel. Yes, the wide-form of the table resembles the famous pivot tables. We commonly use them to summarize our data based on a specific dimension (expense category in our case). To perform these summarization tasks, Pandas offer a ready-made function pivot_table. A step by step approach to implement this function is detailed below:

Assumption and Recommendation

Being hands-on is the key to master programming. We recommend that you continue to implement the codes as you follow through with the tutorial. The sample data and the associated Jupiter notebook are available in the Scenario_3 folder of this GitHub link.

If you are new to GitHub and want to learn it, please go through this tutorial. To set up a new Python environment on your system, please go through this tutorial.

Following is the list of Python concepts and pandas functions/ methods used in the tutorial:

Pandas functions

  • read_csv
  • pivot_table

Let’s get started

Step 1 — Keeping the data ready

For this tutorial, we have created a dummy dataset containing the monthly expenses incurred by a family across different expense categories for the years: 2018, 2019, and 2020. The dictionary for this data set and the sample data snapshot is as follows:

  • Year — Calendar Year
  • Month — Month of the year
  • Expense Category — Categories across which the family incurred expenses
  • Expense Value — Expenses value incurred by the family
Sample Data (Image by User)

Step 2 — Importing pandas package and the data set in Python.

Once you have the data available, the next step is to import it to your Python environment.

#### Sample Code
#### Importing Pandas
import pandas as pd
#### Importing Data File - Change the Windows Folder Location
imp_data = pd.read_csv("C:\\Ujjwal\\Analytics\\Git\\Scenario_3\\Expenses_Long.csv")

We have used the Pandas read_csv function to read the data in Python.

Step 3 — Creating the wide-form of the expense table (without aggregation).

Once we have read the data, apply the Pandas pivot_table function to create the wide-form of the input dataframe.

#### Create the wide-form
Wide_data = pd.pivot_table(imp_data,index=["Year","Expense Category"], columns="Month",values = "Expense Value").reset_index()
#### Sample Output
Wide_data.head()
Sample Output (Image by Author)

Explanation

The pivot_table function expects the following parameters as input:

  • Dataframe name — The first argument to the pivot_table function is the dataframe on which we have to apply the operation.
  • Index— This is synonymous with the rows section of excel’s pivot table. Column names passed to the index parameter will form the rows of the final dataframe.
  • Columns — This is synonymous with the columns section of excel’s pivot table. Column names passed to the column parameter will form the new columns of the final dataframe.
  • Values — This is synonymous with the values section of excel’s pivot table. Column names passed to the values parameter will be used to fill in the cells of the final dataframe

Points to note

  • reset_index() — We have used the reset_index function to remove the multilevel index created by pivot_table.
  • To pass multiple columns to index, column, or value parameters, use them in the form of lists.
  • Just like the melt function, pivot_table also returns a new pivoted dataframe as the output. Capture this output in a new variable (Wide_data in our case) for future use.

Closing note

Did you know that the pivot table is one of the most common features of Microsoft excel? Quick implementation of data wrangling steps in Python can substantially reduce our model building life cycle. I hope that the solution presented above was helpful.

Do you know how to aggregate the data when converting it into long-form? Can you calculate the average expense per month in 2018, 2019, and 2020 across different expense categories? For the answer, check the code provided in the GitHub repository.

HAPPY LEARNING ! ! ! !

Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data…

Sign up for Analytics Vidhya News Bytes

By Analytics Vidhya

Latest news from Analytics Vidhya on our Hackathons and some of our best articles! Take a look.

By signing up, you will create a Medium account if you don’t already have one. Review our Privacy Policy for more information about our privacy practices.

Check your inbox
Medium sent you an email at to complete your subscription.

Ujjwal Dalmia

Written by

Learner, Practitioner, Mentor, Trainer. Connect with me on Linkedin: https://www.linkedin.com/in/ujjwaldalmia/

Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data Science professionals. We are building the next-gen data science ecosystem https://www.analyticsvidhya.com

Ujjwal Dalmia

Written by

Learner, Practitioner, Mentor, Trainer. Connect with me on Linkedin: https://www.linkedin.com/in/ujjwaldalmia/

Analytics Vidhya

Analytics Vidhya is a community of Analytics and Data Science professionals. We are building the next-gen data science ecosystem https://www.analyticsvidhya.com

Medium is an open platform where 170 million readers come to find insightful and dynamic thinking. Here, expert and undiscovered voices alike dive into the heart of any topic and bring new ideas to the surface. Learn more

Follow the writers, publications, and topics that matter to you, and you’ll see them on your homepage and in your inbox. Explore

If you have a story to tell, knowledge to share, or a perspective to offer — welcome home. It’s easy and free to post your thinking on any topic. Write on Medium

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store