Reshaping data using pandas.

Praneel Nihar
3 min read · Nov 5, 2019


Reshaping data in pandas

Data analysis is the first and foremost step in the machine learning life cycle. It includes inspecting, cleaning, transforming and modelling data with the goal of discovering useful information, informing conclusions and supporting decision making. Python's pandas library is one of the most powerful and widely used tools for data analysis. Transforming or reshaping data into an understandable form is one of the key aspects of data analysis. In this blog we will try to understand some of the widely used pandas methods for reshaping data, listed below:

  • stack
  • unstack
  • melt
  • pivot
  • pivot_table

Why should we reshape or restructure data?

Real-world data is not always in a consumable form. It often contains missing entries and errors. It is easy to extract data from the rows and columns of a data set, but there are situations when we need the data in a format different from the one in which we received it. It is therefore important to clean and restructure the data into a consumable form. Reshaping data includes converting columns to rows, converting rows to columns, and performing aggregation to bring the data into a form that is easy to analyze.
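As a rough illustration of the two ideas above (columns to rows, then aggregation), here is a minimal sketch. The frame `wide`, its values, and the column name `Reading` are made up for this example and are not part of the blog's sample data.

```python
import pandas as pd

# Hypothetical wide table: one row per sensor, one column per year.
wide = pd.DataFrame({
    "Sensor": ["sensor1", "sensor2"],
    "2017": [50, 60],
    "2018": [70, 80],
})

# Columns to rows: each year column is unpivoted into its own row.
long = wide.melt(id_vars=["Sensor"], var_name="Year", value_name="Reading")

# Aggregation: mean reading per sensor across both years.
mean_per_sensor = long.pivot_table(index="Sensor", values="Reading", aggfunc="mean")

print(long)
print(mean_per_sensor)
```

The long form has one row per (Sensor, Year) pair, which is usually the easier shape to group and aggregate.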

Sample data:

Let us create a small sample data set that we will use throughout the blog. It holds flow sensor readings for 2 sensors (sensor1, sensor2) and 3 metrics (Pressure, Temperature, Flow rate) measured over the last 2 years (2017, 2018).

import pandas as pd
import numpy as np

# Sensor data for 2 sensors over the last 2 years.
iterables = [['sensor1', 'sensor2'],
             ['Pressure', 'Temperature', 'Flow']]

# Build a (Sensor, Metric) MultiIndex from every sensor/metric combination.
index = pd.MultiIndex.from_product(iterables, names=['Sensor', 'Metric'])

# Random integer readings in [40, 100), one column per year.
df_sensors = pd.DataFrame(np.random.randint(low=40, high=100, size=(6, 2)),
                          index=index,
                          columns=['2017', '2018']).reset_index()
df_sensors
Sample Data

Let us now try to understand the usage of each of the above-mentioned methods by applying them to the sample data. We will also use some of the pandas helper methods (set_index, reset_index, rename and rename_axis) to add final touches to the data frame and bring the output into a consumable form.
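For readers who have not met these helper methods, the following sketch shows what each one does on a tiny frame. The frame `df` and the new labels `Y2017` and `Y2018` are assumptions made only for illustration.

```python
import pandas as pd

# Hypothetical frame standing in for a reshaped result.
df = pd.DataFrame({
    "Sensor": ["sensor1", "sensor2"],
    "2017": [50, 60],
    "2018": [70, 80],
})

# set_index: move the Sensor column into the row index.
indexed = df.set_index("Sensor")

# rename: relabel the columns; rename_axis: give the column axis a name.
labelled = (
    indexed
    .rename(columns={"2017": "Y2017", "2018": "Y2018"})
    .rename_axis(columns="Year")
)

# Drop the column-axis name again, then reset_index to turn
# the Sensor index back into an ordinary column.
flat = labelled.rename_axis(None, axis="columns").reset_index()

print(flat)
```

Reshaping methods tend to leave behind index levels and axis names, so a chain like this is a common way to tidy the final output.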

For a better understanding of each method, the blog is divided into two logical parts. Each part explains a set of opposite operations, with a functional description of each method and some key points to note.
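To give a flavour of what "opposite operations" means here, the sketch below round-trips a small frame through stack and unstack, and then through melt and pivot. How the original two parts pair the methods is an assumption on my side, and the frame `wide` with its values is hypothetical.

```python
import pandas as pd

# Hypothetical wide frame indexed by sensor, one column per year.
wide = pd.DataFrame(
    {"2017": [50, 60], "2018": [70, 80]},
    index=pd.Index(["sensor1", "sensor2"], name="Sensor"),
)

# stack: move the column labels into an inner index level (wide -> long).
stacked = wide.stack()

# unstack: move that inner level back out to columns (long -> wide).
restored = stacked.unstack()

# melt: unpivot the year columns into rows.
melted = wide.reset_index().melt(
    id_vars=["Sensor"], var_name="Year", value_name="Reading"
)

# pivot: spread the Year values back out into columns.
pivoted = melted.pivot(index="Sensor", columns="Year", values="Reading")

print(restored)
print(pivoted)
```

In both pairs the second call recovers the original layout. pivot_table behaves like pivot but also aggregates when an index/column pair occurs more than once.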

I hope you gain a basic understanding of each of the above methods after reading the two parts. Try applying them to different data sets and share your experiences in the comments section.

Please go to the GitHub link below to download the full notebook.

https://github.com/upraneelnihar/Blogs/tree/master/pandas_blogs
