Part2: Melt, Pivot and Pivot_table

Praneel Nihar
6 min readNov 5, 2019

--

This page is the secondpart of the blog Reshaping data in pandas. click here to go back to main page

Melt

Pandas Melt

Pandas has different ways to accomplish the same task, the difference being readability and performance. Pandas data frame method name melt works similarly to stack but gives more flexibility.The method takes in the below 5 parameters out of which two parameters namely id_vars and value_vars are crucial to understand how to reshape your data.

  • id_vars: list of column names that you want to preserve as columns and not reshape (list,tuple or ndarray)(optional).
  • value_vars: list of column names that you want to reshape as columns (list,tuple or ndarray) (optional).
  • var_name: Name to use for the variable column, defaults to variable.(scalar)(optional)
  • value_name:Name to use for the value column, defaults to `value`(scalar)(optional).
  • col_level:If column are multi index then use this level to melt (int or string)(optional).

All of the above parameters mentioned are optional. Let us try to understand the usage of each of the parameters by applying it to our sample dataset.

Working of Melt:

  • The id_vars or the identification variables remain in the same column but repeat for each of the value_vars
  • One crucial aspect of melt is that it ignores the values in index, in fact it drops the existing index and replaces it with the RangeIndex. so if you have values in index that you want to keep, you need to do a reset_index before applying melt.

Let’s apply the melt function on our sample sensors data frame.

# Sample data
df_sensors
# Applying Melt on sample data
df_sensors.melt(id_vars=['Sensor', 'Metric'],
value_vars=['2017', '2018'])

melt works similar to stack but gives more flexibility. By default melt assigns variable and value as the names for value_vars, this can be overrided by passing the desired column names as an input to the parameters var_name and value_name as shown below.

# Setting the names of variable and value columns.
df_sensors.melt(id_vars=['Sensor', 'Metric'],
value_vars=['2017', '2018'],
var_name='Year',
value_name='value')

Point to Remember: The advantage of melt over stack is that you can mention the column name which you want to preserve in the index without explicitly setting them as index .

Pivot:

pivot method is similar to unstack but is more easy to use. It takes in 3 parameters (mentioned below) as input which are index, columns and values

  • index: Column to use for the new frames index, if none use current index(string or object).
  • columns: Column to use for the new frames columns (string or object).
  • values : Column(s) to use for the new frames values (string or object).

Working of pivot:

  • Each parameter takes in a single column as a string.
  • The index remains in the vertical and becomes the new index.
  • The values of the columns referenced by columns becomes new column names.
  • The values referenced by the values are tiled to correspond the intersection of their former index and columns label.

Let us consider the previous melted output and see how to apply pivot in order to reshape it to original form.

# loading melted data as a dataframe
df_melted = df_sensors.melt(id_vars=['Sensor', 'Metric'],
value_vars=['2017', '2018'],
var_name='Year',
value_name='value')
df_melted

We cannot use pivot directly to pivot the Year column , as mentioned above pivot raises an ValueError when there are duplicate values in the index columns.

# Pivot raises a ValueError
df_melted.pivot(index=['Sensor', 'Metric'], columns='Year', values='value')

In order to do that we’ll have to set the index columns explicitly before applying pivot as shown below.

# Setting the index and applying pivot
(df_melted
.set_index(['Sensor', 'Metric'])
.pivot(columns='Year')['value'])
# Processing the pivot output
(df_melted
.set_index(['Sensor', 'Metric'])
.pivot(columns='Year')['value']
.reset_index()
.rename_axis([None], axis=1))

Points to Remember:

pivot raises a ValueError when any index or column combinations has duplicate values.

pivot cannot take more than one index at a time.In order to achieve that you will have to explicitly set the index and apply pivot on columns.

Pivot_table:

pivot_table is a versatile and flexible function. Below mentioned are the list of the input parameters to the fucntion.. Below mentioned are the input parameters to the function.

  • index: column(s) which is intended to stay as index. (column, list, array, Grouper)
  • columns: column(s) which are pivoted (column, list, array, Grouper).
  • values : column to aggregate.
  • aggfunc: function, list of aggregation functions.(defaults to mean)
  • fill_value: Scalar to fill for missing values in the result (scalar, default None).
  • margins: whether to add all rows/columns (Bool, default False )(eg: subtotal or grand total)
  • dropna: Do not include columns whose values for all rows are NaN (Bool, default True).
  • margins_name: Name of the row / column that will contain the totals when margins is True (string, default All).

Working of pivot_table:

  • pivot_table is a generalization of pivot that can handle duplicate values for one pivoted index/column pair. Specifically, you can give pivot_table a list of aggregation functions using keyword argument aggfunc. The default aggfunc of pivot_table is numpy.mean.
  • pivot_table also supports using multiple columns for the index and column of the pivoted table. A hierarchical index will be automatically generated for you.
  • pivot_table also has some additional default parameters named margins , margins_name which indicates whether to add rows and columns total in the output or not.

Let us see how can we apply pivot_table to the previous melted output in order to convert it to original form.

# loading melted data as a dataframe
df_melted = df_sensors.melt(id_vars=['Sensor', 'Metric'],
value_vars=['2017', '2018'],
var_name='Year',
value_name='value')
df_melted
# Applying pivot_table on melted output
(df_melted
.pivot_table(index=['Sensor', 'Metric'],
columns='Year',
values='value'))
# Processing the Output
(df_melted
.pivot_table(index=['Sensor', 'Metric'],
columns='Year',
values='value')
.rename_axis([None],axis=1)
.reset_index())

pivot_table is widely used to summarize the numerical data. You can perform multiple aggregations like count, sort, average, total etc using pivot_table . We can use the parameters aggfunc, margins to perform any no of aggregations at a time on the chosen column values.

# Performing aggregations using aggfunc.
df_melted.pivot_table(index=['Sensor', 'Year'],
columns='Metric',
values='value',
aggfunc=[np.sum, np.mean],
fill_value=0,
margins=True,
margins_name='Total')

Key Differences between pivot and pivot_table:

  • pivot is used for pivoting without aggregation. Therefore, it can’t deal with duplicate values for one index/column pair whereas pivot_table can be used for both the purposes.
  • pivot doesn’t accept a list for index pivot_table accepts.

--

--