MLearning.ai
Published in

MLearning.ai

Pivot Table in Pandas for data summarization and visualization in Python

Hello Friends,

I am sure, most of you have worked with data and performed data analysis using Pivot function of Microsoft Excel. MS Excel provides an easy, quick drag and drop to perform calculation on data, aggregate data, and summarize data sets for advance analysis to draw meaningful patterns, insights and trends. It is easy to work with MS Excel do generate visualization for data presentations.

“The Pivot Table takes simple column-wise data as input and groups the entries into a two-dimensional table, which provides a multi-dimensional summarization of the data in python.” Jake VanderPlas

For the python lovers, pandas library provides pivot_table() function that is used to handle tabular dataset though pandas dataframe similar to the MS excels. It is very powerful, fast and comes with customizable parameters. Pandas pivot_table() function is more efficient, powerful and robust than MS Excel pivot function. You might take little time to understand the syntax of the function. But once you understand, it will make data aggregation and data visualization in supper simple for you. It can handle very large data files easily.

In this article, we are gonna to see nitty gritty of pandas pivot_table() function to make data aggregation and visualization task very easy for you. So, let dive into deeper to understand it. We will understand it with help of an interesting use case of ICC T20 World Cup dataset. We have taken T20 IIC world cup 2021 dataset from Kaggle https://www.kaggle.com/affanazhar/icct20-worldcup2021?select=kaggle_data.csv and renamed the data set as T20_ICC_WC.csv. The following steps exhibits how to use pivot_table() function of pandas to perform aggregation and visualization similar to MS excel in Python.

  1. If you are jupyther notebbok user, run jupyter notebook, import pandas package and read CSV file into pandas dataframe. For example,

2. Perform data selection: Select the data items that you want to take into consideration for the data aggregation and visualization. For example, we want to visualize venue wise how many time the team who won the toss and chose fielding / batting and also won the match. So, we have selected required data items from the dataset. For example,

3. Pivot_table() function uses index parameter to specify keys to group by on the pivot table index. (Click here To refer pandas pivot_table() function official documentation). For example, we can summarize toss_decision by the column name venue and time. By default pivot_table() use aggregation by numpy.mean() as given below:

4. You can use aggregation function like max, min, mean, count etc. But here we want to apply aggregation function count. So, we can specify aggfunc = ‘count’ as given below:

5. We can also fill NA/NaN value with desired values. For example,

6. Now we can import matlotlib.pyplot to plot desired visualization. For example, bar graph:

7. We can also generate visualization using horizontal bar graph. For example,

Conclusion

In this article we have seen how to use pivot_table() function of pandas. It seems very easy and supper simple to use pivot_table() function with greater flexibility and robustness. Once you understand concept, it is matter of few simple lines of code to perform aggregation and visualization in Python. You can also access Jupyter notebook to perform tutorial on pivot_table.

On wrapping up notes, feel free to share your comments. Your comments will surely help me to present contents in a better way. See you next week.

--

--

Get the Medium app

A button that says 'Download on the App Store', and if clicked it will lead you to the iOS App store
A button that says 'Get it on, Google Play', and if clicked it will lead you to the Google Play store
Dr. Virendra Kumar Shrivastava

Dr. Virendra Kumar Shrivastava

527 Followers

Professor || Alliance College of Engineering and Design || Alliance University || Writer || Big Data Analytics