A Deep Dive into Pandas groupby() and pivot()


Pandas is a Python library for manipulating and analyzing tabular data. Two of its most useful tools for summarizing and reshaping data are groupby() and pivot(). In this article, we explore both functions and walk through examples of each.

Groupby()

The groupby() function in Pandas groups rows that share the same value in one or more columns. It follows a split-apply-combine pattern. First, the rows are split into groups. Next, a calculation such as a sum, mean, or count is applied to each group. Finally, the per-group results are combined into a new Series or DataFrame.
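To make the split-apply-combine idea concrete, the sketch below performs the three steps by hand on a small made-up table and checks that the result matches a single groupby() call. The column names `team` and `score` are hypothetical. The sketch illustrates the concept only and does not reflect how pandas implements grouping internally.

```python
import pandas as pd

# Small illustrative table; the column names are hypothetical.
df = pd.DataFrame({
    "team": ["x", "y", "x", "y", "x"],
    "score": [1, 2, 3, 6, 5],
})

# Split: collect the rows belonging to each key.
groups = {key: df[df["team"] == key] for key in df["team"].unique()}

# Apply: compute one value per group (here, the mean score).
applied = {key: rows["score"].mean() for key, rows in groups.items()}

# Combine: assemble the per-group results into a Series indexed by key.
manual = pd.Series(applied, name="score").sort_index()
manual.index.name = "team"

# groupby() performs the same three steps in a single call.
builtin = df.groupby("team")["score"].mean()

print(manual)
print(manual.equals(builtin))
```

Group x averages (1 + 3 + 5) / 3 = 3.0, and group y averages (2 + 6) / 2 = 4.0. The hand-built Series and the groupby() result therefore agree.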

Let's say we have a dataset containing information about employees in a company. We want to group the data by department and then calculate the average salary for each department. Here's how we can do that using the groupby() function:

import pandas as pd

data = {'Employee': ['John', 'Anna', 'Peter', 'Samantha', 'David', 'Eric', 'Emily', 'Michael'],
'Department': ['Sales', 'Marketing', 'Sales', 'Marketing', 'Sales', 'Marketing', 'Sales', 'Marketing'],
'Salary': [60000, 65000, 55000, 70000, 50000, 75000, 45000, 80000]}

df = pd.DataFrame(data)

grouped = df.groupby(['Department'])['Salary'].mean()

print(grouped)

Output:

Department
Marketing    72500.0
Sales        52500.0
Name: Salary, dtype: float64

In this example, we first create a dictionary containing the employee data and build a DataFrame from it. Next, we group the DataFrame by the 'Department' column with groupby(). Finally, we select the 'Salary' column and call mean() to compute the average salary for each department. The result is a Series indexed by department name.
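A single grouping can also produce several statistics at once. The sketch below reuses the same employee data. Passing a list of function names to agg() gives one output column per statistic. Named aggregation lets you choose the output column names; here `avg_salary` and `headcount` are illustrative names picked for the example.

```python
import pandas as pd

# Same employee data as in the example above.
data = {'Employee': ['John', 'Anna', 'Peter', 'Samantha', 'David', 'Eric', 'Emily', 'Michael'],
        'Department': ['Sales', 'Marketing', 'Sales', 'Marketing', 'Sales', 'Marketing', 'Sales', 'Marketing'],
        'Salary': [60000, 65000, 55000, 70000, 50000, 75000, 45000, 80000]}
df = pd.DataFrame(data)

# A list of function names gives one output column per statistic.
stats = df.groupby('Department')['Salary'].agg(['mean', 'min', 'max', 'count'])
print(stats)

# Named aggregation: each keyword becomes an output column,
# built from a (source column, function) pair.
summary = df.groupby('Department').agg(
    avg_salary=('Salary', 'mean'),
    headcount=('Employee', 'count'),
)
print(summary)
```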

Pivot()

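The pivot() function in Pandas reshapes data from long to wide format. The unique values of one column become the new column headers, another column supplies the row labels, and a third column fills the cells. pivot() only rearranges values and never aggregates them, so every (row label, column header) pair must occur exactly once. When duplicate pairs need to be summed or averaged, pivot_table() is the right function. The reverse reshape, from wide back to long, is handled by melt().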
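Because pivot() does not aggregate, it cannot handle duplicate entries. The sketch below uses a hypothetical table in which category A has two separate 'Jan' rows. On that table, pivot() raises a ValueError. pivot_table() instead aggregates the duplicates, using the mean by default or the function you pass as aggfunc.

```python
import pandas as pd

# Hypothetical long-format data: category A has two separate "Jan" rows.
df = pd.DataFrame({
    'Month': ['Jan', 'Jan', 'Feb', 'Jan'],
    'Category': ['A', 'A', 'A', 'B'],
    'Sales': [100, 50, 200, 300],
})

pivot_failed = False
try:
    df.pivot(index='Category', columns='Month', values='Sales')
except ValueError as err:
    # pivot() does not aggregate, so the duplicate (A, Jan) pair is an error.
    pivot_failed = True
    print('pivot() failed:', err)

# pivot_table() aggregates duplicates; aggfunc='sum' adds them up (the default is 'mean').
wide = df.pivot_table(index='Category', columns='Month', values='Sales', aggfunc='sum')
print(wide)
```

In the resulting table, category A's January cell holds 100 + 50 = 150. Category B has no February row, so that cell is left as NaN.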

Let's say we have a dataset containing monthly sales figures for each product category. We want to reshape it into a table with one row per category and one column per month. Here's how we can do that using the pivot() function:

import pandas as pd

data = {'Month': ['Jan', 'Feb', 'Mar', 'Jan', 'Feb', 'Mar', 'Jan', 'Feb', 'Mar'],
'Category': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
'Sales': [10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000]}

df = pd.DataFrame(data)

pivot_table = df.pivot(index='Category', columns='Month', values='Sales')

print(pivot_table)

Output:

Month       Feb    Jan    Mar
Category
A         20000  10000  30000
B         50000  40000  60000
C         80000  70000  90000

In this example, we first create a dictionary containing the sales data and build a DataFrame from it. Next, we call pivot() to reshape the data into a table with one row per product category and one column per month. The index parameter names the column whose values become the row labels. The columns parameter names the column whose values become the column headers. The values parameter names the column that fills the table. Because the month names are plain strings, the month columns come out in alphabetical order (Feb, Jan, Mar) rather than calendar order.
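The sketch below shows two common follow-up steps, using the same sales data. Since pivot() sorts the string month labels alphabetically, reindex() restores calendar order. melt() then reverses the reshape, turning the wide table back into one row per (Category, Month) pair. The hard-coded month list is an assumption that matches this small dataset.

```python
import pandas as pd

# Same sales data as in the example above.
data = {'Month': ['Jan', 'Feb', 'Mar', 'Jan', 'Feb', 'Mar', 'Jan', 'Feb', 'Mar'],
        'Category': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
        'Sales': [10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000]}
df = pd.DataFrame(data)

wide = df.pivot(index='Category', columns='Month', values='Sales')

# Reorder the month columns chronologically instead of alphabetically.
wide = wide.reindex(columns=['Jan', 'Feb', 'Mar'])
print(wide)

# melt() goes back from wide to long format: one row per (Category, Month) pair.
long_df = wide.reset_index().melt(id_vars='Category', var_name='Month', value_name='Sales')
print(long_df)
```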

Groupby() and Pivot() together

The groupby() and pivot() functions are often used side by side. Using the same sales data, suppose we want a table of sales for each product category by month, along with the average monthly sales for each category. Here's how we can do that with pivot() and groupby():

import pandas as pd
data = {'Month': ['Jan', 'Feb', 'Mar', 'Jan', 'Feb', 'Mar', 'Jan', 'Feb', 'Mar'],
'Category': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
'Sales': [10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000]}
df = pd.DataFrame(data)
pivot_table = df.pivot(index='Category', columns='Month', values='Sales')
grouped = df.groupby(['Category'])['Sales'].mean()
print(pivot_table)
print(grouped)

Output:

Month       Feb    Jan    Mar
Category
A         20000  10000  30000
B         50000  40000  60000
C         80000  70000  90000

Category
A    20000.0
B    50000.0
C    80000.0
Name: Sales, dtype: float64

In this example, we build the same DataFrame as before and use pivot() to reshape it into a category-by-month table. We then group the original DataFrame by the 'Category' column with groupby(). Finally, we call mean() on the 'Sales' column to get the average monthly sales for each category.
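In the example above, the two calls run independently on df. The sketch below, which uses the same sales data, shows how the two results relate. First, the category averages can be read straight off the pivot table with a row-wise mean. Second, grouping by both Category and Month and then calling unstack() rebuilds the pivot table itself. With exactly one row per (Category, Month) pair, sum() simply passes each value through. If a pair were duplicated, sum() would add the values, much like pivot_table(aggfunc='sum').

```python
import pandas as pd

# Same sales data as in the example above.
data = {'Month': ['Jan', 'Feb', 'Mar', 'Jan', 'Feb', 'Mar', 'Jan', 'Feb', 'Mar'],
        'Category': ['A', 'A', 'A', 'B', 'B', 'B', 'C', 'C', 'C'],
        'Sales': [10000, 20000, 30000, 40000, 50000, 60000, 70000, 80000, 90000]}
df = pd.DataFrame(data)

wide = df.pivot(index='Category', columns='Month', values='Sales')

# Row-wise mean across the month columns: one average per category.
from_pivot = wide.mean(axis=1)

# The groupby() route from the example above.
from_groupby = df.groupby('Category')['Sales'].mean()
print(from_pivot)
print(from_groupby)

# Grouping on both keys and unstacking the inner level (Month) rebuilds the wide table.
rebuilt = df.groupby(['Category', 'Month'])['Sales'].sum().unstack()
print(rebuilt)
```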

groupby() and pivot() cover two complementary tasks in Pandas. groupby() summarizes rows by group, while pivot() reshapes long data into a wide table. Knowing which one a task calls for, and how to combine them, makes many everyday analysis tasks straightforward.


