Pandas tricks for Data Scientists
Pandas makes life easier for any data scientist
Data Science is manipulating data, looking for patterns, and coming up with solutions to drive revenue, lower expenses, and thereby increase overall business profitability ― Ken Poirot

Data scientists are often said to spend around 80% of their time cleaning and manipulating data before any insight comes out of it, so anything that shortens preprocessing is worth learning. Pandas is one of the most commonly used Python libraries for data manipulation and data analysis, and a handful of its features can save a lot of that time.
In this article, I will walk through some of the most useful pandas tricks with the help of weather data. I think you will be pleasantly surprised once you start applying them.
To begin with, let's look at the data by reading it from a CSV file.
import pandas as pd
df = pd.read_csv("weather.csv")
df
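If you do not have weather.csv at hand, here is a minimal sketch of a stand-in. The column names (day, city, temperature, windspeed, event) and every value are assumptions pieced together from the columns the tricks below refer to, not the original file.

```python
import io

import pandas as pd

# Hypothetical stand-in for weather.csv; column names and values are assumed.
csv_text = """day,city,temperature,windspeed,event
1/1/2017,new york,32,6,Rain
1/2/2017,new york,36,7,Sunny
1/3/2017,new york,28,12,Snow
1/1/2017,mumbai,90,5,Sunny
1/2/2017,mumbai,85,12,Fog
1/3/2017,mumbai,87,15,Fog
"""

# read_csv accepts any file-like object, so StringIO works in place of a path.
# parse_dates converts the "day" column from text to datetime values.
weather = pd.read_csv(io.StringIO(csv_text), parse_dates=["day"])

print(weather.shape)
print(weather.dtypes)
```

The frame is named weather here so that it does not overwrite the df used in the rest of the article.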

1. Reverse the order
Reversing can be done by slicing the data frame with the loc attribute. loc is used to access a group of rows or columns by label.
The slice ::-1 follows the start:stop:step pattern. Leaving start and stop empty selects all rows, and a step of -1 walks through them from last to first, which reverses the order.
df.loc[::-1]
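One thing to watch for is that the reversed frame keeps its original index labels, so they run backwards too. A small sketch on made-up data (the frame df_demo is only for illustration) shows how to renumber the rows and how the same slice reverses columns:

```python
import pandas as pd

# Illustrative data only; not the weather file used in the article.
df_demo = pd.DataFrame({
    "city": ["new york", "mumbai", "paris"],
    "temperature": [21, 34, 26],
})

# Rows reversed; the index labels travel with their rows (2, 1, 0).
reversed_rows = df_demo.loc[::-1]

# Rows reversed and renumbered from 0; drop=True discards the old index.
renumbered = df_demo.loc[::-1].reset_index(drop=True)

# The same step of -1 in the second position reverses the column order.
reversed_cols = df_demo.loc[:, ::-1]

print(reversed_rows)
print(renumbered)
print(reversed_cols)
```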

2. Filter Data by multiple categories
In the weather data, the event attribute takes several categories such as Rain, Snow, and Sunny. What if you want only the rows where the event is Rain or Sunny? The isin function filters them.
df[df.event.isin(['Rain', 'Sunny'])]
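Here is a small self-contained sketch of the same idea on made-up data (the events frame is hypothetical). It also shows that the boolean mask can be flipped with ~ to get every row outside the chosen categories:

```python
import pandas as pd

# Illustrative data only.
events = pd.DataFrame({
    "day": [1, 2, 3, 4],
    "event": ["Rain", "Sunny", "Snow", "Rain"],
})

wanted = ["Rain", "Sunny"]

# isin returns a boolean Series: True where the value is in the list.
mask = events["event"].isin(wanted)

rain_or_sunny = events[mask]
everything_else = events[~mask]  # ~ negates the mask

print(rain_or_sunny)
print(everything_else)
```

The bracket form events["event"] behaves the same as events.event, and it keeps working when a column name contains spaces or clashes with a DataFrame method.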

3. Split column value to multiple columns
Create a data frame with each city and its attributes.
df1 = pd.DataFrame({'city': ['new york', 'mumbai', 'paris'], 'temp_windspeed': [[21, 4], [34, 3], [26, 5]]})

In the data frame above, each value in the second column is a list holding both temperature and wind speed. Let us split these values into two columns. This can be done by applying pd.Series to the second column, which turns each list into a row with one column per element.
df2 = df1.temp_windspeed.apply(pd.Series)
df2.rename(columns={0: 'temperature', 1: 'windspeed'})
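The result above holds only the two new columns, without the city. As an alternative sketch (not the approach used above), the lists can be passed straight to the DataFrame constructor and joined back to the city column. The names cities, parts, and result are just for illustration:

```python
import pandas as pd

cities = pd.DataFrame({
    "city": ["new york", "mumbai", "paris"],
    "temp_windspeed": [[21, 4], [34, 3], [26, 5]],
})

# Build the two new columns directly from the list values.
# Reusing cities.index keeps the rows aligned for the concat below.
parts = pd.DataFrame(
    cities["temp_windspeed"].tolist(),
    columns=["temperature", "windspeed"],
    index=cities.index,
)

# Put the city column and the split columns side by side.
result = pd.concat([cities[["city"]], parts], axis=1)

print(result)
```

On large frames this tends to be faster than apply(pd.Series), since it avoids building one Series per row.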

4. Combine two data frames
In real life we rarely deal with a single data source. Data often arrives from different sources and has to be combined before it can be analyzed. The concat function helps you combine two data frames.
Create two data frames
df1 = pd.DataFrame({
    "city": ["new york", "florida", "mumbai"],
    "temperature": [22, 37, 35]
})
df2 = pd.DataFrame({
    "city": ["chicago", "new york", "florida"],
    "temperature": [35, 28, 25]
})
Concatenate the two data frames
pd.concat([df1, df2], ignore_index=True)
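To see what ignore_index=True is doing, here is a short sketch comparing three variants. The labels "source_a" and "source_b" are made up for illustration. The keys argument is handy when you want to remember which source each row came from:

```python
import pandas as pd

a = pd.DataFrame({
    "city": ["new york", "florida", "mumbai"],
    "temperature": [22, 37, 35],
})
b = pd.DataFrame({
    "city": ["chicago", "new york", "florida"],
    "temperature": [35, 28, 25],
})

# Without ignore_index, each frame keeps its own labels: 0, 1, 2, 0, 1, 2.
stacked_raw = pd.concat([a, b])

# With ignore_index=True, the rows are renumbered 0..5.
stacked = pd.concat([a, b], ignore_index=True)

# keys adds an outer index level naming the source of each row.
labelled = pd.concat([a, b], keys=["source_a", "source_b"])

print(stacked_raw)
print(stacked)
print(labelled.loc["source_b"])
```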

5. Pivot
Suppose you would like to see the temperature of each city for each day, that is, to analyze how the temperature changes in each city day by day. pivot makes this simple: it turns the cities into rows, the days into columns, and fills the cells with the temperature.
df.pivot(index='city', columns='day', values='temperature')
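Here is a self-contained sketch on made-up data (the frame long_df and its values are assumptions, not the article's weather file). It also shows a common stumbling block: pivot raises a ValueError when a city and day pair appears more than once, while pivot_table aggregates the duplicates instead.

```python
import pandas as pd

# Illustrative long-format data: one row per (day, city) reading.
long_df = pd.DataFrame({
    "day": ["1/1", "1/1", "1/2", "1/2"],
    "city": ["new york", "mumbai", "new york", "mumbai"],
    "temperature": [32, 90, 36, 85],
})

# One row per city, one column per day.
wide = long_df.pivot(index="city", columns="day", values="temperature")
print(wide)

# Add a second reading for new york on 1/1 to create a duplicate pair.
extra = pd.DataFrame({"day": ["1/1"], "city": ["new york"], "temperature": [34]})
with_dup = pd.concat([long_df, extra], ignore_index=True)

# pivot cannot decide which of the two readings belongs in the cell.
try:
    with_dup.pivot(index="city", columns="day", values="temperature")
    pivot_failed = False
except ValueError:
    pivot_failed = True

# pivot_table resolves duplicates with an aggregation, here the mean.
averaged = with_dup.pivot_table(
    index="city", columns="day", values="temperature", aggfunc="mean"
)
print(averaged)
```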

6. Reshape
Assume you want to analyze students' marks for two years, and the data is given in the format below.
header = pd.MultiIndex.from_product([['2018', '2019'], ['Physics', 'Chemistry', 'Maths']])
data = [[31, 45, 65, 43, 32, 65], [76, 56, 78, 65, 78, 65], [44, 56, 73, 76, 87, 56]]
df = pd.DataFrame(data,
                  index=['John', 'Gil', 'Gina'],
                  columns=header)
We can reshape the above data into rows for further analysis by using the stack function, which moves the innermost column level (here, the subjects) into the row index.
df.stack()
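stack can also move a different column level. The sketch below rebuilds the marks data under the name marks (chosen so it does not clash with df) and stacks the year level instead, then undoes a stack with unstack. Recent pandas versions may print a FutureWarning for stack, because the implementation is being replaced. The values looked up here should be the same either way, though the row order can differ.

```python
import pandas as pd

header = pd.MultiIndex.from_product(
    [["2018", "2019"], ["Physics", "Chemistry", "Maths"]]
)
data = [
    [31, 45, 65, 43, 32, 65],
    [76, 56, 78, 65, 78, 65],
    [44, 56, 73, 76, 87, 56],
]
marks = pd.DataFrame(data, index=["John", "Gil", "Gina"], columns=header)

# Default: the innermost level (subjects) moves into the rows.
# Rows are (student, subject); columns are the years.
by_subject = marks.stack()

# level=0 moves the outer level (years) into the rows instead.
# Rows are (student, year); columns are the subjects.
by_year = marks.stack(level=0)

# unstack reverses a stack: the subjects go back to the columns.
restored = by_subject.unstack()

print(by_subject.loc[("John", "Physics"), "2018"])
print(by_year.loc[("John", "2019"), "Physics"])
print(restored.loc["Gina", ("2019", "Maths")])
```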

References
[1] Kevin Markham, Data School, https://www.dataschool.io/easier-data-analysis-with-pandas/
[2] code basics, https://www.youtube.com/channel/UCh9nVJoWXmFb7sLApWGcLPQ
I hope you enjoyed it! Stay tuned: I will try to collect more tricks and come back with another article. Please comment with any questions or suggestions.

