Published in Analytics Vidhya

Pandas tricks for Data Scientists

Pandas makes life easier for any data scientist

Data Science is manipulating data, looking for patterns, and coming up with solutions to drive revenue, lower expenses, and thereby increase overall business profitability ― Ken Poirot

image by Stan W.

A data scientist can spend roughly 80% of their time manipulating data and extracting insights from it, so it pays to work smartly and cut down the time spent on preprocessing and cleaning. Pandas, the most commonly used Python library for data manipulation and analysis, helps you explore and interpret data in far less time.

In this article, I walk through some of the most useful pandas tricks using weather data. You may be surprised how much time they save once you start applying them.

To begin, let's load the data from a CSV file:

import pandas as pd
df = pd.read_csv("weather.csv")  # load the weather data into a DataFrame
df

Output

weather data
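Since weather.csv is not included with this article, here is a small hypothetical stand-in for it. The columns (day, city, temperature, windspeed, event) are inferred from the tricks that follow and the values are made up. read_csv is given an io.StringIO object, which it accepts just like a path to a file on disk.

```python
import io

import pandas as pd

# Hypothetical stand-in for weather.csv; the real file's columns and values may differ
csv_text = """day,city,temperature,windspeed,event
1/1/2017,new york,32,6,Rain
1/2/2017,new york,36,7,Sunny
1/1/2017,mumbai,90,5,Sunny
1/2/2017,mumbai,85,12,Fog
"""

# read_csv accepts any file-like object, so StringIO can replace a real file
df = pd.read_csv(io.StringIO(csv_text))
print(df.shape)  # (4, 5)
```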

1. Reverse the order

Rows can be reversed by slicing the DataFrame with the loc indexer. loc accesses a group of rows and columns by label.

Inside the brackets, ::-1 is a single slice of the form start:stop:step applied to the rows. Leaving start and stop empty selects every row, and the step of -1 walks through them backwards. Because no column selection is given, all columns are kept.

df.loc[::-1]

Output

reverse in rows
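A minimal sketch of the reversing trick on a made-up three-row frame (the values are hypothetical, not taken from weather.csv). It also shows two related moves: reset_index(drop=True) to renumber the rows after reversing, and df.loc[:, ::-1] to reverse the column order instead of the rows.

```python
import pandas as pd

# Hypothetical weather rows for illustration only
df = pd.DataFrame({
    "day": ["1/1/2017", "1/2/2017", "1/3/2017"],
    "city": ["new york", "new york", "mumbai"],
    "temperature": [32, 36, 90],
})

# Reverse the row order; each row keeps its original index label
reversed_rows = df.loc[::-1]

# Reverse, then renumber the index 0..n-1 and discard the old labels
reversed_renumbered = df.loc[::-1].reset_index(drop=True)

# Reverse the columns instead: ":" keeps all rows, "::-1" steps backwards over columns
reversed_cols = df.loc[:, ::-1]

print(reversed_rows.index.tolist())        # [2, 1, 0]
print(reversed_renumbered.index.tolist())  # [0, 1, 2]
print(reversed_cols.columns.tolist())      # ['temperature', 'city', 'day']
```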

2. Filter Data by multiple categories

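In the weather data, the event column contains several categories such as Rain, Snow and Sunny. What if you want only the rows where the event is Rain or Sunny? The isin method builds a boolean mask that is True wherever the value appears in the list you pass, and indexing the DataFrame with that mask filters the rows.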

df[df.event.isin(['Rain', 'Sunny'])]

Output

Filtered data
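The sketch below applies isin to a few hypothetical event values and shows two things the one-liner above hides: the mask can be inverted with ~ to drop the listed categories instead, and isin is equivalent to chaining == comparisons with |, just shorter as the list of categories grows.

```python
import pandas as pd

# Hypothetical events; the real weather.csv values may differ
df = pd.DataFrame({
    "city": ["new york", "new york", "mumbai", "paris"],
    "event": ["Rain", "Sunny", "Snow", "Sunny"],
})

wanted = ["Rain", "Sunny"]
mask = df["event"].isin(wanted)  # boolean Series: True where event is in the list

filtered = df[mask]   # rows with Rain or Sunny
excluded = df[~mask]  # ~ inverts the mask, keeping every other event

# The same filter written with == and |
same = df[(df["event"] == "Rain") | (df["event"] == "Sunny")]

print(filtered.index.tolist())  # [0, 1, 3]
print(excluded.index.tolist())  # [2]
```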

3. Split column values into multiple columns

Create a DataFrame with each city and its attributes:

df1 = pd.DataFrame({'city': ['new york', 'mumbai', 'paris'], 'temp_windspeed': [[21, 4], [34, 3], [26, 5]]})
df1

Output

weather data

In the output above, each value in the second column is a list holding both the temperature and the wind speed. To split these values into two separate columns, apply pd.Series to that column: each list is turned into a row with one column per element, and rename then gives the new columns meaningful names.

df2 = df1.temp_windspeed.apply(pd.Series)  # one column per list element, named 0 and 1
df2 = df2.rename(columns={0: 'temperature', 1: 'windspeed'})
df2

Output

After splitting
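apply(pd.Series) is convenient, but it builds one Series per row, which tends to be slow on large frames. A commonly used alternative, sketched below on the same df1, builds the new columns directly from the lists. It assumes every list has exactly two elements; join then places the new columns next to city, and the original list column is left out.

```python
import pandas as pd

df1 = pd.DataFrame({
    "city": ["new york", "mumbai", "paris"],
    "temp_windspeed": [[21, 4], [34, 3], [26, 5]],
})

# One row per list; assumes each list holds exactly [temperature, windspeed]
split = pd.DataFrame(
    df1["temp_windspeed"].tolist(),
    index=df1.index,  # reuse df1's index so the rows line up for the join
    columns=["temperature", "windspeed"],
)

# Keep city, attach the new columns, and drop the original list column
result = df1[["city"]].join(split)
print(result)
```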

4. Combine two data frames

In practice we rarely deal with a single data source: data often comes from several sources and has to be combined before it can be analyzed. The concat function combines DataFrames by stacking them on top of each other.

Create two data frames

df1 = pd.DataFrame({
    "city": ["new york", "florida", "mumbai"],
    "temperature": [22, 37, 35]
})
df2 = pd.DataFrame({
    "city": ["chicago", "new york", "florida"],
    "temperature": [35, 28, 25]
})

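Concatenate the two DataFrames; ignore_index=True numbers the rows of the result from 0 instead of keeping each frame's original index: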

pd.concat([df1, df2],ignore_index=True)

Output

After combining
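The sketch below uses the same two frames to show what ignore_index changes, and how the keys parameter can record which source each row came from. The key names source_1 and source_2 are hypothetical labels chosen for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({"city": ["new york", "florida", "mumbai"], "temperature": [22, 37, 35]})
df2 = pd.DataFrame({"city": ["chicago", "new york", "florida"], "temperature": [35, 28, 25]})

# Without ignore_index the original labels 0, 1, 2 appear twice
stacked = pd.concat([df1, df2])
print(stacked.index.tolist())  # [0, 1, 2, 0, 1, 2]

# With ignore_index the result gets a fresh 0..5 index
combined = pd.concat([df1, df2], ignore_index=True)

# keys= adds an outer index level naming the frame each row came from
labelled = pd.concat([df1, df2], keys=["source_1", "source_2"])
print(labelled.loc["source_2"])  # only the rows that came from df2
```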

5. Pivot

Suppose you want to see the temperature of each city for each day, that is, to analyze how the temperature changes day by day in every city. pivot makes this simple: the index argument supplies the rows, columns supplies the new column headers, and values fills the cells.

df.pivot(index='city', columns='day', values='temperature')

Output

Pivot data
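One caveat worth knowing: pivot only works when every (city, day) pair occurs once. The sketch below uses hypothetical readings to show pivot succeeding, then failing with a ValueError once a duplicate reading is added, and pivot_table handling the duplicate by aggregating it (here with the mean, which is also its default).

```python
import pandas as pd

# Hypothetical long-format readings, one per (city, day)
df = pd.DataFrame({
    "day": ["1/1", "1/2", "1/1", "1/2"],
    "city": ["new york", "new york", "mumbai", "mumbai"],
    "temperature": [32, 36, 90, 85],
})

wide = df.pivot(index="city", columns="day", values="temperature")
print(wide.loc["mumbai", "1/2"])  # 85

# Add a second mumbai reading for 1/1, so that pair is no longer unique
extra = pd.DataFrame({"day": ["1/1"], "city": ["mumbai"], "temperature": [94]})
dup = pd.concat([df, extra], ignore_index=True)

try:
    dup.pivot(index="city", columns="day", values="temperature")
    pivot_failed = False
except ValueError:
    pivot_failed = True  # duplicate (city, day) pairs cannot be reshaped

# pivot_table aggregates the duplicates instead of raising
averaged = dup.pivot_table(index="city", columns="day", values="temperature", aggfunc="mean")
print(averaged.loc["mumbai", "1/1"])  # 92.0
```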

6. Reshape

Assume you want to analyze students' marks over two years, given in the format below.

header = pd.MultiIndex.from_product([['2018', '2019'], ['Physics', 'Chemistry', 'Maths']])
data = [[31, 45, 65, 43, 32, 65], [76, 56, 78, 65, 78, 65], [44, 56, 73, 76, 87, 56]]
df = pd.DataFrame(data,
                  index=['John', 'Gil', 'Gina'],
                  columns=header)
df

Output

Students data

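The stack method moves the innermost column level (the subjects) into the row index, so each student gets one row per subject and one column per year. This long format is often easier to use for further analysis.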

df.stack()

Output

After reshape
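The sketch below rebuilds the students' marks and shows two related options: stack(level=0) moves the years into the rows instead of the subjects, and unstack() reverses a stack. Newer pandas releases have changed how stack works internally, so the order of the stacked subjects may differ between versions; the sketch therefore looks values up by label rather than by position.

```python
import pandas as pd

header = pd.MultiIndex.from_product([["2018", "2019"], ["Physics", "Chemistry", "Maths"]])
data = [[31, 45, 65, 43, 32, 65], [76, 56, 78, 65, 78, 65], [44, 56, 73, 76, 87, 56]]
df = pd.DataFrame(data, index=["John", "Gil", "Gina"], columns=header)

# Default: stack the innermost column level (subjects); years remain as columns
by_subject = df.stack()
print(by_subject.loc[("Gil", "Maths"), "2019"])  # 65

# level=0 stacks the outer level (years) instead; subjects remain as columns
by_year = df.stack(level=0)
print(by_year.loc[("Gina", "2018"), "Chemistry"])  # 56

# unstack() moves the subject level back into the columns
restored = by_subject.unstack()
```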


Hope you enjoyed it! Stay tuned: I will keep collecting and creating more tricks and share them in another article. Please leave a comment with any questions or suggestions.

sampath kumar gajawada

Machine learning Enthusiast | Analyst | Programmer | All I write my own | Linkedin: https://www.linkedin.com/in/sampath-kumar-gajawada/