5 Cool Advanced Pandas Techniques for Data Scientists

Use these techniques …

Naina Chaturvedi
Oct 20 · 4 min read
[Header image: Pic from Unsplash.com]

Before you start implementing these techniques, load the data of your choice into your working environment. For this post, I’m using the Iris dataset.

Start by importing the necessary libraries, such as pandas and NumPy, and loading your dataset. Once that’s done, dive into the techniques below:
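If you want to follow along without downloading anything, here is a minimal sketch of the loading step. It assumes the column names used later in this post (sepal_length, sepal_width, petal_length, petal_width, species) and reads a few illustrative rows from an in-memory CSV string; with the real Iris file you would pass its path or URL to pd.read_csv instead.

```python
import io

import pandas as pd

# Iris-style rows as CSV text; the values are illustrative, and the empty
# sepal_width field in the second row is read as a missing value (NaN)
csv_text = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.3,3.3,6.0,2.5,virginica
"""

# In practice you would pass a file path or URL to read_csv instead
iris_data = pd.read_csv(io.StringIO(csv_text))
df = iris_data  # the post refers to the data as both df and iris_data

print(iris_data.shape)
```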

1. Split data using pandas

In the code below, we draw a random 75% of the rows into df1 and drop those rows (by their index labels) from a copy of the data. We then draw 25% of the remaining rows into df2 and drop them as well.

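import pandas as pd
# df holds the Iris data loaded earlier
iris_data_new = df.copy()
# Randomly pick 75% of the rows; random_state makes the sample reproducible
df1 = iris_data_new.sample(frac=0.75, random_state=0)
# Remove the sampled rows by their index labels
iris_data_new = iris_data_new.drop(df1.index)
# Sample 25% of the remaining rows and remove them as well
df2 = iris_data_new.sample(frac=0.25, random_state=0)
iris_data_new = iris_data_new.drop(df2.index)
print(df1.shape)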

Output —

(112, 5)
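The same sample-then-drop pattern gives a simple two-way split (for example, train and test) in which every row lands in exactly one piece. Here is a minimal sketch on a small made-up frame; the names train and test are just illustrative. Because the second piece is whatever drop() leaves behind, its size does not depend on rounding a second frac. This assumes the index labels are unique, since drop() removes every row carrying a given label.

```python
import pandas as pd

# Illustrative stand-in for the Iris data: 20 rows of made-up values
df = pd.DataFrame({
    "sepal_length": [4.5 + 0.1 * i for i in range(20)],
    "species": ["setosa"] * 10 + ["versicolor"] * 10,
})

# Draw 75% of the rows at random (random_state makes it reproducible)
train = df.sample(frac=0.75, random_state=0)
# Every row that was not drawn goes into the second split
test = df.drop(train.index)

print(train.shape, test.shape)
```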

2. Binning Data

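Binning is a technique for grouping data into multiple buckets (bins), which is very helpful when you are dealing with continuous numeric data. In pandas you can bin data with the functions cut, which splits the value range into equal-width bins, and qcut, which uses quantiles so that each bin holds roughly the same number of rows. First check the shape of your data, i.e. the number of rows and columns.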

print(iris_data.shape)

Output —

(150, 5)

Then bin your data using qcut as shown below —

pd.qcut(df['sepal_width'],q=5).value_counts()

Output —

(2.7, 3.0]      50
(1.999, 2.7]    33
(3.1, 3.4]      31
(3.4, 4.4]      24
(3.0, 3.1]      12
Name: sepal_width, dtype: int64
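To see how cut and qcut differ, here is a small sketch on the numbers 1 to 10 (an illustrative series, not Iris data); the labels parameter just names the bins. With qcut, many tied values can make the bins uneven, which is likely why the sepal_width counts above are not equal.

```python
import pandas as pd

values = pd.Series([1, 2, 3, 4, 5, 6, 7, 8, 9, 10], name="x")

# cut: three equal-width bins spanning the range 1..10
width_bins = pd.cut(values, bins=3, labels=["low", "mid", "high"])

# qcut: two quantile-based bins split at the median, so each holds half the rows
quantile_bins = pd.qcut(values, q=2, labels=["lower half", "upper half"])

print(width_bins.value_counts(sort=False))
print(quantile_bins.value_counts(sort=False))
```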

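3. Slicing using loc and iloc

You can do label-based and position-based slicing with the loc and iloc indexers, respectively. Note that loc includes the end label, so with the default integer index iris_data.loc[100:105] returns six rows (100 to 105), while iloc follows Python slicing and excludes the stop position, so iris_data.iloc[:4] returns the first four rows.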

iris_data.loc[100:105, 'petal_length':'species']

Output —

[Output image]
iris_data.iloc[:4]

Output —

[Output image]
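The label vs position difference is easiest to see when the index does not start at 0. A minimal sketch with a few illustrative rows and a made-up index of 10 to 14:

```python
import pandas as pd

# Illustrative rows with a non-default index, so labels and positions differ
df = pd.DataFrame(
    {
        "sepal_length": [5.1, 4.9, 4.7, 4.6, 5.0],
        "petal_length": [1.4, 1.4, 1.3, 1.5, 1.4],
        "species": ["setosa"] * 5,
    },
    index=[10, 11, 12, 13, 14],
)

# loc: label-based; both the start and end labels are included
by_label = df.loc[11:13, "petal_length":"species"]

# iloc: position-based; the stop position is excluded, like Python slicing
by_position = df.iloc[1:3, 1:3]

print(by_label)
print(by_position)
```

Both calls select the same two columns, but loc returns three rows (labels 11, 12, 13) while iloc returns two (positions 1 and 2).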

4. Mean Imputation and Interpolate method

Mean imputation is a technique in which missing values are replaced by the mean of the available (non-missing) values in the chosen column.

First, check whether your data has any missing values:

iris_data.isnull().sum()

Output —

[Output image]

Then calculate the mean of the column:

iris_data['sepal_width'].mean()

Output —

3.0516778523489942

Then replace the missing values with that mean and check again:

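# Assign the result back; inplace=True on a selected column may not update iris_data in recent pandas (Copy-on-Write)
iris_data['sepal_width'] = iris_data['sepal_width'].fillna(iris_data['sepal_width'].mean())
iris_data.isnull().sum()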
[Output image]
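When several numeric columns have gaps, you can fill each one with its own mean in a single call. A minimal sketch on a small frame with made-up values; it assumes the non-numeric column (species) should be left alone, which numeric_only=True takes care of:

```python
import numpy as np
import pandas as pd

# Small Iris-like frame with gaps (illustrative values only)
df = pd.DataFrame({
    "sepal_width": [3.5, np.nan, 3.2, 3.1],
    "petal_length": [1.4, 1.4, np.nan, 1.5],
    "species": ["setosa"] * 4,
})

# Column means of the numeric columns; NaNs are skipped by default
means = df.mean(numeric_only=True)

# fillna with a Series fills each column with its own mean;
# species has no entry in `means`, so it is left untouched
df_filled = df.fillna(means)
print(df_filled)
```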

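Interpolate method: instead of filling every gap with the same value, interpolate() estimates each missing value from its neighbors (linearly by default). Use it as an alternative to mean imputation, since after the mean fill there are no gaps left to interpolate.

# interpolate() leaves existing values unchanged, so no separate fillna() is needed
iris_data['sepal_width'] = iris_data['sepal_width'].interpolate()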
[Output image]
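One detail worth knowing: with the default settings, interpolate() only fills gaps that have a valid value before them, so a missing value at the very start of a column stays NaN. A minimal sketch on an illustrative series; falling back to the column mean for leftovers is just one option, combining the two techniques in this section:

```python
import numpy as np
import pandas as pd

# Illustrative sepal_width-like values with gaps, including one at the start
s = pd.Series([np.nan, 3.0, np.nan, 3.4, np.nan, np.nan, 4.0])

# Linear interpolation fills interior gaps from the neighboring values,
# treating the points as equally spaced
interpolated = s.interpolate()

# The leading NaN has no earlier neighbor, so it stays missing;
# here it falls back to the mean of the observed values
filled = interpolated.fillna(s.mean())

print(filled)
```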

5. Combining Data using Concat and Join

Much like np.concatenate() in NumPy, the pd.concat() function concatenates Series or DataFrame objects along an axis: axis=0 stacks rows and axis=1 places columns side by side.

# Stack df2's rows under df1's rows
df4 = pd.concat([df1, df2], axis=0)
print(df4)

Output —

[Output image]
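By default, pd.concat() keeps each input's index labels. In the example above, df1 and df2 are disjoint samples of the same data, so their labels don't collide, but with independent frames they can repeat. A minimal sketch on two illustrative frames showing the default, ignore_index=True, and axis=1:

```python
import pandas as pd

# Two small illustrative frames with the same columns
top = pd.DataFrame({"sepal_length": [5.1, 4.9], "species": ["setosa", "setosa"]})
bottom = pd.DataFrame({"sepal_length": [7.0, 6.4], "species": ["versicolor", "versicolor"]})

# axis=0 stacks rows; the original index labels are kept, so they repeat
stacked = pd.concat([top, bottom], axis=0)

# ignore_index=True discards the old labels and renumbers rows 0..n-1
renumbered = pd.concat([top, bottom], axis=0, ignore_index=True)

# axis=1 places the frames side by side, aligning rows on their index labels
side_by_side = pd.concat([top, bottom], axis=1)

print(stacked.index.tolist(), renumbered.index.tolist())
```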

Joins —

Merging and joining data is one of the most important skills in data science. Understanding joins and implementing them correctly is crucial to analyzing data well.

In this section we will implement:

  • Inner Join: keep only the rows whose keys match in both tables/data frames, based on the specified merge condition.
  • Full Join: keep all the rows from the left and right tables, matching rows wherever possible and filling NaN elsewhere.
  • Left Join: keep all the rows from the left table; where the right table has no matching row, fill its columns with NaN.
  • Right Join: keep all the rows from the right table; where the left table has no matching row, fill its columns with NaN.
[Figure: the four join types. Source and credits: Stack Overflow]
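Since df1 and df2 come from the same data, it can be hard to see the four join types in their output. Here is a minimal sketch on two tiny made-up tables whose keys only partly overlap (the names left, right, key, left_val and right_val are illustrative):

```python
import pandas as pd

left = pd.DataFrame({"key": ["a", "b", "c"], "left_val": [1, 2, 3]})
right = pd.DataFrame({"key": ["b", "c", "d"], "right_val": [20, 30, 40]})

inner = pd.merge(left, right, on="key", how="inner")    # keys in both: b, c
outer = pd.merge(left, right, on="key", how="outer")    # all keys: a, b, c, d
left_j = pd.merge(left, right, on="key", how="left")    # keys from left: a, b, c
right_j = pd.merge(left, right, on="key", how="right")  # keys from right: b, c, d

for name, result in [("inner", inner), ("outer", outer), ("left", left_j), ("right", right_j)]:
    print(name, sorted(result["key"]))
```

In the outer, left, and right results, the columns from the side without a match are NaN, exactly as described in the list above.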
# Inner join on sepal_length (how='inner' is the default); other shared columns get _x/_y suffixes
df5 = pd.merge(df1, df2, on='sepal_length', how='inner')
print(df5)
[Output image]
# Full outer join; with no on=, pandas joins on every column name the two frames share
df6 = pd.merge(df1, df2, how='outer')
print(df6)
[Output image]
# Left join
df7 = pd.merge(df1, df2, how='left')
print(df7)
[Output image]
# Right join
df8 = pd.merge(df1, df2, how='right')
print(df8)
[Output image]
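Because df1 and df2 share every column, two details shape the joins above: when you join on one column, the remaining shared columns come back with _x and _y suffixes (configurable through the suffixes parameter), and when you omit on=, pandas joins on all common columns. A minimal sketch on two made-up frames, also using indicator=True to add a _merge column that records whether each row came from the left frame, the right frame, or both:

```python
import pandas as pd

# Two illustrative frames that share all of their columns
df_a = pd.DataFrame({"sepal_length": [5.1, 4.9], "species": ["setosa", "setosa"]})
df_b = pd.DataFrame({"sepal_length": [5.1, 6.3], "species": ["setosa", "virginica"]})

# Join on one shared column: the other shared column gets _x/_y suffixes
on_one = pd.merge(df_a, df_b, on="sepal_length")
print(on_one.columns.tolist())

# No on=: join on every common column; indicator=True adds a _merge column
audit = pd.merge(df_a, df_b, how="outer", indicator=True)
print(audit)
```

The _merge column is a quick way to check how many rows of a full join actually matched.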

Thanks for reading. Keep learning :)

Data Driven Investor

empowering you with data, knowledge, and expertise

Written by Naina Chaturvedi

🇺🇸,World Traveler,Women in Tech,Sr. SDE-Earning my bread using 0&1,Coursera Instructor ML & GCP, Trekker, Avid Reader,I write for fun@AI & Python publications
