Pandas filtering methods to solve most of the data analysis tasks

Amsavalli Mylasalam
Published in Variablz Academy
3 min read, Jun 20, 2022

Introduction:

Filtering is one of the key 🔑 operations for a data analyst: it selects the rows and columns you need from a raw data set. In this article I explain the different filtering methods available in pandas.

Here I've used the famous Titanic dataset for the filtering operations so that you can easily reproduce the code. Please download the data file from here.

import pandas as pd
df = pd.read_csv('titanic.csv')
df.head()

Output

Filtering Rows using Conditions:

1) The loc indexer

Here I am using the pandas loc indexer (label-based location) to filter only the 3rd class passengers.

Input:


dfnew = df.loc[df['class'] == 'Third']
dfnew

Output:
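
As a small self-contained sketch of the same idea, the snippet below builds a tiny DataFrame by hand. The rows are made up for illustration and are not taken from the real file; only the column names follow the Titanic data. It shows that the condition is just a boolean Series, and that loc can restrict the columns in the same call.

```python
import pandas as pd

# Hypothetical rows that mimic a few Titanic columns (not the real data).
toy = pd.DataFrame({
    "class": ["Third", "First", "Third", "Second"],
    "sex": ["male", "female", "female", "male"],
    "fare": [7.25, 71.28, 7.92, 13.00],
})

# The comparison builds a boolean Series with one True/False per row.
mask = toy["class"] == "Third"

# loc keeps the rows where the mask is True.
third_class = toy.loc[mask]

# loc can also pick a subset of columns at the same time.
third_class_fares = toy.loc[mask, ["sex", "fare"]]
```

The original index labels are kept, so the filtered rows still point back to their position in the source data.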

2) Query Function

The query() method filters the rows of a DataFrame using a boolean expression written as a string.

Listing only the male passengers using query():

Input:

newdf = df.query("sex == 'male'")
newdf

Output:
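
A minimal sketch of query() on made-up rows follows (the values and the min_age threshold are assumptions for illustration). It also shows two things the expression string supports: combining conditions with "and", and referring to a Python variable with the @ prefix.

```python
import pandas as pd

# Hypothetical rows, not the real Titanic data.
toy = pd.DataFrame({
    "sex": ["male", "female", "male", "female"],
    "age": [22.0, 38.0, 4.0, 27.0],
})

# Single condition, same shape as the example in the article.
males = toy.query("sex == 'male'")

# Conditions can be combined, and @name reads a local Python variable.
min_age = 18  # illustrative threshold
adult_males = toy.query("sex == 'male' and age >= @min_age")
```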

3) Directly Filter From DataFrame

Filter the female passengers who survived by indexing the DataFrame directly with a boolean mask instead of using loc.

Input:

dfalivefemale = df[(df.alive == "yes") & (df.sex == "female")]
dfalivefemale

Output:
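
The sketch below repeats this pattern on a few made-up rows (assumed values, not the real file) and adds the "or" variant. Each comparison sits in its own parentheses because & and | bind more tightly than == in Python.

```python
import pandas as pd

# Hypothetical rows, not the real Titanic data.
toy = pd.DataFrame({
    "sex": ["female", "male", "female", "male"],
    "alive": ["yes", "no", "no", "yes"],
})

# & keeps rows where both conditions hold.
alive_females = toy[(toy["alive"] == "yes") & (toy["sex"] == "female")]

# | keeps rows where at least one condition holds.
alive_or_female = toy[(toy["alive"] == "yes") | (toy["sex"] == "female")]
```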

4) Filter a DataFrame by row position and column name

Let's select the first 10 rows and only the pclass and who columns.

Input:

df.loc[df.index[0:10], ["pclass", "who"]]

Output:
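
To see why df.index[...] is used here, the sketch below gives a small made-up DataFrame a non-numeric index (the labels "a" to "d" are an assumption for illustration). Slicing df.index converts row positions into labels that loc understands; iloc followed by a column selection is an equivalent route.

```python
import pandas as pd

# Hypothetical rows with string index labels, not the real Titanic data.
toy = pd.DataFrame(
    {
        "pclass": [3, 1, 3, 2],
        "who": ["man", "woman", "woman", "child"],
        "fare": [7.25, 71.28, 7.92, 13.00],
    },
    index=["a", "b", "c", "d"],
)

# Positions 0 and 1 are translated to the labels "a" and "b" for loc.
first_two = toy.loc[toy.index[0:2], ["pclass", "who"]]

# Same result: slice by position with iloc, then select columns by name.
same_rows = toy.iloc[0:2][["pclass", "who"]]
```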

5) Use Lambda function to Filter Rows based on Conditions

Here I'm filtering the rows where the fare is greater than 7.0. The lambda receives the DataFrame and returns a boolean mask.

Input:

df.loc[lambda df: df['fare'] > 7.0]

Output:
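
One place where the lambda form is handy is inside a method chain, because the lambda sees the intermediate DataFrame rather than the original variable. The sketch below uses made-up rows, and the fare_rounded column is a hypothetical helper added only for this illustration.

```python
import pandas as pd

# Hypothetical rows, not the real Titanic data.
toy = pd.DataFrame({
    "sex": ["male", "female", "female", "male"],
    "fare": [7.25, 6.75, 71.28, 13.00],
})

# Plain use, same shape as the example in the article.
above_seven = toy.loc[lambda d: d["fare"] > 7.0]

# In a chain, the lambda can filter on a column created one step earlier.
rounded_above_seven = (
    toy.assign(fare_rounded=lambda d: d["fare"].round())
       .loc[lambda d: d["fare_rounded"] > 7]
)
```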

6) Filter One Column values based on Condition in another Column

List the passengers whose 'alive' value is not 'yes' and whose embarkation town is 'Southampton'.

Input:

df.loc[(df.alive != "yes") & (df.embark_town == "Southampton")]

Output:
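
If only one column is needed from the matching rows, loc accepts the column name as a second argument. The sketch below is a minimal illustration on made-up rows (assumed values, not the real file) and returns just the ages of the matching passengers as a Series.

```python
import pandas as pd

# Hypothetical rows, not the real Titanic data.
toy = pd.DataFrame({
    "alive": ["no", "yes", "no", "no"],
    "embark_town": ["Southampton", "Cherbourg", "Southampton", "Queenstown"],
    "age": [22.0, 38.0, 35.0, 2.0],
})

# Conditions on two columns...
cond = (toy["alive"] != "yes") & (toy["embark_town"] == "Southampton")

# ...used to pull the values of a third column.
ages = toy.loc[cond, "age"]
```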

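7) The tilde operator negates a boolean mask:

Filter the rows where 'who' is 'child' using the tilde operator. The 'who' column holds 'man', 'woman' and 'child'; both 'man' and 'woman' contain the substring 'man', so negating the match leaves only the children.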

Input:

childpassenger = df[~df['who'].str.contains('man')]
childpassenger

Output:
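
The substring trick can be compared with an exact-match alternative, isin(), on a few made-up values (an assumed toy column, not the real file). Both masks are inverted with the tilde.

```python
import pandas as pd

# Hypothetical values for the 'who' column.
toy = pd.DataFrame({"who": ["man", "woman", "child", "man"]})

# Substring match: "woman" also contains "man", so both are excluded.
not_man_substring = toy[~toy["who"].str.contains("man")]

# Exact match against a list of values, then inverted.
not_adult = toy[~toy["who"].isin(["man", "woman"])]
```

The isin() version states the excluded values explicitly, which may be easier to read when the categories do not happen to share a substring.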

8) Filter Non-Missing Data in Pandas Data frame

Input:

nonmissingdata= df[df.deck.notnull()]
nonmissingdata

Output:
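
As a rough sketch on made-up rows (assumed values, not the real file), the snippet below shows notna(), of which notnull() is an alias, next to dropna(subset=...), which gives the same rows, and isna() for the opposite selection.

```python
import numpy as np
import pandas as pd

# Hypothetical rows; NaN marks a missing deck value.
toy = pd.DataFrame({
    "deck": ["C", np.nan, "E", np.nan],
    "fare": [71.28, 7.25, 53.10, 8.05],
})

# Rows where 'deck' is present.
with_deck = toy[toy["deck"].notna()]

# Same rows, expressed as "drop rows with a missing deck".
same_rows = toy.dropna(subset=["deck"])

# The opposite selection: rows where 'deck' is missing.
without_deck = toy[toy["deck"].isna()]
```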

9) List Comprehension Method for Filtering

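Let's filter the rows of male passengers who did not survive. The list comprehension collects the index of every matching row, and iloc then selects those rows. This relies on the default 0, 1, 2, ... index that read_csv creates, because iloc expects positions.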

Input:

df.iloc[[index for index, row in df.iterrows() if row['sex'] == 'male' and row['survived'] == 0]]

Output:
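
The sketch below, on made-up rows (assumed values, not the real file), compares the list comprehension with a plain boolean mask. It uses enumerate() so that the collected numbers are row positions regardless of the index, which is a small variation on the code above rather than the same code.

```python
import pandas as pd

# Hypothetical rows, not the real Titanic data.
toy = pd.DataFrame({
    "sex": ["male", "female", "male", "male"],
    "survived": [0, 1, 1, 0],
})

# Collect the positions of the matching rows one row at a time.
positions = [
    pos
    for pos, (_, row) in enumerate(toy.iterrows())
    if row["sex"] == "male" and row["survived"] == 0
]
via_loop = toy.iloc[positions]

# The same filter as a vectorized boolean mask.
via_mask = toy[(toy["sex"] == "male") & (toy["survived"] == 0)]
```

Both give the same rows; on large DataFrames the mask is usually much faster because iterrows() builds a Series for every row.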

Conclusion:

I hope this article gave you some useful insights. There is still a lot I have not covered, such as filtering dates and times; I will cover those topics in upcoming articles.

If you enjoyed the article, Like and share it.

Follow me on LinkedIn for more insights.
