Basic Data Operations — Part 2

The most basic data operations which we use in day to day basis for data wrangling, in comparison with SQL query for the same operation. This article mainly focuses on condition based filtering in Python. For more you could refer python documentation. Enjoy reading!

Let us consider image given below as reference dataset. Referring this as dataset1

Filtering data based on single column condition

SQL Query:

Select * from dataset1 where Country IN ('Canada')

OR

Select * from dataset1 where Country = 'Canada'

In SQL, for exact value match use = sign and IN operator. For partial value match use like function.

SQL Output:

sql output

Python Code [Approach -1]:

Approach 1 code will use isin function to filter out the data. We have used two steps, firstly create boolean variable satisfying the filtering condition then use boolean variable to filter rows

For more information, refer this (Source)

country_list = ['Canada']

dataset1[dataset1['Country'].isin(country_list)]

Python Code [Approach -2]:

Approach 2 code will use loc and isin function to filter out the data.

For more information, refer this (Source)

country_list = ['Canada']

dataset1.loc[dataset1['Country'].isin(country_list)]

In Python, for exact value match use == sign and isin function. For partial value match use functions like startswith , endswith and other funcitons.

Python Output:

python output

Filtering data based on multiple column conditions

For multiple column based conditions, separate the conditions with AND/OR based on requirement in both SQL and Python. For instance, we need to find all rows in dataset1 that belongs to country Cananda and product name ends with nana

SQL Query Example:

Select * from dataset1 where Country IN ('Canada') AND Product like '%nana'

Python Code Example using Approach 1:

country_list = ['Canada']

dataset1[dataset1['Country'].isin(country_list) and dataset1['Product'].str.endswith('nana')]

Python Code Example using Approach 2:

country_list = ['Canada']

dataset1.loc[dataset1['Country'].isin(country_list) and dataset1['Product'].str.endswith('nana')]

Output:

Thanks for reading!

Basic Data Operations — Part 3, Coming Soon.

Stay Connected 😀

--

--

Smriti Changulani
Python Data Wrangling by Smriti Changulani

Past — Software Engineer. Present — Data Analyst. Future — Data Scientist