Basics of Pandas — Part 3

Aarish Alam
Published in Analytics Vidhya · 3 min read · Dec 30, 2020

In my previous articles I addressed some of the common queries a beginner faces while working with various datasets. This article is a continuation of that series.

I’ll continue demonstrating further concepts using the same dataset (UFO) used in the first and second parts of this series.

How do I change Categorical Features to Numerical Features?

Categorical features need to be converted to numerical ones before they can be fed into most models. Although scikit-learn’s LabelEncoder is convenient, pandas provides its own method for converting categorical features to numerical ones: get_dummies.

pd.get_dummies(ufo,columns=['City'])
Before applying the get_dummies function
After applying the get_dummies function
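Since the UFO dataset itself isn’t reproduced here, a minimal sketch with a toy stand-in frame shows what get_dummies does: each distinct value of the chosen column becomes its own indicator column, and the original column is dropped.

```python
import pandas as pd

# Toy frame standing in for the UFO data (the real dataset is not shown here)
ufo = pd.DataFrame({'City': ['Ithaca', 'Abilene', 'Ithaca'],
                    'Shape': ['TRIANGLE', 'DISK', 'OVAL']})

# One indicator column per distinct city; the original 'City' column is dropped
dummies = pd.get_dummies(ufo, columns=['City'])
print(dummies.columns.tolist())
# ['Shape', 'City_Abilene', 'City_Ithaca']
```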

How do I apply a function to a pandas Series or DataFrame?

This can be achieved using three methods:

  • applymap - apply a function to every element of a DataFrame
  • apply - apply a function to each element of a Series (on a DataFrame, it applies along an axis)
  • map - map the existing values of a Series to a different set of values

Let’s separate out the year from the given time format in the DataFrame

ufo['Time']=ufo['Time'].apply(lambda x:x.split('/')[2])
#splits string using '/' as a separator
ufo['Time']=ufo['Time'].apply(lambda x:x.split(' ')[0])
#splits string using ' ' as a separator
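The two apply calls above can be run end to end on a toy 'Time' column; the M/D/YYYY HH:MM format is assumed from the article’s split logic, since the real dataset isn’t shown here.

```python
import pandas as pd

# Toy 'Time' column in the assumed M/D/YYYY HH:MM format of the UFO dataset
ufo = pd.DataFrame({'Time': ['6/1/1930 22:00', '4/18/2004 14:30']})

# First split keeps the third '/'-separated piece, e.g. '1930 22:00'
ufo['Time'] = ufo['Time'].apply(lambda x: x.split('/')[2])
# Second split keeps the part before the space, leaving just the year
ufo['Time'] = ufo['Time'].apply(lambda x: x.split(' ')[0])
print(ufo['Time'].tolist())  # ['1930', '2004']
```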

For the sake of demonstrating the map and applymap methods, I have created a new column named ‘New’ (lack of creativity) containing 0s and 1s.

Modified UFO Dataset
ufo['Valid']=ufo.New.map({0:'No',1:'Yes'})
After applying mapping
ufo.loc[:,'Time':'New'].applymap(float)
#applymap is only valid for a DataFrame, not a Series object
After applying applymap method
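Both calls can be sketched on a small stand-in frame. Note that map returns NaN for values missing from the dict, and that recent pandas versions (2.1+) deprecate applymap in favour of DataFrame.map, though applymap still works.

```python
import pandas as pd

# Toy frame with a numeric 0/1 column like the article's 'New' column
ufo = pd.DataFrame({'Time': ['1930', '2004'], 'New': [0, 1]})

# map: translate each value of a Series via a dict (unmatched values become NaN)
ufo['Valid'] = ufo.New.map({0: 'No', 1: 'Yes'})
print(ufo['Valid'].tolist())  # ['No', 'Yes']

# applymap: apply a function to every element of a DataFrame slice
floats = ufo.loc[:, 'Time':'New'].applymap(float)
# Both columns now hold float64 values
```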

How do I find and remove duplicate rows in pandas?

You can find duplicate rows by calling the .duplicated() method on the whole DataFrame. You can also check for repeated values in a single column by calling the same method on a Series object.

Logic for duplicated:

  • keep='first' (default): Mark duplicates as True except for the first occurrence.
  • keep='last': Mark duplicates as True except for the last occurrence.
  • keep=False: Mark all duplicates as True.
ufo.duplicated().sum()
#counts the number of rows that are identical to an earlier row
ufo.drop_duplicates(keep='first',inplace=True)
#dropping duplicate entries keeping the very first of each
Duplicated Entries
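The keep logic and the drop can be verified on a toy frame with one repeated row; the counts below follow directly from the three keep options listed above.

```python
import pandas as pd

# Toy frame in which row 1 duplicates row 0
ufo = pd.DataFrame({'City': ['Ithaca', 'Ithaca', 'Abilene'],
                    'Shape': ['DISK', 'DISK', 'OVAL']})

print(ufo.duplicated().sum())             # 1 -> row 1 flagged (keep='first' is the default)
print(ufo.duplicated(keep='last').sum())  # 1 -> row 0 flagged instead
print(ufo.duplicated(keep=False).sum())   # 2 -> every copy flagged

# Drop the repeats, keeping the first occurrence of each row
ufo.drop_duplicates(keep='first', inplace=True)
print(len(ufo))  # 2
```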

Certainly there remain many more techniques that one eventually discovers while playing with a dataset, but this series of articles highlights some, if not all, of the queries I faced while working with datasets. I hope you enjoyed reading my articles.

This marks the end of the series “Basics of Pandas”. I hope you enjoyed reading it. Check out the other two articles in this series here:

Basics of Pandas — Part 1

Basics of Pandas — Part 2

Thanks 😉
