Basics of Pandas — Part 3
In my previous articles, I addressed some of the common questions a beginner faces while working with various datasets. This article continues that series.
I’ll continue to demonstrate further concepts using the same UFO dataset as in the first and second parts of this series.
How do I change Categorical Features to Numerical Features?
Categorical features need to be converted to numerical ones before they can be fed into most models. Although LabelEncoder is a convenient way to do this, pandas provides its own method for converting categorical features to numerical ones: get_dummies, which creates one indicator column per category.
pd.get_dummies(ufo,columns=['City'])
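To make the effect of get_dummies easier to see, here is a minimal sketch on a tiny stand-in frame. The rows below are invented for illustration and are not taken from the UFO dataset; only the 'City' column name matches the example above.

```python
import pandas as pd

# Hypothetical stand-in for the UFO data (values invented for illustration)
demo = pd.DataFrame({
    "City": ["Ithaca", "Holyoke", "Ithaca"],
    "State": ["NY", "CO", "NY"],
})

# One indicator column per distinct city; the original 'City' column is replaced
encoded = pd.get_dummies(demo, columns=["City"])

print(encoded.columns.tolist())
print(encoded)
```

Columns that are not listed in columns (here 'State') are passed through unchanged, so you can choose exactly which features get encoded.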
How do I apply a function to a pandas Series or DataFrame?
This can be achieved using three methods:
applymap - Apply a function to every element in a DataFrame
apply - Apply a function to each element in a Series
map - Map the existing values of a Series to a different set of values
Let’s separate out the year from the given time format in the DataFrame:
ufo['Time']=ufo['Time'].apply(lambda x:x.split('/')[2])
#splits string using '/' as a separator
ufo['Time']=ufo['Time'].apply(lambda x:x.split(' ')[0])
#splits string using ' ' as a separator
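The two splits can also be chained in a single lambda, and if you want the year as a number rather than a string, parsing the column as a datetime is an alternative. The sketch below assumes the values look like '6/1/1930 22:00' (month/day/year followed by hour:minute); the sample values are made up for the demonstration.

```python
import pandas as pd

# Hypothetical sample in the assumed 'month/day/year hour:minute' layout
times = pd.Series(["6/1/1930 22:00", "6/30/1930 20:00", "2/15/1931 14:00"])

# Same logic as the two apply calls above, chained in one lambda:
# take the part after the second '/', then drop the clock time
year_str = times.apply(lambda x: x.split("/")[2].split(" ")[0])

# Alternative: parse to datetime and read the year as an integer
year_int = pd.to_datetime(times, format="%m/%d/%Y %H:%M").dt.year

print(year_str.tolist())
print(year_int.tolist())
```

The string version keeps the year as text, while the datetime version gives integers that can be compared or sorted numerically.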
To demonstrate map and applymap, I have created a new column named ‘New’ (lack of creativity) containing 0s and 1s.
ufo['Valid']=ufo.New.map({0:'No',1:'Yes'})
ufo.loc[:,'Time':'New'].applymap(float)
#applymap is only valid for a DataFrame, not a Series object
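Here is a small self-contained sketch of the same two calls. The frame and its values are invented stand-ins; only the column names 'Time', 'New' and 'Valid' follow the example above. One caveat: recent pandas releases deprecate DataFrame.applymap in favour of DataFrame.map, so the sketch uses whichever of the two is available.

```python
import pandas as pd

# Hypothetical stand-in: 'Time' already reduced to a year string, 'New' holds 0/1
demo = pd.DataFrame({"Time": ["1930", "1931", "1933"], "New": [0, 1, 1]})

# Series.map with a dict: each value is looked up in the dict;
# a value missing from the dict would become NaN
demo["Valid"] = demo["New"].map({0: "No", 1: "Yes"})

# Element-wise conversion over a label-based column slice ('Time' through 'New')
subset = demo.loc[:, "Time":"New"]
if hasattr(subset, "map"):
    as_float = subset.map(float)       # newer pandas
else:
    as_float = subset.applymap(float)  # older pandas

print(demo)
print(as_float)
```

Note that the slice 'Time':'New' is inclusive on both ends, so any column sitting between the two would also be passed to float and would need to be convertible.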
How do I find and remove duplicate rows in pandas?
You can find duplicate rows by calling .duplicated() on the whole DataFrame. You can also check for repeated values in a single column by calling the same method on a Series.
Logic for duplicated:
keep='first' (default): Mark duplicates as True except for the first occurrence.
keep='last': Mark duplicates as True except for the last occurrence.
keep=False: Mark all duplicates as True.
ufo.duplicated().sum()
#counts the rows that are identical to an earlier row
ufo.drop_duplicates(keep='first',inplace=True)
#drops duplicate entries, keeping the first occurrence of each
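To see how the three keep options differ, here is a minimal sketch on a made-up frame where one row appears three times. The data is invented for illustration and is not part of the UFO dataset.

```python
import pandas as pd

# Hypothetical frame: rows 0, 2 and 3 are identical
demo = pd.DataFrame({
    "City": ["Ithaca", "Holyoke", "Ithaca", "Ithaca"],
    "State": ["NY", "CO", "NY", "NY"],
})

first = demo.duplicated()                 # keep='first' is the default
last = demo.duplicated(keep="last")       # only the last occurrence is unmarked
all_marked = demo.duplicated(keep=False)  # every copy is marked

print(first.tolist())       # [False, False, True, True]
print(last.tolist())        # [True, False, True, False]
print(all_marked.tolist())  # [True, False, True, True]

# Without inplace=True, drop_duplicates returns a new frame and leaves demo untouched
deduped = demo.drop_duplicates(keep="first")
print(len(deduped))         # 2
```

Returning a new frame instead of using inplace=True is a matter of taste; it keeps the original around in case you want to compare before and after.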
There are certainly many more techniques that one eventually discovers while playing with a dataset, but this series of articles highlights some, if not all, of the questions I faced while working with datasets.
This marks the end of the series “Basics of Pandas”. Hope you enjoyed reading it. Check out the other two articles related to this series here
Thanks 😉