Learning Pandas.Series (Part-7) (Handling NaN/Missing Data in Series)

Milankmr
Published in Analytics Vidhya
Jun 7, 2020
Photo by Kevin Ku on Unsplash

In this part-7 of learning pandas, we will explore the attributes and methods of Pandas.Series that handle NaN. If you jumped here directly, you may check Part-6 for a comparison of iloc and loc.

→One important step in data analysis is handling missing data in a dataset. Pandas treats both NaN and None as missing data, and in a numeric Series None is converted to NaN.
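A quick way to see how pandas treats the two markers is to build one numeric and one object Series. The sketch below only demonstrates the conversion; the variable names are illustrative, and dtype=object is passed explicitly so the second Series keeps None as-is on any pandas version.

```python
import numpy as np
import pandas as pd

# In a numeric Series, None is converted to NaN and the dtype becomes float64
numeric = pd.Series([1, None, 3])
print(numeric.dtype)             # float64
print(numeric.tolist())          # [1.0, nan, 3.0]

# In an object Series, None and NaN are stored exactly as given
labels = pd.Series(["a", None, np.nan], dtype=object)

# isna() treats None and NaN alike as missing in both cases
print(numeric.isna().tolist())   # [False, True, False]
print(labels.isna().tolist())    # [False, True, True]
```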

Checking whether a Series has NaN: To check whether a Series contains NaN or missing data, we may use the hasnans attribute:

import numpy as np
import pandas as pd
series2 = pd.Series([1,2,None,3,4,None],name="series_with_none")
print(series2.hasnans)
series1 = pd.Series([0,2,4,3,4,8],name="series_without_nan")
print(series1.hasnans)
series3 = pd.Series([1,2,np.nan,3,4,9],name="series_with_nan")
print(series3.hasnans)
Output:
True
False
True
hasnans in pandas.series
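hasnans only answers yes or no. If you also want to know how many values are missing, a common idiom (not part of the original post) is to combine isna() with any() or sum(). A minimal sketch, redefining series3 from the example above so it runs on its own:

```python
import numpy as np
import pandas as pd

series3 = pd.Series([1, 2, np.nan, 3, 4, 9], name="series_with_nan")

# Same answer as series3.hasnans: is there at least one missing value?
has_missing = series3.isna().any()
# True counts as 1, so sum() gives the number of missing values
missing_count = series3.isna().sum()

print(has_missing, series3.hasnans)  # True True
print(missing_count)                 # 1
```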

Detecting which values are missing/NaN/null in a Series: You may use isnull() or isna() to get a boolean Series that is True at the positions holding NaN. isnull() and isna() are aliases of each other.

Similarly, you may use notnull() or notna() to mark the existing (non-missing) values; they are simply the inverse of isnull() and isna().

import numpy as np
import pandas as pd
series2 = pd.Series([1,2,None,3,4,None],name="series_with_none")
print(series2.isna())
series1 = pd.Series([0,2,4,3,4,8],name="series_without_nan")
print(series1.isnull())
series3 = pd.Series([1,2,np.nan,3,4,9],name="series_with_nan")
print(series3.isna())
Detecting NaN and Missing data in Pandas.Series
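Because isna() and notna() return boolean Series, they can be used directly as masks to select the missing or the existing entries. A minimal sketch reusing the series2 values from the example above:

```python
import pandas as pd

series2 = pd.Series([1, 2, None, 3, 4, None], name="series_with_none")

mask = series2.isna()
print(mask.tolist())                      # [False, False, True, False, False, True]

# notna() is the element-wise inverse of isna()
print((series2.notna() == ~mask).all())   # True

# Use the masks to select missing / existing entries (original labels are kept)
print(series2[mask].index.tolist())       # [2, 5]
print(series2[series2.notna()].tolist())  # [1.0, 2.0, 3.0, 4.0]
```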

Now that we know how to find NaN in a Series, let's check how to remove (drop) them.

dropna() may be used to remove NaN from a Series. By default, dropna() leaves the original Series unchanged and returns a new one; to change the Series itself, pass inplace=True (the call then returns None). Below is an example with and without inplace.

import numpy as np
import pandas as pd
series1 = pd.Series([1,2,None,3,4,None],name="series_with_none")
res_series = series1.dropna()
print(series1)
print(res_series)
series2 = pd.Series([1,2,np.nan,3,4,9],name="series_with_nan")
series2.dropna(inplace=True)
print(series2)

In the example above, we did not pass inplace=True for series1, so series1 keeps its NaN values and the result is saved in res_series. For series2, dropna(inplace=True) modifies the Series itself. The output is shown in the image below:

series.dropna with and without inplace param
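One detail that is easy to miss: dropna() keeps the original index labels, so the result has gaps in its index. If a fresh 0..n-1 index is wanted, reset_index(drop=True) is the usual follow-up. Also, with inplace=True the method returns None, so assigning its result is a common mistake. A small sketch:

```python
import pandas as pd

series1 = pd.Series([1, 2, None, 3, 4, None], name="series_with_none")

res_series = series1.dropna()
print(res_series.index.tolist())   # [0, 1, 3, 4]  (labels 2 and 5 are gone)

# Renumber the index from 0 if the old labels are not needed
renumbered = res_series.reset_index(drop=True)
print(renumbered.index.tolist())   # [0, 1, 2, 3]

# With inplace=True the Series is modified and the call returns None
result = series1.dropna(inplace=True)
print(result)                      # None
print(series1.tolist())            # [1.0, 2.0, 3.0, 4.0]
```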

Sometimes we don't want to remove the NaN values from the Series but want to replace them, either with a constant or with a computed value such as the mean of the Series. To replace NaN in a Series, we may use fillna(). The same inplace rule applies to fillna() as well: without inplace=True, it returns a new Series and leaves the original unchanged.

import numpy as np
import pandas as pd
series1 = pd.Series([1,2,None,3,4,None],name="series_with_none")
res_series = series1.fillna(0)
print(series1)
print(res_series)
series2 = pd.Series([1,2,np.nan,3,4,9],name="series_with_nan")
series2.fillna(0,inplace=True)
print(series2)
Fillna with and without inplace param
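The paragraph above mentions filling with a computed value such as the mean, while the example uses 0. A minimal sketch of the mean variant: Series.mean() skips NaN by default, so its result can be passed straight to fillna().

```python
import pandas as pd

series1 = pd.Series([1, 2, None, 3, 4, None], name="series_with_none")

# mean() ignores NaN by default (skipna=True): (1 + 2 + 3 + 4) / 4 = 2.5
mean_value = series1.mean()
filled = series1.fillna(mean_value)

print(mean_value)        # 2.5
print(filled.tolist())   # [1.0, 2.0, 2.5, 3.0, 4.0, 2.5]
```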

Forward-Fill and Back-Fill in Series: fillna() also has a method parameter that fills each NaN with a neighboring valid value instead of a constant.

To fill each NaN with the previous valid value (forward fill), we may use method='ffill'.

To fill each NaN with the next valid value (backward fill), we may use method='bfill'.

import numpy as np
import pandas as pd
series1 = pd.Series([1,2,np.nan,3,4,np.nan,5],name="series_with_nan")
series1.fillna(method='ffill',inplace=True)
print(series1)
Forward-Fill in Pandas.Series
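When several NaN values appear in a row, forward fill copies the last valid value into all of them. If that is too aggressive, the limit parameter caps how many consecutive NaNs get filled. A minimal sketch using the ffill() method, which does the same job as fillna(method='ffill'); the series here is illustrative, not from the original post:

```python
import numpy as np
import pandas as pd

# Illustrative series with a run of three consecutive NaNs
gappy = pd.Series([1, np.nan, np.nan, np.nan, 5])

# Without a limit, every NaN in the run takes the previous valid value
print(gappy.ffill().tolist())          # [1.0, 1.0, 1.0, 1.0, 5.0]

# limit=1 fills only the first NaN after each valid value
print(gappy.ffill(limit=1).tolist())   # [1.0, 1.0, nan, nan, 5.0]
```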
import numpy as np
import pandas as pd
series1 = pd.Series([1,2,np.nan,3,4,np.nan,5],name="series_with_nan")
series1.fillna(method='bfill',inplace=True)
print(series1)
Backward fill in Pandas.Series
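For reference, here are both fills applied to the series from these examples. In pandas 2.1 and later, passing method= to fillna() emits a FutureWarning; the dedicated ffill() and bfill() methods give the same result and also work in earlier versions. A sketch:

```python
import numpy as np
import pandas as pd

series1 = pd.Series([1, 2, np.nan, 3, 4, np.nan, 5], name="series_with_nan")

# Same as fillna(method='ffill'): each NaN takes the previous valid value
forward = series1.ffill()
# Same as fillna(method='bfill'): each NaN takes the next valid value
backward = series1.bfill()

print(forward.tolist())   # [1.0, 2.0, 2.0, 3.0, 4.0, 4.0, 5.0]
print(backward.tolist())  # [1.0, 2.0, 3.0, 3.0, 4.0, 5.0, 5.0]
```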

Problem with ‘bfill’ and ‘ffill’: While using these, I tried a case where the first/last value in the Series is NaN and applied ‘ffill’/‘bfill’. Let's check the result:

import numpy as np
import pandas as pd
series1 = pd.Series([np.nan,1,2,np.nan,3,4,np.nan],name="series_with_none")
# forward fill: the leading NaN has no earlier value to copy
series1.fillna(method='ffill',inplace=True)
print(series1)
series2 = pd.Series([np.nan,1,2,np.nan,3,4,np.nan],name="series_with_none")
# backward fill: the trailing NaN has no later value to copy
series2.fillna(method='bfill',inplace=True)
print(series2)

In the Series above, the first and last values are NaN. After filling with ffill and bfill, we observe the following:

Issue with ffill and bfill

In the output, the first value is still NaN after ‘ffill’ and the last value is still NaN after ‘bfill’. This follows from their definitions: forward fill has no earlier value to copy into a leading NaN, and backward fill has no later value to copy into a trailing NaN.

To handle this, you may fill the remaining value at the first/last position afterwards with fillna(), for example fillna(0,inplace=True), after applying ffill or bfill as your requirement dictates.
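Since ffill(), bfill() and fillna() each return a new Series, the steps can be chained. The sketch below shows two variants on the series from the example: filling the leftover edge with a constant, or running the opposite fill afterwards. Which one fits depends on the data; 0 is only an example value.

```python
import numpy as np
import pandas as pd

series1 = pd.Series([np.nan, 1, 2, np.nan, 3, 4, np.nan], name="series_with_nan")

# ffill leaves the leading NaN; fill whatever remains with a constant (0 here)
ffill_then_zero = series1.ffill().fillna(0)
# Or run bfill afterwards to cover the leading NaN that ffill could not reach
ffill_then_bfill = series1.ffill().bfill()

print(ffill_then_zero.tolist())   # [0.0, 1.0, 2.0, 2.0, 3.0, 4.0, 4.0]
print(ffill_then_bfill.tolist())  # [1.0, 1.0, 2.0, 2.0, 3.0, 4.0, 4.0]
```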

Thank you for reading! That covers the basics of handling NaN values in a Series.

There is much more to Pandas.Series, but for now I am ending this series here and will move on to learning DataFrame, which is more useful in practice.

I will keep posting about Series if I find something interesting.

Happy Learning !!

Photo by Nathan Dumlao on Unsplash


Working in MNC and trying to share my learning along the way to fellow learners :)