Learning Pandas.Series (Part 7): Handling NaN/Missing Data in Series
In part 7 of learning pandas, we will explore the attributes and methods of Pandas.Series for handling NaN. If you jumped here directly, you may want to check Part 6 for a comparison of iloc and loc.
An important step in data analysis is handling missing data in a dataset. In pandas, missing data is usually represented by NaN. None is also treated as missing, and in a numeric Series it is converted to NaN.
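To make the relationship between None and NaN concrete, here is a minimal sketch. The variable names `numeric` and `text` are just for illustration. The string example passes `dtype=object` explicitly, because newer pandas versions may pick a dedicated string dtype by default, and that dtype may store missing values differently.

```python
import numpy as np
import pandas as pd

# In a numeric Series, None is converted to NaN and the dtype becomes float64
numeric = pd.Series([1, 2, None])
print(numeric.dtype)        # float64
print(numeric.tolist())     # [1.0, 2.0, nan]

# With an explicit object dtype, None is kept as-is
text = pd.Series(["a", None, "c"], dtype=object)
print(text.dtype)           # object
print(text.tolist())        # ['a', None, 'c']

# Both NaN and None are reported as missing by isna()
print(numeric.isna().tolist())  # [False, False, True]
print(text.isna().tolist())     # [False, True, False]
```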
Checking if a Series has NaN: To check whether a series contains NaN (missing) values, we can use the hasnans property:
import numpy as np
import pandas as pd

series2 = pd.Series([1, 2, None, 3, 4, None], name="series_with_none")
print(series2.hasnans)

series1 = pd.Series([0, 2, 4, 3, 4, 8], name="series_without_nan")
print(series1.hasnans)

series3 = pd.Series([1, 2, np.nan, 3, 4, 9], name="series_with_nan")
print(series3.hasnans)

Output:
True
False
True
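hasnans is a property, so it is written without parentheses. As a small sketch, the same True/False answer can also be computed by combining isna() (covered next) with any(). That form is handy when you later want more than a yes/no, for example a count.

```python
import numpy as np
import pandas as pd

series2 = pd.Series([1, 2, None, 3, 4, None], name="series_with_none")
series1 = pd.Series([0, 2, 4, 3, 4, 8], name="series_without_nan")

# hasnans is a property; isna().any() computes the same True/False answer
for s in (series2, series1):
    print(s.name, s.hasnans, s.isna().any())
# series_with_none True True
# series_without_nan False False
```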
Detecting which values are missing (NaN/null) in a Series: isnull() and isna() return a boolean Series that is True wherever the value is NaN. The two methods are aliases of each other.
Similarly, notnull() and notna() (also aliases) return True wherever a value is present. They are simply the inverse of isnull() and isna().
import numpy as np
import pandas as pd

series2 = pd.Series([1, 2, None, 3, 4, None], name="series_with_none")
print(series2.isna())

series1 = pd.Series([0, 2, 4, 3, 4, 8], name="series_without_nan")
print(series1.isnull())

series3 = pd.Series([1, 2, np.nan, 3, 4, 9], name="series_with_nan")
print(series3.isna())
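The boolean Series returned by isna() and notna() is useful beyond printing. The sketch below (the name `mask` is only for illustration) counts missing values by summing the mask, since True counts as 1, and uses the masks to select the missing or present entries.

```python
import numpy as np
import pandas as pd

series3 = pd.Series([1, 2, np.nan, 3, 4, 9], name="series_with_nan")

mask = series3.isna()
print(mask.tolist())              # [False, False, True, False, False, False]
print(series3.notna().tolist())   # [True, True, False, True, True, True]

# True counts as 1, so summing the mask gives the number of missing values
print(mask.sum())                 # 1

# The masks can also select the missing (or present) entries
print(series3[mask].index.tolist())          # [2]
print(series3[series3.notna()].tolist())     # [1.0, 2.0, 3.0, 4.0, 9.0]
```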
Now that we know how to find NaN in a series, let's see how to remove (drop) it from the series.
dropna() removes NaN values from a series. By default, dropna() does not modify the original series; it returns a new one. To change the series itself, pass inplace=True (in that case dropna() returns None). Below is an example with and without inplace.
import numpy as np
import pandas as pd

series1 = pd.Series([1, 2, None, 3, 4, None], name="series_with_none")
res_series = series1.dropna()   # returns a new Series; series1 is unchanged
print(series1)
print(res_series)

series2 = pd.Series([1, 2, np.nan, 3, 4, 9], name="series_with_nan")
series2.dropna(inplace=True)    # removes NaN from series2 itself
print(series2)
In the example above, we did not use inplace=True for series1, so it is unchanged and the result is saved in res_series. For series2, the NaN values are removed from the series itself.
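One detail worth knowing about dropna(): the remaining values keep their original index labels, so the index has gaps where NaN used to be. The sketch below shows this, plus reset_index(drop=True) as one way to renumber from 0 if a contiguous index is needed. The names `dropped` and `renumbered` are only for illustration.

```python
import pandas as pd

series1 = pd.Series([1, 2, None, 3, 4, None], name="series_with_none")

dropped = series1.dropna()
# The surviving values keep their original index labels (gaps at 2 and 5)
print(dropped.index.tolist())     # [0, 1, 3, 4]

# reset_index(drop=True) discards the old labels and renumbers from 0
renumbered = dropped.reset_index(drop=True)
print(renumbered.index.tolist())  # [0, 1, 2, 3]
print(renumbered.tolist())        # [1.0, 2.0, 3.0, 4.0]
```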
Sometimes we don't want to remove the NaN values from the series, but rather replace them with some value, such as 0 or a computed value like the mean of the series. To replace NaN values in a series, we can use fillna(). The same inplace rule applies to fillna() as well.
import numpy as np
import pandas as pd

series1 = pd.Series([1, 2, None, 3, 4, None], name="series_with_none")
res_series = series1.fillna(0)
print(series1)
print(res_series)

series2 = pd.Series([1, 2, np.nan, 3, 4, 9], name="series_with_nan")
series2.fillna(0, inplace=True)
print(series2)
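The example above fills with a constant. To fill with a computed value such as the mean, compute it first and pass it to fillna(). Here is a minimal sketch. Note that mean() skips NaN by default (skipna=True), so only the five present values are averaged.

```python
import numpy as np
import pandas as pd

series2 = pd.Series([1, 2, np.nan, 3, 4, 9], name="series_with_nan")

# mean() ignores NaN by default: (1 + 2 + 3 + 4 + 9) / 5 = 3.8
mean_value = series2.mean()
filled = series2.fillna(mean_value)
print(filled.tolist())   # [1.0, 2.0, 3.8, 3.0, 4.0, 9.0]
```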
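Forward-fill and back-fill in Series: Instead of a constant, a NaN can be filled with a neighbouring value. fillna() used to take a method parameter for this (method='ffill' or method='bfill'). Since pandas 2.1 that parameter is deprecated, and the dedicated Series.ffill() and Series.bfill() methods do the same job.
To fill a NaN with the previous valid value, use ffill (forward fill).
To fill a NaN with the next valid value, use bfill (back fill).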
import numpy as np
import pandas as pd

series1 = pd.Series([1, 2, np.nan, 3, 4, np.nan, 5], name="series_with_nan")
series1.ffill(inplace=True)   # equivalent to the older fillna(method='ffill', inplace=True)
print(series1)

import numpy as np
import pandas as pd

series1 = pd.Series([1, 2, np.nan, 3, 4, np.nan, 5], name="series_with_nan")
series1.bfill(inplace=True)   # equivalent to the older fillna(method='bfill', inplace=True)
print(series1)
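To see the difference between the two directions side by side, here is a small sketch using the non-inplace Series.ffill() and Series.bfill() on the same input. Each NaN takes the previous valid value with ffill and the next valid value with bfill.

```python
import numpy as np
import pandas as pd

series1 = pd.Series([1, 2, np.nan, 3, 4, np.nan, 5], name="series_with_nan")

# ffill copies the last valid value forward into each gap
print(series1.ffill().tolist())   # [1.0, 2.0, 2.0, 3.0, 4.0, 4.0, 5.0]

# bfill copies the next valid value backward into each gap
print(series1.bfill().tolist())   # [1.0, 2.0, 3.0, 3.0, 4.0, 5.0, 5.0]
```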
Problem with ffill and bfill: While using these, I tried a case where the first or last value in the series is NaN. Let's check what happens.
import numpy as np
import pandas as pd

series1 = pd.Series([np.nan, 1, 2, np.nan, 3, 4, np.nan], name="series_with_none")
series1.ffill(inplace=True)
print(series1)

series2 = pd.Series([np.nan, 1, 2, np.nan, 3, 4, np.nan], name="series_with_none")
series2.bfill(inplace=True)
print(series2)
In the series above, both the first and the last value are NaN. After filling, the first value is still NaN with ffill, and the last value is still NaN with bfill. This follows from their definitions: ffill has no earlier value to copy into the first position, and bfill has no later value to copy into the last position.
To handle this, you can fill the remaining value at the first/last position with a constant by calling fillna(0, inplace=True) after applying ffill or bfill, as your use case requires.
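Here is a minimal sketch of that fix on the leading-NaN case, using the non-inplace ffill() so each step can be printed. Option 2, chaining ffill() and bfill() so that a leading NaN takes the first valid value instead of a constant, is an alternative added here for comparison and not the approach described above. Which option fits depends on whether a made-up constant or a copied neighbour is more appropriate for your data.

```python
import numpy as np
import pandas as pd

series1 = pd.Series([np.nan, 1, 2, np.nan, 3, 4, np.nan], name="series_with_none")

# ffill leaves the leading NaN (there is nothing before it to copy)
print(series1.ffill().tolist())            # [nan, 1.0, 2.0, 2.0, 3.0, 4.0, 4.0]

# Option 1: fill whatever ffill could not reach with a constant
print(series1.ffill().fillna(0).tolist())  # [0.0, 1.0, 2.0, 2.0, 3.0, 4.0, 4.0]

# Option 2: chain ffill and bfill so the leading NaN takes the first valid value
print(series1.ffill().bfill().tolist())    # [1.0, 1.0, 2.0, 2.0, 3.0, 4.0, 4.0]
```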
Thank you for reading! With this, we are ready to start handling NaN values in a Series.
There is much more to Pandas.Series, but for now I am ending this series here and will move on to learning DataFrame, which is used more often in practice.
I will keep posting about Series if I find something interesting.
Happy learning!