What’s the difference between None and NaN in Python?
Missing data in Python is represented by either None or NaN. When we deal with missing values in Pandas, we rarely need to distinguish between them, because Pandas uses NaN internally for simplicity. However, it’s worth understanding how the two differ.
NaN: Not a Number
NaN is a special floating-point value, defined by the IEEE floating-point specification, that is used to represent a missing value.
In NumPy, an array that contains a NaN value is a native floating-point array.
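As a small illustration of the point above, the sketch below builds an array that mixes integers with NaN and inspects its dtype. The variable name `values` is only illustrative.

```python
import numpy as np

# NaN exists only as a floating-point value, so NumPy stores the
# whole array with a floating-point dtype.
values = np.array([1, np.nan, 3, 4])

print(values.dtype)  # float64
```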
However, any arithmetic operation that involves NaN produces NaN as its result.
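A quick sketch of how NaN propagates through arithmetic and through a regular aggregation (the names `values`, `added`, `multiplied`, and `total` are illustrative):

```python
import numpy as np

values = np.array([1, np.nan, 3, 4])

added = 1 + np.nan        # NaN plus anything is NaN
multiplied = 0 * np.nan   # even multiplying by zero gives NaN
total = values.sum()      # one NaN makes the whole sum NaN

print(added, multiplied, total)  # nan nan nan
```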
Fortunately, NumPy provides special aggregation functions, such as np.nansum, np.nanmin, and np.nanmax, that ignore NaN values.
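For instance, a minimal sketch using three of NumPy's NaN-aware aggregations on an illustrative array:

```python
import numpy as np

values = np.array([1, np.nan, 3, 4])

# The nan-prefixed functions skip NaN entries instead of propagating them.
total = np.nansum(values)
smallest = np.nanmin(values)
largest = np.nanmax(values)

print(total, smallest, largest)  # 8.0 1.0 4.0
```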
None: A Python Object
None is a Python object of type NoneType.
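A NumPy array that contains None, such as np.array([1, None, 3, 4]), has dtype object. If you try to aggregate over such an array, for example with sum(), you will get a TypeError, because arithmetic between a number and None is not defined.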
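The sketch below shows this behaviour on an illustrative array. The names `values` and `error` are assumptions made for the example.

```python
import numpy as np

# None is a Python object, so NumPy falls back to dtype=object.
values = np.array([1, None, 3, 4])
print(values.dtype)  # object

error = None
try:
    values.sum()  # adding an int to None is not defined
except TypeError as exc:
    error = exc
    print("TypeError:", exc)
```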
Pandas is built to handle the two of them nearly interchangeably, converting between them where appropriate.
For example, when None appears in numeric data, Pandas automatically converts it to a NaN value.
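A minimal sketch of that conversion, assuming a numeric Series built from a list that mixes numbers, NaN, and None (the name `series` is illustrative):

```python
import numpy as np
import pandas as pd

# Both np.nan and None end up as NaN in a float64 Series.
series = pd.Series([1, np.nan, 2, None])

print(series.dtype)            # float64
print(series.isna().tolist())  # [False, True, False, True]
print(series.sum())            # 3.0, because missing values are skipped by default
```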
In addition, according to the Pandas documentation, NaN values don’t compare equal, but None values do. Note that pandas/NumPy uses the fact that np.nan != np.nan, and treats None like np.nan.
In [11]: None == None # noqa: E711
Out[11]: True
In [12]: np.nan == np.nan
Out[12]: False
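Because NaN never equals itself, an equality test cannot be used to find missing values. One possible approach is to use a dedicated check such as math.isnan or pd.isna, and the latter also treats None as missing. The variable names in this sketch are illustrative.

```python
import math

import numpy as np
import pandas as pd

equal_to_itself = np.nan == np.nan   # False, so == cannot detect NaN
nan_detected = math.isnan(np.nan)    # True
nan_is_missing = pd.isna(np.nan)     # True
none_is_missing = pd.isna(None)      # True, None is treated like NaN

print(equal_to_itself, nan_detected, nan_is_missing, none_is_missing)
```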
References:
[1] https://jakevdp.github.io/PythonDataScienceHandbook/03.04-missing-values.html
[2] https://pandas.pydata.org/pandas-docs/stable/user_guide/missing_data.html