How I Deal with Missing or Outlier Data with Numpy and Pandas in Python

Elfao
GlassBox

Missing values and outliers are frequently encountered when working with data. The big question in such cases is how to treat these missing values and outliers.

In this article, we will present some methods to identify and treat missing values as well as outliers.

For this article, we will use this data: https://www.kaggle.com/gregorut/videogamesales

Univariate analysis

To start any data project, you need to know the data you have. Are there missing values? Are there outliers? Do some variables have so many distinct values that they could make your model unstable? …

Univariate analysis can answer all of these questions and help you better understand your data, so that you can identify the adjustments needed to make your project a success.
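
As a rough illustration of these questions, here is a minimal sketch that counts missing values and distinct values per column and flags outliers with the common 1.5 × IQR rule. The toy DataFrame, its column names and values, and the helper names `univariate_summary` and `iqr_outliers` are assumptions made for this example; they are not taken from the dataset or from the rest of this article.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data standing in for the real dataset (values are made up).
df = pd.DataFrame({
    "Genre": ["Sports", "Racing", None, "Sports", "Puzzle", "Sports"],
    "Year": [2006.0, 2008.0, np.nan, 2009.0, 1989.0, 2010.0],
    "Global_Sales": [1.2, 0.8, 1.0, 0.9, 1.1, 40.0],
})


def univariate_summary(frame: pd.DataFrame) -> pd.DataFrame:
    """Per-column dtype, missing count, missing share (%) and distinct count."""
    return pd.DataFrame({
        "dtype": frame.dtypes.astype(str),
        "n_missing": frame.isna().sum(),
        "pct_missing": frame.isna().mean() * 100,
        "n_unique": frame.nunique(),  # NaN is not counted as a distinct value
    })


def iqr_outliers(series: pd.Series, k: float = 1.5) -> pd.Series:
    """Boolean mask of values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = series.quantile([0.25, 0.75])
    iqr = q3 - q1
    return (series < q1 - k * iqr) | (series > q3 + k * iqr)


summary = univariate_summary(df)
outlier_mask = iqr_outliers(df["Global_Sales"])

print(summary)
print(df.loc[outlier_mask, "Global_Sales"])
```

The IQR rule is only one possible convention for what counts as an outlier. The multiplier `k` is a parameter here so that it can be loosened or tightened depending on how skewed the variable is.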

I. The fastest method for univariate analysis

Rather than dwelling on the most tedious part of this exercise, consider this neat way of getting straight to the heart of the pandas and NumPy process.

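import pandas as pd
import numpy as np
# Note: newer releases of this package are published as ydata-profiling
# (import name ydata_profiling); the import below targets the older name.
from pandas_profiling import ProfileReport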
###################################################################…

Elfao
GlassBox

Data scientist with 4 years of experience. I have worked in different fields, such as digital marketing and consulting, and I currently work for a start-up in finance.