Removing outliers from data using Python and Pandas

Graham Harrison
Published in Analytics Vidhya
6 min read, Oct 17, 2020

Outliers

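A box plot showing the median and interquartile range is a good way to visualise a distribution, especially when the data contains outliers. In a conventional box plot, the box spans the first quartile (Q1) to the third quartile (Q3), the line inside the box marks the median, the whiskers extend to the most extreme values that lie within 1.5 times the interquartile range (IQR = Q3 - Q1) of the box, and any points beyond the whiskers are drawn individually as outliers.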
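To make the parts of a box plot concrete, the quantities it draws can be computed by hand. The following is a minimal sketch that assumes the common 1.5 times IQR rule for the whiskers; the sample values and variable names are made up for illustration.

```python
import pandas as pd

# Hypothetical sample: mostly values near 10, plus two extreme values.
values = pd.Series([8.0, 9.0, 9.5, 10.0, 10.0, 10.5, 11.0, 12.0, 30.0, -15.0])

q1 = values.quantile(0.25)   # lower edge of the box
median = values.median()     # line inside the box
q3 = values.quantile(0.75)   # upper edge of the box
iqr = q3 - q1                # height of the box (interquartile range)

# Whiskers reach the most extreme points that fall inside these fences.
lower_fence = q1 - 1.5 * iqr
upper_fence = q3 + 1.5 * iqr

# Points beyond the fences are the ones a box plot draws as outliers.
outliers = values[(values < lower_fence) | (values > upper_fence)]
print(outliers.tolist())
```

If matplotlib is installed, `values.plot.box()` should draw the matching plot, with the two extreme values shown as individual points.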

Generating some data

We are going to need some test data to explore the issues around outliers …

Function Definition

The generate() function generates a list of floats around a given median, including some outliers (values a long way from the median), which we can use to explore the concept.

It was modified from this Stack Overflow answer: https://stackoverflow.com/questions/55351782/how-should-i-generate-outliers-randomly
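A function of this kind might look like the sketch below. It is not the article's original listing: the parameter names, default values and the `seed` argument are assumptions chosen for illustration, and it uses NumPy's `default_rng` for the random numbers.

```python
import numpy as np

def generate(median=630, err=12, outlier_err=100, size=80, outlier_size=10, seed=None):
    """Return a shuffled list of floats centred on `median`, with outliers on both sides.

    All parameter names and defaults are illustrative assumptions.
    """
    rng = np.random.default_rng(seed)

    # Bulk of the data: uniform noise within +/- err of the median.
    data = median + rng.uniform(-err, err, size)

    # Outliers: pushed beyond the +/- err band by up to outlier_err.
    lower_outliers = median - err - rng.uniform(0, outlier_err, outlier_size)
    upper_outliers = median + err + rng.uniform(0, outlier_err, outlier_size)

    data = np.concatenate((data, lower_outliers, upper_outliers))
    rng.shuffle(data)  # mix the outliers in with the rest of the values
    return data.tolist()

values = generate(seed=0)
print(len(values), min(values), max(values))
```

Because the bulk of the values is symmetric around `median` and the same number of outliers is added on each side, the sample median stays close to the requested one while the minimum and maximum sit far away from it.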

Function Testing

Let’s get the results of generate() into a DataFrame so we can take a look at the output …
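One way to do this is sketched below. The `generate()` defined here is a compact stand-in so the snippet runs on its own, and the column name `value` and the seed are assumptions rather than the article's own choices.

```python
import numpy as np
import pandas as pd

def generate(median=630, err=12, outlier_err=100, size=80, outlier_size=10, seed=None):
    # Compact stand-in for generate(); parameters are illustrative assumptions.
    rng = np.random.default_rng(seed)
    bulk = median + rng.uniform(-err, err, size)
    low = median - err - rng.uniform(0, outlier_err, outlier_size)
    high = median + err + rng.uniform(0, outlier_err, outlier_size)
    return rng.permutation(np.concatenate((bulk, low, high))).tolist()

# Wrap the list in a single-column DataFrame (column name is an assumption).
df = pd.DataFrame({"value": generate(seed=42)})

print(df.head())      # first few generated values
print(df.describe())  # count, mean, std, min, quartiles and max
```

In the `describe()` output, the 25%, 50% and 75% rows should sit close together around the median, while `min` and `max` are pulled far out by the outliers.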
