Removing outliers from data using Python and Pandas
Outliers
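A box plot showing the median and interquartile range is a good way to visualise a distribution, especially when the data contains outliers. In the usual convention the box spans the first to third quartiles (the interquartile range, or IQR), the line inside the box marks the median, the whiskers reach out to the most extreme values within 1.5 × IQR of the box, and anything beyond the whiskers is drawn as an individual point and treated as an outlier.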
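To make the parts of a box plot concrete, here is a minimal sketch that computes the same quantities by hand. The helper name box_plot_stats, the sample values and the 1.5 whisker multiplier are assumptions chosen for illustration; 1.5 is the common default, not a fixed rule.

```python
import numpy as np


def box_plot_stats(values, whisker=1.5):
    """Return the quantities a conventional box plot draws (illustrative helper)."""
    data = np.asarray(values, dtype=float)

    # Quartiles: the box runs from q1 to q3, with a line at the median.
    q1, median, q3 = np.percentile(data, [25, 50, 75])
    iqr = q3 - q1

    # Fences: points outside these limits are plotted individually as outliers.
    lower_fence = q1 - whisker * iqr
    upper_fence = q3 + whisker * iqr
    outliers = data[(data < lower_fence) | (data > upper_fence)]

    return {
        "q1": q1,
        "median": median,
        "q3": q3,
        "iqr": iqr,
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "outliers": outliers.tolist(),
    }


# Made-up sample with one obviously extreme value.
sample = [10, 12, 13, 14, 15, 16, 18, 95]
stats = box_plot_stats(sample)
print(stats)
```

With this sample only the value 95 falls outside the fences, so it is the one point a box plot would draw on its own beyond the whiskers.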
Generating some data
We are going to need some test data to explore the issues around outliers …
Function Definition
The generate() function below (adapted from Stack Overflow) generates a list of floats centred on a given median and including some outliers (values a long way from the median), which we can use to explore the concept.
The generate() function was modified from https://stackoverflow.com/questions/55351782/how-should-i-generate-outliers-randomly
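The original listing is not reproduced in this text, so the following is only a rough sketch of what such a function might look like, not the author's exact code. The parameter names (err, outlier_err, size, outlier_size, seed) and their defaults are assumptions. It draws the bulk of the values uniformly within ±err of the median, then adds an equal number of low and high outliers so the sample median stays close to the requested one.

```python
import numpy as np


def generate(median=630.0, err=12.0, outlier_err=100.0,
             size=80, outlier_size=10, seed=None):
    """Return a shuffled list of floats around `median` with outliers on both sides.

    All parameter names and defaults here are hypothetical.
    """
    rng = np.random.default_rng(seed)

    # Bulk of the data: uniformly spread within +/- err of the median.
    bulk = median + rng.uniform(-err, err, size)

    # Outliers: pushed beyond the bulk by up to outlier_err in each direction.
    lower_outliers = median - err - rng.uniform(0, outlier_err, outlier_size)
    upper_outliers = median + err + rng.uniform(0, outlier_err, outlier_size)

    data = np.concatenate([bulk, lower_outliers, upper_outliers])
    rng.shuffle(data)  # mix the outliers in with the ordinary values
    return data.tolist()


data = generate(seed=0)
print(len(data), min(data), max(data))
```

Passing a seed makes the output repeatable, which is handy when comparing different outlier-removal approaches on the same data.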
Function Testing
Let’s get the results of generate() into a DataFrame so we can take a look at the output …
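As a minimal sketch of this step, the snippet below loads a list of floats into a single-column DataFrame and prints a quick summary. The hard-coded values are a made-up stand-in for the output of generate(), and the column name "value" is an assumption.

```python
import pandas as pd

# Hypothetical stand-in values; in the article these would come from generate().
values = [628.1, 633.4, 640.9, 619.7, 712.5, 631.0, 548.2, 626.3]

# One column of floats, one row per generated value.
df = pd.DataFrame({"value": values})

print(df.head())               # first few rows
print(df["value"].describe())  # count, mean, std, min, quartiles, max
```

The describe() summary is a quick first check: a min or max far from the quartiles is a hint that outliers are present.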