Replace Outlier Detection by Simple Statistics with ECOD

A new Python-based, simple, parameter-free, and interpretable anomaly detection method

Alexandra Amidon
Geek Culture


Source: Wikimedia commons

Outliers can be defined as rare events in the data, often appearing at the tails of a distribution. Outlier detection is best treated as an unsupervised task because labels are often rare or difficult to collect and may not capture all possible types of anomalies that could occur.

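A common heuristic for quickly identifying outliers is the “three sigma” rule, which classifies any point more than three standard deviations from the mean as an outlier. The “1.5 IQR” rule is a variant: it flags points that fall more than 1.5 times the interquartile range below the first quartile or above the third quartile. Because quartiles are less affected by extreme values than the mean and standard deviation, it is more robust to outliers.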
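To make the two rules concrete, here is a minimal sketch using only Python's standard library. The function names and the toy data are made up for illustration, and the population standard deviation is an assumption (the rule itself does not say which estimator to use).

```python
import statistics

def three_sigma_outliers(values):
    """Return the points more than three standard deviations from the mean."""
    mean = statistics.fmean(values)
    std = statistics.pstdev(values)  # population standard deviation (assumed)
    return [v for v in values if abs(v - mean) > 3 * std]

def iqr_outliers(values, k=1.5):
    """Return the points more than k * IQR below Q1 or above Q3."""
    q1, _, q3 = statistics.quantiles(values, n=4)  # quartile cut points
    iqr = q3 - q1
    lower, upper = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < lower or v > upper]

# Toy data: nine ordinary readings and one extreme value.
data = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]

print(three_sigma_outliers(data))  # []
print(iqr_outliers(data))          # [50]
```

In this small sample the extreme value inflates the mean and standard deviation enough to hide itself from the three-sigma rule, while the quartile-based rule still catches it.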

However, the three-sigma rule uses only a limited amount of information about the data: the mean and standard deviation.

A new and better alternative is ECOD, short for “empirical cumulative distribution functions for outlier detection”. The paper was published in 2021.

ECOD is implemented in the PyOD Python package.
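The name hints at the core idea: score each point by how far into a tail of the empirical cumulative distribution it sits. The sketch below is a simplified, one-dimensional illustration of that idea only. It is not PyOD's implementation, the function name `ecdf_tail_scores` is hypothetical, and the steps the full method uses to handle multiple features are left out.

```python
import math
from bisect import bisect_left, bisect_right

def ecdf_tail_scores(values):
    """Score each point by -log of its smaller empirical tail probability."""
    ordered = sorted(values)
    n = len(ordered)
    scores = []
    for v in values:
        left = bisect_right(ordered, v) / n         # fraction of points <= v
        right = (n - bisect_left(ordered, v)) / n   # fraction of points >= v
        # Both tails include the point itself, so neither is ever zero.
        scores.append(-math.log(min(left, right)))
    return scores

# Toy data: nine ordinary readings and one extreme value.
data = [10, 11, 9, 10, 12, 10, 11, 9, 10, 50]
scores = ecdf_tail_scores(data)

# The point with the highest score sits deepest in a tail.
most_extreme = data[scores.index(max(scores))]
print(most_extreme)  # 50
```

Note that nothing here needs tuning: the score comes straight from the ranks of the data. It also means this simplified version only sees how rare a value is, not how far away it is.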

ECOD has several key features that make it stand out from competing algorithms:

  • No hyperparameters! This is important because it is difficult to tune hyperparameters for outlier


Data scientist working in the financial services industry