Handbook of Anomaly Detection: with Python Outlier Detection — (2) HBOS

Chris Kuo/Dr. Dataman
Published in Dataman in AI
16 min read · Oct 20, 2021

(Revised on August 12, 2024)

This book starts with the Histogram-Based Outlier Score (HBOS) because it is relatively simple to implement and computationally efficient. These two properties make it a valuable tool in the data scientist’s toolkit for identifying and addressing outliers.

The Histogram-Based Outlier Score (HBOS) is an unsupervised anomaly detection method that uses histograms to model the distribution of each feature and to identify outliers. Imagine you have test scores for a large number of students across several subjects, and you want to find the outstanding students. An intuitive approach is to build a histogram of the scores for each subject. Most students land in the well-populated middle bins, while the outstanding students fall into the high-score bins, which typically hold only a few students. HBOS models this intuition: for each subject, it calculates an outlier score for every student from that subject's histogram. Students who fall into low-frequency bins receive higher outlier scores, and a high score indicates that the student is an outlier in that subject. Because there are multiple subjects, HBOS builds one histogram per subject and calculates the outlier scores for each subject separately. Finally, HBOS sums up the outlier scores from each…
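The per-subject histogram idea can be sketched in a few lines of plain Python. This is a minimal illustration under stated assumptions, not the article's implementation: the function name `hbos_scores`, the `n_bins` parameter, and the toy `students` data are all hypothetical. It assumes equal-width bins, normalizes each bin height by the tallest bin of that subject, and adds up log(1 / height) across subjects, which is the way HBOS is usually described.

```python
import math


def hbos_scores(rows, n_bins=5):
    """Return one outlier score per row; higher means more unusual.

    Hypothetical helper for illustration: equal-width bins per feature,
    per-feature score log(1 / normalized bin height), summed over features.
    """
    n_features = len(rows[0])
    scores = [0.0] * len(rows)
    for j in range(n_features):
        column = [row[j] for row in rows]
        lo, hi = min(column), max(column)
        # Fall back to width 1.0 when the column is constant (hi == lo).
        width = (hi - lo) / n_bins or 1.0
        # Assign every value to a bin; the maximum goes into the last bin.
        bins = [min(int((v - lo) / width), n_bins - 1) for v in column]
        counts = [0] * n_bins
        for b in bins:
            counts[b] += 1
        tallest = max(counts)
        for i, b in enumerate(bins):
            # Normalized height lies in (0, 1]; sparse bins give large scores.
            height = counts[b] / tallest
            scores[i] += math.log(1.0 / height)
    return scores


# Made-up (math, reading) test scores; the last student has an unusual math score.
students = [
    (70, 65), (72, 68), (68, 70), (71, 66),
    (69, 67), (73, 69), (70, 64), (98, 66),
]
scores = hbos_scores(students)
most_unusual = scores.index(max(scores))
print(most_unusual, round(scores[most_unusual], 3))
```

In this toy data the last student sits alone in the top math bin, so that student collects the largest total score. Note that the score only measures how sparse a bin is; it does not distinguish unusually high values from unusually low ones. For real work, a maintained implementation such as the `HBOS` class in the PyOD library is a better starting point than this sketch.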
