Random Cut Forest
The things you should know about this unsupervised machine learning algorithm.
If you are looking up this algorithm, I guess you have started working with AWS SageMaker. Random Cut Forest is an anomaly detection algorithm, and it is available as one of SageMaker's built-in algorithms.
An anomaly is an observation that deviates from how the rest of the data is typically spread.
SageMaker’s unsupervised built-in algorithms
A few of the unsupervised algorithms built into AWS SageMaker:
- K-Means Algorithm
- Principal Component Analysis (PCA) Algorithm
- Random Cut Forest (RCF) Algorithm
- IP Insights
Random Cut Forest (RCF) Algorithm
RCF detects anomalous data points within a data set that diverge from otherwise well-structured or patterned data.
How does it work
The algorithm takes random samples of the data points, each sample with the same number of points, and builds a tree from each sample by cutting it up repeatedly. All the trees together form a forest, and the forest is used to determine whether a particular data point is an anomaly or not.
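To make the sampling step concrete, here is a minimal sketch in plain Python. The function name `draw_tree_samples` is made up for this example, and plain sampling without replacement is my assumption; it is not necessarily how SageMaker draws its samples. The parameter names `num_trees` and `samples_per_tree` are only loosely modeled on the SageMaker hyperparameters `num_trees` and `num_samples_per_tree`.

```python
import random

def draw_tree_samples(points, num_trees, samples_per_tree, seed=0):
    """Draw one random sample of the data per tree, all of the same size."""
    rng = random.Random(seed)
    # rng.sample picks without replacement, so a point appears at most once per tree.
    return [rng.sample(points, samples_per_tree) for _ in range(num_trees)]

# Illustrative data: 100 distinct points on a 2D plane.
points = [(float(i), float(i % 7)) for i in range(100)]
samples = draw_tree_samples(points, num_trees=5, samples_per_tree=16)
print(len(samples), [len(s) for s in samples])
```

Each of the five samples would then be turned into its own tree, and the five trees together would make up the forest.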
In this image, we have data points on a 2D plane. The algorithm gives each point a score based on its placement, so the orange points get a higher score.
Inside this circle, the score for each data point will be lower than the score of the outliers, as you can see in the image below.
The high scores, around 3.5, indicate anomalies in the data. The score depends on how far a point deviates from the rest.
To flag a data point as an outlier, a common rule of thumb is to treat any score more than 3 standard deviations above the mean score as a possible outlier.
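The three-standard-deviation rule can be sketched in a few lines of Python. The helper `flag_outliers` and the list of scores are made up for illustration; only the value 3.5 echoes the example above.

```python
import statistics

def flag_outliers(scores, num_std=3.0):
    """Return the indices of scores more than num_std standard deviations above the mean."""
    mean = statistics.mean(scores)
    std = statistics.pstdev(scores)  # population standard deviation of the scores
    cutoff = mean + num_std * std
    return [i for i, score in enumerate(scores) if score > cutoff]

# Made-up scores: thirty ordinary points and one point with a high score.
scores = [1.0] * 30 + [3.5]
print(flag_outliers(scores))  # → [30]
```

The cutoff of 3 is a convention rather than a law, so it is worth tuning `num_std` against data you already understand.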
How it calculates these scores
Let's take an easy example: data points on a 2D plane where most of the data sits in clusters, plus one outlier that is painted orange.
- The first step creates a bounding box around the data by taking the minimum and maximum values of each dimension.
- We select one of the dimensions and make a cut at a random position within its range. In this example, the cut is vertical, i.e. at a random position along the x-axis.
- We create a new bounding box for the points on the left-hand side and another for the points on the right-hand side.
- We make a random cut again inside each new bounding box.
- Last but not least, points that get cut off and isolated early end up close to the root of the tree. The closer a point is to the root, the higher its score.
This is repeated until every point in the tree is completely isolated.
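The steps above can be sketched in plain Python. This is a simplified illustration, not SageMaker's implementation: the function names are made up, every tree is built on all points instead of a random sample, the cut dimension is picked uniformly among dimensions that can still be cut (the steps only say that one dimension is selected), and instead of a real anomaly score the sketch reports the average depth at which each point gets isolated. A smaller average depth means the point sits closer to the root, which corresponds to a higher score.

```python
import random

def bounding_box(points):
    """Minimum and maximum of each dimension over the given points."""
    dims = range(len(points[0]))
    lows = [min(p[d] for p in points) for d in dims]
    highs = [max(p[d] for p in points) for d in dims]
    return lows, highs

def isolation_depths(points, rng):
    """For each point, the tree depth at which random cuts isolate it."""
    depths = [0] * len(points)
    stack = [(list(range(len(points))), 0)]  # (indices of points in the box, depth)
    while stack:
        indices, depth = stack.pop()
        lows, highs = bounding_box([points[i] for i in indices])
        # Only dimensions where the box has non-zero width can be cut.
        cuttable = [d for d in range(len(lows)) if highs[d] > lows[d]]
        if len(indices) == 1 or not cuttable:
            # Isolated (or only identical points left): record the depth.
            for i in indices:
                depths[i] = depth
            continue
        dim = rng.choice(cuttable)
        cut = rng.uniform(lows[dim], highs[dim])
        left = [i for i in indices if points[i][dim] <= cut]
        right = [i for i in indices if points[i][dim] > cut]
        if not right:
            # The cut landed on the upper edge of the box: try again.
            stack.append((indices, depth))
            continue
        stack.append((left, depth + 1))
        stack.append((right, depth + 1))
    return depths

def average_depths(points, num_trees=100, seed=0):
    """Average isolation depth of each point over a forest of random trees."""
    rng = random.Random(seed)
    totals = [0.0] * len(points)
    for _ in range(num_trees):
        for i, depth in enumerate(isolation_depths(points, rng)):
            totals[i] += depth
    return [total / num_trees for total in totals]

# Made-up data: a cluster of 20 points plus one far-away "orange" outlier.
cluster_rng = random.Random(1)
points = [(cluster_rng.gauss(0, 1), cluster_rng.gauss(0, 1)) for _ in range(20)]
points.append((12.0, 12.0))
depths = average_depths(points)
print("outlier:", depths[-1], "shallowest cluster point:", min(depths[:-1]))
```

Because most of the bounding box is empty space between the cluster and the outlier, a random cut is likely to separate the outlier within the first cut or two, so its average depth comes out smaller than that of the clustered points.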
There you go, you now understand how Random Cut Forest works. In the next blog, we will take a close look at RCF with code examples.
Thanks for reading.
If you liked the article, please give it a clap. For more projects and articles, please follow me on GitHub and on my Medium profile.
tapanKumarPatro - Overview
Tapan Kumar Patro - Medium
Don’t forget to check out the end-to-end deployment of a deep learning project with Android application development.
Thanks. Please leave a comment if you have any questions.