Published in Analytics Vidhya

Random Cut Forest

The things you should know about this unsupervised machine learning algorithm.

If you are looking into this algorithm, you have probably started working with AWS SageMaker. Random Cut Forest is an anomaly detection algorithm, and it is available as one of SageMaker's built-in algorithms.

An anomaly is an observation that deviates from the pattern followed by the rest of the data.

[Figure: a few of AWS SageMaker's unsupervised built-in algorithms]

Random Cut Forest (RCF) Algorithm

RCF detects anomalous data points within a data set that diverge from otherwise well-structured or patterned data.

How Does It Work?

The algorithm draws random samples from the data, each containing the same number of points, and builds a tree from each sample. Together, these trees form a forest, which is used to decide whether a particular data point is an anomaly.
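The sampling step can be sketched in a few lines of Python. This is a minimal illustration, assuming NumPy and a small in-memory dataset; `sample_for_trees` is a hypothetical helper, and although `num_trees` and `num_samples_per_tree` mirror the names of SageMaker's RCF hyperparameters, this is not how SageMaker implements sampling internally.

```python
import numpy as np

def sample_for_trees(data, num_trees, num_samples_per_tree, seed=0):
    """Draw one random sample per tree; every sample has the same size."""
    rng = np.random.default_rng(seed)
    samples = []
    for _ in range(num_trees):
        # Sample row indices without replacement so a tree never sees a row twice
        idx = rng.choice(len(data), size=num_samples_per_tree, replace=False)
        samples.append(data[idx])
    return samples

data = np.random.default_rng(1).normal(size=(1000, 2))  # 1,000 synthetic 2D points
samples = sample_for_trees(data, num_trees=5, num_samples_per_tree=256)
print(len(samples), samples[0].shape)  # 5 (256, 2)
```

Each sample would then be turned into one tree, so every tree in the forest is built from the same number of points.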

Example

The image shows a set of data points in two dimensions. The algorithm assigns each point a score based on its position, so the orange points (the outliers) receive higher scores.

Points inside the circle receive lower scores than the outliers, as the image below shows.

Scores as high as 3.5 indicate anomalies in the data: the further a point deviates from the rest, the higher its score.

As a rule of thumb, a point whose score is more than three standard deviations above the mean score can be treated as a possible outlier.
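Applied to a list of scores, this rule of thumb looks like the sketch below. The scores are made-up numbers chosen only for illustration, `flag_outliers` is a hypothetical helper that assumes NumPy, and the cut-off is one-sided because only unusually high scores indicate anomalies.

```python
import numpy as np

def flag_outliers(scores, num_std=3.0):
    """Flag scores more than `num_std` standard deviations above the mean score."""
    scores = np.asarray(scores, dtype=float)
    threshold = scores.mean() + num_std * scores.std()
    return scores > threshold, threshold

# Hypothetical scores: twenty ordinary points around 1.0 and one point at 3.5
scores = [0.9, 1.1, 1.0, 0.95, 1.05] * 4 + [3.5]
flags, threshold = flag_outliers(scores)
print(np.flatnonzero(flags))  # → [20]
```

Here the threshold lands at roughly 2.7, so only the point scoring 3.5 is flagged.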

How It Calculates These Scores

Consider a simple example: data points on a 2D plane, most of them grouped in clusters, with a single outlier shown in orange.

  1. First, create a bounding box around the data by taking the minimum and maximum values in each dimension.
  2. Select one of the dimensions and cut at a random position within its range. In this example, the first cut is a vertical line, i.e. it splits the data along the x-axis.
  3. Create a new bounding box for each side of the cut (left and right).
  4. Cut each new bounding box again at a random position.
  5. Finally, points that get isolated by cuts close to the root of the tree are likely outliers: the closer to the root a point is isolated, the higher its score.

The cutting is repeated until every point in the tree is isolated.
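The steps above can be sketched as a small, self-contained Python program. It is a simplified illustration under several assumptions: it uses NumPy, the helper names `isolation_depths` and `forest_scores` are hypothetical, it picks the cut dimension with probability proportional to that dimension's range (as in the original RCF formulation), and it builds every tree on the whole toy dataset instead of on a random sample. The score is simply an inverse of the average isolation depth. SageMaker's RCF instead defines the score through the change in tree complexity caused by a point, which AWS describes as roughly inversely proportional to the point's depth, so these values are not on the same scale as SageMaker's scores.

```python
import numpy as np

def isolation_depths(points, rng):
    """Build one random cut tree over `points`; return the depth at which each
    point ends up alone in a leaf."""
    depths = np.zeros(len(points), dtype=int)
    stack = [(np.arange(len(points)), 0)]  # (point indices in a node, node depth)
    while stack:
        idx, depth = stack.pop()
        if len(idx) == 0:
            continue  # empty side, possible only through floating-point rounding
        lo = points[idx].min(axis=0)  # step 1: bounding box of this node
        hi = points[idx].max(axis=0)
        ranges = hi - lo
        if len(idx) == 1 or ranges.sum() == 0:
            depths[idx] = depth  # isolated (or identical points that cannot be split)
            continue
        # step 2: pick a dimension (weighted by its range) and a random cut inside it
        dim = rng.choice(len(ranges), p=ranges / ranges.sum())
        cut = rng.uniform(lo[dim], hi[dim])
        goes_left = points[idx, dim] <= cut
        # steps 3-4: each side becomes a child node with its own bounding box
        stack.append((idx[goes_left], depth + 1))
        stack.append((idx[~goes_left], depth + 1))
    return depths

def forest_scores(points, num_trees=100, seed=0):
    """Simplified score: points isolated closer to the root score higher."""
    rng = np.random.default_rng(seed)
    all_depths = np.array([isolation_depths(points, rng) for _ in range(num_trees)])
    return 1.0 / (1.0 + all_depths.mean(axis=0))  # shallower average depth -> higher score

rng = np.random.default_rng(42)
cluster = rng.normal(loc=0.0, scale=0.5, size=(50, 2))  # the clustered points
points = np.vstack([cluster, [[8.0, 8.0]]])             # one outlier, like the orange point
scores = forest_scores(points)
print(np.argmax(scores))  # expected: 50, the outlier's index
```

The outlier is usually isolated within the first one or two cuts, while points inside the cluster need many more, so averaging over many trees makes its score stand out.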

What Now?

There you go: you now understand how Random Cut Forest works. In the next blog, we will take a closer look at RCF with code examples.

Thanks for reading.

If you liked the article, please give it a clap, and follow me on GitHub and Medium for more projects and articles.

Don’t forget to check out the end-to-end deployment of a deep learning project with Android application development.

Thanks! Please leave a comment if you have any questions.

Tapan Kumar Patro

📚 Machine learning | 🤖 Deep Learning | 👀 Computer vision | 🗣 Natural Language processing | 👂 Audio Data | 🖥 End to End Software Development | 🖌