Anomaly Detection v/s Supervised Learning

When should you use anomaly detection and when should you use supervised learning?

Akshita Guru
Operations Research Bit
5 min read · May 30, 2024

Welcome back! We’ve previously discussed supervised learning and anomaly detection. Let’s now look at how the two differ and when to use each.

image by https://www.makinarocks.ai

The decision is actually quite subtle in some applications. So let me share with you some thoughts and some suggestions for how to pick between these two types of algorithms.

An anomaly detection algorithm will typically be the more appropriate choice when you have a very small number of positive examples (0–20 is not uncommon) and a relatively large number of negative examples with which to build a model for p(x). Recall that the parameters of p(x) are learned only from the negative examples. The much smaller set of positive examples is used only in your cross-validation set and test set, for parameter tuning and for evaluation.
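To make that split concrete, here is a minimal sketch in plain Python. It assumes p(x) is a product of independent per-feature Gaussians and that the threshold (called `epsilon` here) is picked by maximising F1 on the cross-validation set. The synthetic data, the function names and the F1 criterion are all assumptions made for illustration, not something prescribed above.

```python
import math
import random


def fit_gaussian(rows):
    """Estimate per-feature mean and variance from normal (y = 0) examples only."""
    n = len(rows)
    dims = len(rows[0])
    mu = [sum(r[j] for r in rows) / n for j in range(dims)]
    var = [sum((r[j] - mu[j]) ** 2 for r in rows) / n for j in range(dims)]
    return mu, var


def p_of_x(x, mu, var):
    """Density of x under independent per-feature Gaussians."""
    p = 1.0
    for xj, m, v in zip(x, mu, var):
        p *= math.exp(-((xj - m) ** 2) / (2 * v)) / math.sqrt(2 * math.pi * v)
    return p


def f1_score(y_true, y_pred):
    """F1 of the positive (anomaly) class; 0.0 when nothing is correctly flagged."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def pick_epsilon(probs, labels, steps=1000):
    """Scan candidate thresholds and keep the one with the best F1 on the CV set."""
    lo, hi = min(probs), max(probs)
    best_eps, best_f1 = lo, 0.0
    for i in range(1, steps):
        eps = lo + (hi - lo) * i / steps
        preds = [1 if p < eps else 0 for p in probs]
        score = f1_score(labels, preds)
        if score > best_f1:
            best_eps, best_f1 = eps, score
    return best_eps, best_f1


rng = random.Random(0)

# Synthetic two-feature data (illustrative only).
# Training set: negative examples only.
train_normal = [[rng.gauss(10, 1), rng.gauss(5, 0.5)] for _ in range(1000)]
# Cross-validation set: many negatives plus the handful of positives.
cv_normal = [[rng.gauss(10, 1), rng.gauss(5, 0.5)] for _ in range(200)]
cv_anomalies = [[rng.gauss(18, 1), rng.gauss(5, 0.5)] for _ in range(10)]
cv_examples = cv_normal + cv_anomalies
cv_labels = [0] * len(cv_normal) + [1] * len(cv_anomalies)

mu, var = fit_gaussian(train_normal)  # parameters come from negatives only
cv_probs = [p_of_x(x, mu, var) for x in cv_examples]
epsilon, best_f1 = pick_epsilon(cv_probs, cv_labels)  # positives only tune epsilon
```

Notice that the ten positive examples never touch `fit_gaussian`. They only help decide where to put `epsilon`, which is why so few of them can be enough.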

In contrast, if you have a larger number of both positive and negative examples, then supervised learning might be more applicable. Now, even if you have only 20 positive training examples, it might be okay to apply a supervised learning algorithm. But it turns out that the way anomaly detection looks at the data set and the way supervised learning looks at it are quite different.

Here is the main difference. If you think there are many different types of anomalies, or many different types of positive examples, then anomaly detection might be more appropriate. Say there are many different ways for an aircraft engine to go wrong, and tomorrow there may be a brand new way for an aircraft engine to have something wrong with it. Then your, say, 20 positive examples may not cover all of the ways that an aircraft engine could go wrong. That makes it hard for any algorithm to learn from the small set of positive examples what the anomalies look like, and future anomalies may look nothing like any of the anomalous examples we’ve seen so far. If you believe this to be true for your problem, then I would gravitate to using an anomaly detection algorithm. What anomaly detection does is look at the normal examples, that is, the y = 0 negative examples, and just try to model what they look like. Anything that deviates a lot from normal, it flags as an anomaly, including a brand new way for an aircraft engine to fail that has never been seen before in your data set.

In contrast, supervised learning has a different way of looking at the problem. When you’re applying supervised learning, ideally you would hope to have enough positive examples for the algorithm to get a sense of what the positive examples are like. And with supervised learning, we tend to assume that future positive examples are likely to be similar to the ones in the training set.
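Here is a small sketch of that contrast in plain Python. The supervised side is a nearest-centroid classifier, used only as a simple stand-in for a supervised learner. The anomaly detection side is a per-feature Gaussian model of the normal examples, with a threshold set to the lowest log-density seen among the normal training examples. The data, the threshold rule and every name in the code are assumptions made for illustration.

```python
import math
import random

rng = random.Random(1)

# Synthetic two-feature data (illustrative only).
normal = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(500)]
known_failures = [(rng.gauss(5, 1), rng.gauss(0, 1)) for _ in range(20)]
novel_failure = (0.0, -8.0)  # a failure mode that is absent from the training data


def centroid(points):
    """Mean point of a list of 2-D points."""
    n = len(points)
    return (sum(p[0] for p in points) / n, sum(p[1] for p in points) / n)


def nearest_centroid_predict(x, c_neg, c_pos):
    """Supervised stand-in: label 1 if x is closer to the positive centroid."""
    return 1 if math.dist(x, c_pos) < math.dist(x, c_neg) else 0


def log_density(x, mu, var):
    """Log of p(x) under independent per-feature Gaussians."""
    return sum(
        -0.5 * math.log(2 * math.pi * v) - (xj - m) ** 2 / (2 * v)
        for xj, m, v in zip(x, mu, var)
    )


# Supervised view: learn what positives look like from the 20 known failures.
c_neg, c_pos = centroid(normal), centroid(known_failures)

# Anomaly detection view: model only the normal examples.
mu = c_neg
var = tuple(
    sum((p[j] - mu[j]) ** 2 for p in normal) / len(normal) for j in range(2)
)
# Assumed threshold: the lowest log-density among the normal training examples.
log_eps = min(log_density(p, mu, var) for p in normal)

supervised_label = nearest_centroid_predict(novel_failure, c_neg, c_pos)
anomaly_flag = 1 if log_density(novel_failure, mu, var) < log_eps else 0
```

With this data the classifier labels `novel_failure` as normal, because it looks nothing like the known failures, while the density model flags it simply for being far from normal. A different supervised model or a different threshold could of course behave differently; the sketch only illustrates the two ways of looking at the data.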

So let me illustrate this with one example. Say you are using a system to find financial fraud. There are, unfortunately, many different ways that some individuals try to commit financial fraud, and new types of fraud attempts appear every few months or every year. Because completely new and unique forms of financial fraud keep popping up, anomaly detection is often used to just look for anything that’s different from the transactions we’ve seen in the past.

In contrast, look at the problem of email spam detection. There are many different types of spam email, but even over many years, spam emails keep on trying to sell similar things or get you to go to similar websites and so on. The spam email that you will get in the next few days is much more likely to be similar to spam emails that you have seen in the past. That’s why supervised learning works well for spam: it’s trying to detect more of the types of spam emails that you have probably already seen in your training set. Whereas if you’re trying to detect brand new types of fraud that have never been seen before, then anomaly detection may be more applicable.
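As a rough illustration of the spam case, here is a tiny word-count Naive Bayes classifier in plain Python. Naive Bayes is just one possible supervised learner, picked here as an assumption because it is short to write. The six training messages are made up, and a real spam filter would need far more data and better text handling.

```python
import math
from collections import Counter

# Made-up labelled emails: 1 = spam, 0 = not spam.
train = [
    ("win a free prize now", 1),
    ("cheap pills free shipping", 1),
    ("claim your free prize today", 1),
    ("meeting agenda for monday", 0),
    ("lunch with the project team", 0),
    ("please review the project report", 0),
]


def train_naive_bayes(examples):
    """Count words per class; these counts are the whole model."""
    word_counts = {0: Counter(), 1: Counter()}
    class_counts = Counter()
    for text, label in examples:
        class_counts[label] += 1
        word_counts[label].update(text.split())
    vocab = set(word_counts[0]) | set(word_counts[1])
    return word_counts, class_counts, vocab


def predict(text, model):
    """Return the class with the higher log-probability (Laplace smoothing)."""
    word_counts, class_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label in (0, 1):
        score = math.log(class_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.split():
            if word in vocab:  # words never seen in training are ignored
                score += math.log((word_counts[label][word] + 1) / denom)
        scores[label] = score
    return max(scores, key=scores.get)


model = train_naive_bayes(train)
new_spam_label = predict("free prize inside", model)
new_ham_label = predict("project meeting on monday", model)
```

The new spam message is caught because it reuses words ("free", "prize") that already appeared in labelled spam. That is the assumption supervised learning leans on: tomorrow’s positives look like yesterday’s.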

Let’s go through a few more examples. We have already seen fraud detection as one use case of anomaly detection, although supervised learning is also used to find previously observed forms of fraud. And we’ve seen email spam classification typically being addressed with supervised learning. You’ve also seen the manufacturing example, where you may want to find new, previously unseen defects, such as brand new ways for an aircraft engine to fail that you still want to detect even if you don’t have any positive example like that in your training set.

It turns out that in manufacturing, supervised learning is also used to find defects, but more for finding known and previously seen defects. For example, say you are a smartphone maker and you know that occasionally the machine that makes the case will accidentally scratch the cover. Scratches are a common defect on smartphones, so you can get enough training examples of scratched smartphones, corresponding to the label y = 1, and just train the system to decide whether a new smartphone that you just manufactured has any scratches on it. The difference is this: if you see scratched smartphones over and over and you want to check whether your phones are scratched, then supervised learning works well. Whereas if you suspect that there are going to be brand new ways for something to go wrong in the future, then anomaly detection will work well.

Some other examples: monitoring machines in a data centre, especially to spot a machine that has been hacked. A hacked machine can behave differently in a brand new way, unlike any of its previous behaviour, so that would feel more like an anomaly detection application. In fact, one theme is that many security-related applications use anomaly detection, because hackers are often finding brand new ways to hack into systems.

Returning to supervised learning, if you want to learn to predict the weather, well, there’s only a handful of types of weather that you typically see. Is it sunny, is it rainy, is it going to snow? Because you see the same output labels over and over, weather prediction would tend to be a supervised learning task. Or if you want to use the symptoms of a patient to see if the patient has a specific disease that you’ve seen before, then that would also tend to be a supervised learning application.

You now know how to choose between anomaly detection and supervised learning. I hope you enjoyed it.

References

[1] Check back soon for more posts. You can connect with me on the following: LinkedIn | GitHub | Medium | email : akshitaguru16@gmail.com
