Anomaly Detection with MIDAS

How can we detect anomalies more accurately and faster?

Nunzio Logallo
Mar 11 · 4 min read

Anomaly detection in graphs is an important problem: it helps uncover strange behavior in systems, with applications such as intrusion detection, fake ratings, and financial fraud. To limit the damage from malicious activity, we need to detect anomalies in real time: as each edge arrives, we must decide whether it is anomalous. Existing methods process edge streams in an online manner but can miss a large amount of suspicious activity. In contrast, MIDAS detects microcluster anomalies (suddenly arriving groups of suspiciously similar edges) in edge streams using constant time and memory, and it provides theoretical bounds on the false positive probability.

MIDAS is a project made by Siddharth Bhatia, Bryan Hooi, Minji Yoon, Kijung Shin, and Christos Faloutsos.

The main contributions of MIDAS are:
1. Streaming Microcluster Detection: a novel streaming approach for detecting microcluster anomalies;
2. Theoretical Guarantees: a bound on the false positive probability of MIDAS;
3. Effectiveness: experimental results show that MIDAS outperforms the baseline approaches by 42%–48% in accuracy (measured as AUC) and processes the data 162–644 times faster.

Compared with previous approaches to anomaly detection in edge streams, MIDAS adds microcluster detection and a guarantee on the false positive probability, while retaining the properties those approaches already offer.

Comparison of relevant edge stream anomaly detection approaches — Source: the MIDAS repository

Algorithm

Two approaches are proposed: MIDAS and MIDAS-R.
Here is an overview:

  1. Streaming Hypothesis Testing Approach: the core of MIDAS, which uses streaming data structures within a hypothesis-testing framework so that guarantees on the false positive probability can be obtained;
  2. Detection and Guarantees: a decision procedure for determining whether a point is anomalous, with guarantees on the false positive probability;
  3. Incorporating Relations: this is where MIDAS-R comes into play, incorporating temporal and spatial relationships between edges.
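To make the first two steps more concrete, here is a minimal Python sketch of the basic MIDAS idea as described in the paper: two Count-Min Sketches track, for each edge, the count in the current time tick (a) and the count over all ticks so far (s), and a chi-squared statistic measures how far the current count departs from the historical mean. The class and method names (`CountMinSketch`, `MidasSketch`, `score`) and the default sizes are assumptions made for illustration. This is not the code from the MIDAS repository, and it does not cover MIDAS-R.

```python
import random


class CountMinSketch:
    """Fixed-size approximate counter, so memory stays constant."""

    def __init__(self, rows=2, cols=1024, seed=0):
        rng = random.Random(seed)
        self.cols = cols
        # One random salt per row plays the role of an independent hash function.
        self.salts = [rng.getrandbits(32) for _ in range(rows)]
        self.table = [[0.0] * cols for _ in range(rows)]

    def _index(self, salt, edge):
        return hash((salt, edge)) % self.cols

    def add(self, edge, amount=1.0):
        for row, salt in zip(self.table, self.salts):
            row[self._index(salt, edge)] += amount

    def query(self, edge):
        # Taking the minimum over rows limits overcounting caused by collisions.
        return min(row[self._index(salt, edge)]
                   for row, salt in zip(self.table, self.salts))

    def clear(self):
        for row in self.table:
            for i in range(self.cols):
                row[i] = 0.0


class MidasSketch:
    """Illustrative scorer; the names here are hypothetical, not the repository API."""

    def __init__(self, rows=2, cols=1024, seed=0):
        # Same seed for both sketches so an edge maps to the same buckets in each.
        self.current = CountMinSketch(rows, cols, seed)  # a: count in current tick
        self.total = CountMinSketch(rows, cols, seed)    # s: count over all ticks
        self.tick = 1

    def score(self, source, destination, tick):
        # Assumes ticks start at 1 and never decrease.
        if tick > self.tick:
            self.current.clear()  # a new tick starts: reset the current counts
            self.tick = tick
        edge = (source, destination)
        self.current.add(edge)
        self.total.add(edge)
        a = self.current.query(edge)
        s = self.total.query(edge)
        t = self.tick
        if s == 0 or t <= 1:
            return 0.0  # no history to compare against yet
        # Chi-squared statistic: (a - s/t)^2 * t^2 / (s * (t - 1))
        return (a - s / t) ** 2 * t ** 2 / (s * (t - 1))


detector = MidasSketch()
# One edge per tick for nine ticks, then a burst of 50 copies in tick 10.
steady = [detector.score(1, 2, tick) for tick in range(1, 10)]
burst = [detector.score(1, 2, 10) for _ in range(50)]
```

In this toy stream, the steadily arriving edge scores 0 in every tick, while the scores inside the burst grow with each repeated arrival. Each update touches a fixed number of sketch cells, which is how the constant time and memory per edge come about.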

If you want to learn more about the algorithm, please visit the MIDAS repository.

Accuracy

ROC for DARPA dataset — Source: the MIDAS repository

The graph above plots the ROC curves for MIDAS, MIDAS-R, and SedanSpot (the baseline anomaly detection approach). It shows that MIDAS is 42% more accurate than the baseline and also runs significantly faster (644×).
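Since accuracy here is summarized by the area under the ROC curve (AUC), a small sketch of how AUC can be computed from per-edge anomaly scores may help. It uses the pairwise interpretation of AUC: the probability that a randomly chosen anomalous edge receives a higher score than a randomly chosen normal edge. The function name `roc_auc` and the toy labels and scores are assumptions for illustration, not the evaluation script from the MIDAS repository.

```python
def roc_auc(labels, scores):
    """AUC as the fraction of (anomalous, normal) pairs ranked correctly.

    labels: 1 for anomalous edges, 0 for normal edges.
    scores: anomaly scores, higher meaning more suspicious.
    Ties between an anomalous and a normal edge count as half a win.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one anomalous and one normal edge")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy example: two normal edges followed by two anomalous edges.
toy_auc = roc_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8])
```

In the toy example, three of the four (anomalous, normal) pairs are ordered correctly, so `toy_auc` is 0.75. The pairwise loop is quadratic and only meant for small inputs. On a full dataset, a sort-based implementation would be the usual choice.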

Average Precision Score vs. running time of MIDAS and MIDAS-R — Source: the MIDAS repository

The graph above plots the average precision score against the running time. MIDAS achieves a 27% higher average precision than the baseline, and MIDAS-R a 29% higher one, which is the highest average precision score overall. Both MIDAS and MIDAS-R therefore outperform the other anomaly detection approaches for edge streams.

Scalability

Scalability of MIDAS and MIDAS-R compared to the number of edges — Source: the MIDAS repository

The graph above shows how the processing time of MIDAS and MIDAS-R behaves as the number of edges increases, and it confirms that both algorithms scale well. Both MIDAS and MIDAS-R allow real-time anomaly detection, processing 4M edges within 0.5s.

Real-World Effectiveness

Correspondence between anomalies detected by MIDAS and major security-related events in TwitterSecurity — Source: the MIDAS repository

Finally, we compare MIDAS, MIDAS-R, and SedanSpot by their anomaly scores on a real-world example, the TwitterSecurity dataset. The graph above plots anomaly scores by day, from May to September 2014. For MIDAS, the peaks in anomaly score coincide with significant events in the TwitterSecurity timeline. In contrast, SedanSpot outputs many high anomaly scores, which leads to a low AUC.

Other use cases

Consider one application of MIDAS in the manufacturing sector, where many machines work together and are interconnected as a graph. If these machines behave strangely, the result can be cost overruns from excess power consumption and wasted raw materials. An anomaly detection algorithm like MIDAS can detect such behavior in real time, reducing or preventing losses. There are many more applications of MIDAS, for example detecting fake accounts on social networks like Twitter and Facebook, where people or bots create false identities. MIDAS can also help detect fake news by deciding whether an article is genuine or just clickbait.

Conclusion and References

MIDAS and MIDAS-R make anomaly detection in edge streams faster and more accurate while maintaining high scalability and real-world effectiveness.

If you want to learn more about MIDAS, check the MIDAS repository where you can find examples and a getting started guide. If you have any questions, please don’t hesitate to contact Siddharth Bhatia.

Siddharth Bhatia, Bryan Hooi, Minji Yoon, Kijung Shin and Christos Faloutsos. “MIDAS: Microcluster-Based Detector of Anomalies in Edge Streams.” AAAI Conference on Artificial Intelligence (AAAI), 2020. https://arxiv.org/abs/1911.04464

Thanks to Siddharth Bhatia
