Harnessing Deep Learning and low fidelity security insights to detect advanced persistent threats (Part 1 of 2): Understanding the fundamentals

Yasmin Bokobza
Data Science at Microsoft
9 min read · Nov 9, 2021

By Yasmin Bokobza and Jonatan Zukerman

Identifying threats inside your organization and their potential impact — whether a compromised entity or a malicious insider — has always been a time-consuming and labor-intensive process. Sifting through alerts, connecting the dots, and active hunting all add up to significant time and effort expended with often minimal returns, and with the possibility of sophisticated threats simply evading discovery. Particularly elusive threats such as zero-day, targeted, and advanced persistent threats can be among the most consequential, making their detection even more critical. In recent years, User and Entity Behavior Analytics (UEBA) has become a prominent solution for addressing these challenges. UEBA is a key component of Microsoft’s Security Information and Event Management (SIEM) offering as part of Azure Sentinel.

The goal of some of our recent work has been to detect persistent threats for Azure Sentinel by using low fidelity indications of possible threats. These indications are calculated in real time by the UEBA engine and are used by our anomaly detection engine. Because these indications are weak signals of attacks, we decided to find combinations of these signals that are rare. Specifically, we treat this challenge as a conventional unsupervised multivariate anomaly detection task.

In this first article of a two-part series about harnessing Deep Learning and low fidelity security insights to detect advanced persistent threats, we discuss UEBA for Azure Sentinel to describe the problem of detecting advanced persistent threats and how we formulate it as multivariate anomaly detection. We then walk through different approaches for anomaly detection with an emphasis on what led us to develop our approach of fusing security research with Deep Learning to detect persistent threats in our use cases. In addition, we provide guidance for you to consider in tackling your own business problems. In the forthcoming Part 2 article, we introduce our approach to detect compromised accounts by combining attack indications based on cyber security expertise, known attack vectors, and Deep Learning as part of sharing the validation for our business scenario.

What is Azure Sentinel?

As mentioned earlier, Azure Sentinel is Microsoft’s SIEM solution. SIEM is designed to support threat detection, compliance, and security incident management through the collection and analysis (in both near real-time and historical time) of security events, as well as a wide variety of other event and contextual data sources. The core capabilities encompass a broad scope of log event collection and management and the ability to analyze log events and other data across disparate sources, combined with operational capabilities (such as incident management, dashboards, and reporting).

Figure 1: The Azure Sentinel UI.

What is UEBA?

Gartner defines UEBA as a “solution that uses analytics to build standard profiles of user behaviors and entities across a time and peer group horizon. Activity that is anomalous to these standard baselines is presented as suspicious, and packaged analytics applied to these anomalies can help in the discovery of threats and potential incidents.” (1)

UEBA solutions build baselines for user and entity profiles to identify typical activity. These solutions also leverage Machine Learning (ML) for descriptive and predictive models. ML is an important component of UEBA as it automatically builds models, learns from historical data, and identifies deviations from typical behavior.

Through ML, UEBA can help provide an understanding of how users (including humans and service accounts) and entities (machines) typically behave within a given environment. This addresses a challenge with legacy SIEMs, whose static correlation rules are single-dimensional and can generate many false positives.

Azure Sentinel UEBA provides the following:

  1. Investigation and hunting with contextual and behavioral information.
  2. Entity pages that provide clear insight, timelines, and investigation prioritization.
  3. Instant security value following quick and simple onboarding.

Figure 2 shows the UI for behavioral information and investigation prioritization. Input to the UEBA engine consists of different activities per account, such as account logon. The output of the engine consists of enriched activities with indications of attacks. These indications are shown in different tables and are used by customers to identify the specific abnormal activity their security researchers are seeking. For example, if they suspect malicious traffic coming from a certain country, they can filter for entities that have been active from that country and check whether that is a typical location for those entities. In our work, we have aimed to find anomalies based on these indications using Deep Learning, without looking for a specific scenario.

Figure 2: UEBA engine outputs.

One of the key features of UEBA is anomaly detection based on entity behavior profiling. The anomaly detection feature combines the indications generated by the UEBA engine into an anomalous indication that the anomaly engine detects. In this way, it essentially combines indications of malicious activity into a stronger signal. Our customers can choose to enable detection of anomalies based on their specific needs, such as by type of anomaly. In this article we suggest an approach that enables detecting anomalies based on activities rather than on specific use cases. Figure 3 shows the anomalies UI that customers can use to enable specific types of anomaly detection.

Figure 3: Anomaly UI.

Anomaly detection approaches

As mentioned, by using UEBA we can formulate the problem of detecting advanced persistent threats as one of detecting multivariate anomalies. The choice of anomaly detection approach depends on the type of problem that is defined based on the training and test data. There are three main approaches:

  1. Supervised detection is used when the training and test data are labeled. Using this approach, data labeling (normal/anomaly) is performed based on the assumption that anomalies are labeled.
  2. Semi-supervised detection is used when the training data is devoid of anomalies and the test data is unlabeled. Using this approach, data labeling (normal/anomaly) is performed based on the assumption that anomalies will be detected once a deviation from the learned normalcy values has occurred.
  3. Unsupervised detection is used when the training and test data are unlabeled. Using this approach, data labeling (normal/anomaly) is performed based on the observations’ scores, computed from their characteristics and without any predetermined normalcy values.

In our use case, we used an unsupervised anomaly detection approach because our goal was to detect anomalies based on unlabeled activities of thousands of customers.
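To make the unsupervised setting concrete, the following is a minimal sketch (not our production pipeline) that applies scikit-learn’s Isolation Forest to unlabeled multivariate data; the synthetic data, the injected rare combination, and the contamination rate are illustrative assumptions.

```python
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(42)
# Unlabeled "activity" vectors: most rows are typical, one is a rare
# combination of otherwise-weak signals.
normal = rng.normal(loc=0.0, scale=1.0, size=(500, 4))
outlier = np.array([[8.0, -8.0, 8.0, -8.0]])
X = np.vstack([normal, outlier])

# No labels are provided; the model scores observations purely from
# their characteristics, as in the unsupervised approach described above.
model = IsolationForest(contamination=0.01, random_state=0)
labels = model.fit_predict(X)  # -1 = anomaly, 1 = normal

print(labels[-1])  # the injected rare combination is flagged as -1
```

The same fit/predict pattern carries over to other unsupervised detectors, which makes it straightforward to compare algorithms on the same unlabeled data.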

Anomaly detection with ML

Because cyberattacks are getting more sophisticated and relying on security experts alone to counter them may not be sufficient, it’s important to defend against hitherto unknown attacks and zero-day exploits. ML models have proven highly helpful for improving anomaly detection accuracy and assisting organizations that are already managing big data. The ability of ML models to handle unlabeled and unstructured data — while being more sensitive to distinguishing data anomalies from noise — allows them to determine what is normal and what may be regarded as a data anomaly more accurately.

Because getting labeled data of TP (true positive) attacks/anomalies is difficult (for various reasons), a useful alternative is to leverage unsupervised models to detect anomalies. Several widely used approaches exist for unsupervised anomaly detection, and some of them can be used for detecting anomalies in high dimensional space (2). Figure 4 lists a selection of unsupervised anomaly detection algorithms. Algorithms that are distance based, like k-nearest neighbors or clustering algorithms, might be useful for explaining the anomaly (feature importance) but can have difficulty with linearly correlated features. Subspace models that use dimensionality reduction might be a good option for cases of anomaly detection where the input features might be correlated. In addition, the reconstruction error can be used as an anomaly score, and the relative reconstruction error of each feature can be used for an explanation layer. In this case, however, nonlinear relations between inputs might be poorly handled.
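The subspace idea above can be sketched with plain PCA: project each observation onto the learned subspace, reconstruct it, and use the per-feature reconstruction error both as an anomaly score and as an explanation layer. The synthetic correlated data and the two-component subspace below are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
# Correlated features: 4 observed dimensions driven by 2 latent factors.
latent = rng.normal(size=(300, 2))
mixing = rng.normal(size=(2, 4))
X = latent @ mixing + 0.05 * rng.normal(size=(300, 4))

# Fit a 2-component subspace via SVD (PCA on centered data).
mean = X.mean(axis=0)
_, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
components = Vt[:2]  # top-2 principal directions

def reconstruction_errors(points):
    """Project onto the subspace and back; return per-feature squared errors."""
    centered = points - mean
    recon = centered @ components.T @ components
    return (centered - recon) ** 2

# A point off the learned subspace reconstructs poorly; its per-feature
# errors indicate which features drive the anomaly (the explanation layer).
anomaly = np.array([[5.0, -5.0, 5.0, -5.0]])
normal_err = reconstruction_errors(X).sum(axis=1)
anomaly_err = reconstruction_errors(anomaly).sum(axis=1)
print(anomaly_err[0] > normal_err.max())
```

Because the projection is linear, this sketch also illustrates the stated limitation: nonlinear relations between inputs are handled poorly, which motivates the Autoencoder approach discussed next.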

Autoencoders are a well-known approach to detecting outliers. Early applications of autoencoders focused on dimensionality reduction; however, Autoencoder techniques outperform linear dimensionality reduction methods when the data is complex and nonlinear. In our use case, we are dealing with high dimensional input data and highly unbalanced feature distributions. In addition, we need a model that is explainable. Our work with these various approaches has helped us create a specialized solution for our own use case, enabling us to use the Autoencoder reconstruction error to explain the anomalies.

By leveraging the expertise of Microsoft’s security research teams and known attack vectors, features that are indicative of anomalous activity can be generated.

The features generated from this security expertise are then used by the Autoencoder models to provide a security-embedded ML anomaly model mapped to the MITRE ATT&CK framework.

Figure 4: Unsupervised anomaly detection algorithms (3).

Anomaly detection using Autoencoders

Autoencoders are widely used in anomaly detection, learning from training data without the need for explicit labels to train on. They attempt to recreate the input at the output. Figure 5 presents the Autoencoder architecture. First, the Encoder network translates the input data to a latent low-dimensional representation, which the hidden layer learns. The smaller the size of the bottleneck hidden layer, the greater the compression of the data. Then the Decoder network uses the hidden layer output to reconstruct the input data with minimum error. The loss function evaluates how well the decoder reconstructs the input data (in other words, it calculates the reconstruction error). Anomalous observations are more difficult to recreate and have a higher reconstruction error. Therefore, the reconstruction errors are used as anomaly scores and a predefined threshold for the reconstruction error is used to detect an anomaly.

Usually, the layer structure of the Decoder is symmetric to that of the Encoder: the number of nodes per layer decreases with each subsequent layer of the encoder and increases with each subsequent layer of the decoder. The number of layers in each network, the number of nodes, and the type of activation function in each layer can be set by using a robust test harness and controlled experiments.

Figure 5: Autoencoders architecture.
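The train/score/threshold flow above can be sketched compactly. As a stand-in for a full deep learning framework, this illustration uses scikit-learn’s MLPRegressor trained to reproduce its input, giving a symmetric 6 → 4 → 2 → 4 → 6 network with a 2-node bottleneck; the synthetic data, layer sizes, activation, and 99th-percentile threshold are all illustrative assumptions, not the article’s production configuration.

```python
import numpy as np
from sklearn.neural_network import MLPRegressor

rng = np.random.default_rng(1)
# Normal data lies on a nonlinear 2-D latent manifold embedded in 6 dimensions.
latent = rng.normal(size=(1000, 2))
mixing = rng.normal(size=(2, 6))
X_train = np.tanh(latent @ mixing)

# Symmetric encoder/decoder with a 2-node bottleneck: 6 -> 4 -> 2 -> 4 -> 6.
autoencoder = MLPRegressor(hidden_layer_sizes=(4, 2, 4), activation="tanh",
                           max_iter=2000, random_state=0)
autoencoder.fit(X_train, X_train)  # the target equals the input

def anomaly_scores(points):
    """Per-row reconstruction error (MSE) serves as the anomaly score."""
    recon = autoencoder.predict(points)
    return ((points - recon) ** 2).mean(axis=1)

# A predefined threshold on the reconstruction error flags anomalies.
threshold = np.percentile(anomaly_scores(X_train), 99)  # illustrative cutoff
outlier = np.array([[4.0, -4.0, 4.0, -4.0, 4.0, -4.0]])
print(anomaly_scores(outlier)[0] > threshold)
```

The outlier sits far from the manifold the network learned, so it reconstructs poorly and its score exceeds the threshold; comparing each feature’s squared error against its typical value is one simple way to build the explanation layer mentioned earlier.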

List of Python packages for anomaly detection

Multiple Python packages implement various statistical methods and algorithms for anomaly detection. The table below summarizes a few popular open source options, including PyOD, which covers a vast majority of algorithms, and PyCaret, which also supports end-to-end scenarios from data preparation to model deployment. The table represents a snapshot from October 2021 and is expected to evolve over time.

We advise you to consider the recency and maintenance activity of the packages when choosing the most appropriate one for a given problem. Most of the packages are flexible and reusable for various contexts and scenarios, and data scientists will find them easy to pick up even with limited or no background in anomaly detection.

Conclusions

In this article, we’ve discussed the use of UEBA in some Microsoft offerings and have presented various anomaly detection methods, including Autoencoders. In addition, we have summarized a few popular Python anomaly detection packages that represent a good starting point. Many packages are handy for those new to anomaly detection. In the next article in this series, we will discuss the methodology of our approach for detecting compromised accounts and share the results and validation for our business scenario. We hope this series provides you with guidelines to help you address your own business problems.

We’d like to thank Microsoft Advanced AI school & Microsoft Research, especially James McCaffrey and Patrice Godefroid, for being great partners in the research design of this work. We also would like to thank Itay Argoety and Casey Doyle for helping review the work.

References

  1. https://www.gartner.com/reviews/market/user-and-entity-behavior-analytics
  2. Markus Goldstein and Seiichi Uchida, “A Comparative Evaluation of Unsupervised Anomaly Detection Algorithms for Multivariate Data.”
  3. https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0152173
