On Finding Interesting Anomalies in UEBA Systems

Robert Pienta
AI/ML at Symantec
Mar 28, 2019 · 8 min read

What is UEBA?

User and entity behavioral analytics (UEBA) is a broad umbrella term for many user- and entity-focused technologies. UEBA is a burgeoning area with numerous industry applications, yet its definition has been a challenge and remains the focus of significant blogging.

As researchers and practitioners attempt to define it more clearly, we can see that UEBA is inherently interdisciplinary. It draws on many key areas and technologies: machine learning, networking, statistics, human-computer interaction, and visualization. It also encompasses a host of approaches, systems, and technologies, for example in the security industry:

  • CASB — cloud access security broker systems enforce policies (and provide data and threat protection) between a cloud service customer and their cloud solutions provider;
  • DLP — data loss prevention systems inspect data (both at rest and in use) to apply corporate policies and subsequent actions (log, report, encrypt etc.) to that data;
  • DCAP — data-centric audit and protection systems work similarly to DLP by safeguarding specific private data through data governance and monitoring;
  • EDR — endpoint detection and response techniques focus on detection, inspection, and mitigation of users’ endpoints.

What is largely agreed upon is the emphasis on producing analytics for tracking the malicious or risky behaviors of users in the context of other users and entities.

Figure: UEBA tracks users and entities to detect malicious internal behavior.

A major and often misunderstood aspect of many UEBA systems is that they inherit serious challenges from their use of anomaly detection. In this article I will elucidate some key challenges and questions for both researchers and practitioners building UEBA systems.

UEBA Challenges

Anomaly detection is the identification of unusual or rare data instances (observations, items, events, samples, cases, records, etc.) when compared to the majority of the data. It is common practice to use the term “normal” for the majority of the data and “anomalous” or “outliers” for data outside the normal region. Anomaly and outlier detection is a well-established field, with a plethora of approaches, systems, and techniques published each year in a wide variety of domains. I will cover just a few that pertain strongly to security and, more specifically, to UEBA. There are many challenges to using anomaly detection for UEBA:

1. We want interesting anomalies, but it is an analyst or end user who ultimately determines “interestingness”, and that judgment can vary widely across individuals and groups.

2. Constructing normal boundaries is hard: defining a normal region, often in a geometrically complex space, is extremely challenging. For example, many approaches require thresholds or hyper-parameters that radically change what is considered an outlier (see the sketch after this list).

3. Real data are noisy. What am I measuring? Are my anomalies noise in the measurement apparatus, or the result of low event frequencies? Often suspicious and malicious behavior is infrequent, but so too are many non-malicious behaviors of real-world systems.

4. The distributions of both anomalies and normal behavior are often dynamic.

5. What works in one domain often fails to transfer. Cross-domain anomaly detection is challenging (e.g., a rapid decrease may not be an anomaly for a stock price, but could be rare and deadly for blood pressure).
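
To make the threshold problem in challenge 2 concrete, here is a minimal sketch on synthetic data, using an arbitrary z-score cutoff as a stand-in for a detector's hyper-parameter. Nothing in the data dictates which cutoff is "right"; the anomaly count is almost entirely a product of the threshold choice.

```python
# Minimal sketch of challenge 2: the anomaly "boundary" is just a threshold choice.
# Synthetic data only; real UEBA features are higher-dimensional and messier.
import numpy as np

rng = np.random.default_rng(0)
scores = rng.normal(loc=0.0, scale=1.0, size=10_000)  # stand-in for per-event scores

for cutoff in (2.0, 2.5, 3.0, 3.5):
    n_outliers = int(np.sum(np.abs(scores) > cutoff))
    print(f"|z| > {cutoff}: {n_outliers} 'anomalies'")
# Each cutoff yields a different anomaly population from the exact same data;
# nothing about the data tells us which cutoff matches analyst interest.
```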

As a critical distinction, some UEBA approaches are actually trying to perform novelty detection. Novelty detection is similar to anomaly detection but is instead focused on finding motifs and patterns previously unobserved in a dataset. This research is especially useful for detecting novel data or signals that haven’t yet been seen by a trained, supervised machine learning system. Markou and Singh have great coverage of statistically sound approaches for this [Markou & Singh, 2017].

Subjective User Interest Driving Anomaly Value

UEBA problems often start with someone saying, “I want to find some interesting and uncommon behaviors for a user.” This statement typically flows into a host of semi-supervised and unsupervised methods for anomaly detection. Detecting fraud, cyber-intrusions, and fake reviews, for example, all hinge on something being interesting to an analyst, but precisely what that is varies case by case. The type of anomaly itself can also vary from problem to problem. Chandola et al. taxonomize anomalies into three general types [Chandola et al., 2009]:

Point Anomalies — when an individual data instance is viewed as anomalous with respect to the rest of the data. A significant amount of prior research has focused on methods for detecting these.

Contextual Anomalies — when a data instance is anomalous in a very specific context, but not otherwise. Sometimes called conditional anomalies, these require a neighborhood of other data instances against which the behavior of the instance in question may be anomalous. This can be spatially co-located points, historical points in a time series, or prior behaviors of a user (see the sketch after these definitions).

Collective Anomalies — when multiple data instances are considered anomalous when compared with the whole of the data. While research has focused on many areas (e.g., time-varying, spatial, graph), sequences are often of the greatest importance to UEBA. Often many actions are needed to constitute an attack.
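
As a hedged illustration of a contextual anomaly, the sketch below scores a login count against a per-user, per-hour baseline. The users, column names, and numbers are all hypothetical; the point is only that the same count can be ordinary in one context and striking in another.

```python
# A sketch of a contextual anomaly: an event count that is unremarkable
# globally but unusual for this user at this hour. Columns are hypothetical.
import pandas as pd

# Historical, presumed-normal activity per user and hour of day.
history = pd.DataFrame({
    "user":   ["alice"] * 6 + ["bob"] * 6,
    "hour":   [9, 9, 9, 3, 3, 3] * 2,
    "logins": [20, 22, 19, 0, 1, 0,    # alice: busy mornings, quiet nights
               21, 20, 23, 0, 0, 1],   # bob: same pattern
})
baseline = history.groupby(["user", "hour"])["logins"].agg(["mean", "std"])

def context_score(user, hour, logins):
    """Deviation of a new event from that user's baseline at that hour."""
    mu, sigma = baseline.loc[(user, hour)]
    return abs(logins - mu) / (sigma + 1e-6)

# 14 logins is not extreme as a point anomaly, but against bob's own
# 3 AM history it stands out far more than it would at 9 AM.
print(context_score("bob", 3, 14))   # very large deviation
print(context_score("bob", 9, 14))   # much smaller deviation
```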

Unfortunately, most black-box anomaly detectors do a terrible job of aligning the anomaly function (the means of selecting a normal region) with the subjective user interest function. The language of your anomaly detector is set by the input feature space; if you don’t have meaningful inputs, the interpretation and “explainability” of any detected anomalies loses traction. For example, consider using a one-class SVM (a fairly popular out-of-the-box method) to find behavioral anomalies in user event data. The SVM ranks your event data instances by how geometrically unusual they are, so a detected anomaly may not be interesting with respect to any malicious behavior at all (it could be a benign low-frequency action or noisy data collection). Capturing the event input data at the right granularity is critical for finding suspicious behaviors, many of which only emerge as an aggregate of very low-level events. A minimal sketch of this follows.
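
Here is that one-class SVM example sketched with scikit-learn. The per-user event-count features are purely illustrative stand-ins, not a recommended feature set, and the "odd" user is synthetic.

```python
# A minimal sketch of the one-class SVM example above, using scikit-learn.
# Features (per-user counts of a few event types) are purely illustrative.
import numpy as np
from sklearn.preprocessing import StandardScaler
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(7)
# Rows = users, columns = hypothetical event counts (logins, file reads, USB mounts).
normal_users = rng.poisson(lam=[20, 150, 1], size=(500, 3))
odd_user = np.array([[22, 140, 9]])   # rare, but possibly benign, USB activity

X = StandardScaler().fit_transform(np.vstack([normal_users, odd_user]))

ocsvm = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(X)
labels = ocsvm.predict(X)             # +1 = "normal", -1 = "anomalous" (labeling)
scores = ocsvm.decision_function(X)   # lower = more anomalous (ranking)

# The SVM happily flags the geometric outlier, but nothing here tells us
# whether the unusual USB count is malicious exfiltration or a one-off backup.
print(labels[-1], scores[-1])
```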

An additional critical consideration is what is being output to the UEBA system or analyst. Is the goal to label each data instance (anomalous vs. normal) or to rank them (score how anomalous each instance is)? If you don’t have labeled data for training, producing meaningful labels will be extremely challenging.

Every system requires validation; in the case of supervised learning, a large corpus of work exists on measuring model performance. However, unsupervised anomaly detection systems are notoriously hard to evaluate. Luckily, in the UEBA domain we often have domain experts and analysts who can provide feedback on the efficacy of an approach. There is a growing body of research on human-in-the-loop systems that directly use this feedback to improve actively running models.

Defining Anomalous Regions

Depending on whether the problem is supervised, semi-supervised, or unsupervised a practitioner may use: classification, statistical, clustering, or information theoretic approaches to construct the normal region. No single approach has yet proven unambiguously to be better than all others for a single domain. More details on what works well per domain can be read in [Ahmed et al., 2015]. It is the data themselves which dictate what approaches can and cannot be used.

Many of our datasets rely heavily on large quantities of high-dimensional, heterogeneous data. Defining a normal region for these models is often left entirely up to the practitioner. By dialing the various parameters that define the normal region, a practitioner can create an arbitrary number of anomalies, and even change the per-anomaly scores, as they adjust system sensitivity. The result is systems that, by design, tend to over-alert. For UEBA and SIEM systems, this often means the analyst is inundated with an endless torrent of anomalies from a model that never adapts to de-rank the ones they actively ignore.
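
As a rough illustration, using scikit-learn's IsolationForest and its contamination parameter as the sensitivity dial (not any particular UEBA product), the alert volume below is set by the practitioner's knob rather than by anything in the data.

```python
# The "contamination" knob roughly sets how many alerts the analyst receives,
# largely independent of what the data actually contain.
import numpy as np
from sklearn.ensemble import IsolationForest

rng = np.random.default_rng(1)
X = rng.normal(size=(5_000, 10))   # stand-in for user-behavior features

for contamination in (0.001, 0.01, 0.05):
    model = IsolationForest(contamination=contamination, random_state=0)
    labels = model.fit_predict(X)             # -1 = flagged as anomalous
    print(f"contamination={contamination}: {int((labels == -1).sum())} alerts")
# Same data, three very different alert volumes: the practitioner, not the
# data, decides how many "anomalies" exist.
```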

Distributions Are Dynamic & Real Data Are Noisy

The security domain has highly dynamic data, even without the inclusion of active adversaries. This adds an additional challenge to performing anomaly detection in many UEBA technologies. With changes in the normal region come changes to the definition of anomalies for that dataset. If the normal region is highly dynamic, the variance in detected anomalies increases steadily. Some work (like [Pavlov & Pennock, 2003]) addresses this problem, but this is still an ongoing research challenge.
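
The toy sketch below, on synthetic one-dimensional data, shows why a static normal region ages badly: a fixed baseline keeps alerting after a benign drift, while a sliding-window baseline (one simple, assumed remedy, not a recommendation) recovers once the window catches up.

```python
# A small sketch of concept drift: a fixed baseline learned on "January" data
# alerts forever after a benign shift, while a sliding-window baseline adapts.
import numpy as np

rng = np.random.default_rng(3)
jan = rng.normal(10, 1, 1_000)            # old behavior
mar = rng.normal(14, 1, 1_000)            # new (benign) behavior after a drift
stream = np.concatenate([jan, mar])

fixed_mu, fixed_sd = jan.mean(), jan.std()
window = 200
alerts_fixed = alerts_window = 0

for t in range(window, len(stream)):
    x = stream[t]
    recent = stream[t - window:t]
    if abs(x - fixed_mu) / fixed_sd > 3:
        alerts_fixed += 1                 # fires on nearly every post-drift point
    if abs(x - recent.mean()) / (recent.std() + 1e-9) > 3:
        alerts_window += 1                # settles down once the window adapts

print(alerts_fixed, alerts_window)
```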

Many UEBA techniques utilize a graph or network to represent users, entities and their interactions. These too are often dynamic! These dynamic networks are of particular interest to modeling many UEBA tasks as they contain rich multi-user-entity relationships. A survey covering many of the best approaches can be found in [Ranshous et al., 2015].

A key approach is to use analyst expertise to continually refine the normal region (or ranking model) directly from analyst feedback. Incorporating that feedback in a statistically sound way tends to produce models that detect what is interesting, not just what is different. There are several successful active-learning anomaly detection systems for security applications, such as [Beaugnon et al., 2018] and [Almgren & Jonsson, 2004]. Further development in active learning appears to be one of our best chances to conquer dynamic and noisy datasets.
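
The sketch below shows the general human-in-the-loop pattern (a generic illustration, not the specific systems cited): an unsupervised detector proposes candidates, an analyst labels a handful, and a lightweight supervised re-ranker pushes future alerts toward what the analyst found interesting. The feedback rule here is invented purely for illustration.

```python
# A rough sketch of a human-in-the-loop re-ranking loop.
import numpy as np
from sklearn.ensemble import IsolationForest
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(5)
X = rng.normal(size=(2_000, 8))                         # synthetic behavior features

iso = IsolationForest(random_state=0).fit(X)
candidate_idx = np.argsort(iso.score_samples(X))[:50]   # 50 most anomalous instances

# Pretend the analyst only finds candidates with a large first feature interesting
# (say, bytes uploaded); this stands in for real analyst feedback.
analyst_labels = (X[candidate_idx, 0] > 1.0).astype(int)

# Train a re-ranker on the feedback and apply it to the next batch of candidates.
reranker = LogisticRegression(max_iter=1000).fit(X[candidate_idx], analyst_labels)
new_batch = rng.normal(size=(500, 8))
interest = reranker.predict_proba(new_batch)[:, 1]      # estimated "interestingness"
top5 = np.argsort(-interest)[:5]                        # queue these for review first
print(top5, interest[top5])
```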

Even the best data collection systems still induce noise. Unfortunately, noise often produces anomalies that do not fit the analyst’s subjective interest function (e.g., a system hang caused unusual process behavior, but is otherwise uninteresting). Although they have fallen out of favor with researchers, noise removal [Teng et al., 1990] and noise accommodation [Rousseeuw & Hubert, 2011] are both approaches used to lessen noise in a dataset before analysis is performed. If the noise itself is similar to the anomalies, removing it often worsens the detector’s performance.
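
A hedged sketch of simple noise removal: a rolling median filter suppresses one-sample collection glitches, but, exactly as the caveat above warns, it will also erase a genuine single-event anomaly that happens to look like noise.

```python
# Rolling-median noise removal and its side effect on real single-event anomalies.
import numpy as np

def rolling_median(x, k=5):
    """Median filter with an odd window k; edges are left unsmoothed."""
    y = x.copy()
    half = k // 2
    for i in range(half, len(x) - half):
        y[i] = np.median(x[i - half:i + half + 1])
    return y

rng = np.random.default_rng(11)
signal = rng.normal(0, 1, 300)
signal[100] = 12.0        # a one-sample collection glitch
signal[200] = 12.0        # a one-sample, genuinely malicious spike

smoothed = rolling_median(signal)
print(signal[100], smoothed[100])   # the glitch is gone...
print(signal[200], smoothed[200])   # ...but so is the real anomaly
```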

Anomaly Detection System Transfers Are Hard

Lastly, UEBA research tends not to be concentrated in dense academic venues (unlike machine learning or statistics). This makes sharing and disseminating research findings much more challenging. Because UEBA solutions do a wide variety of different things, lessons learned from one system transfer poorly to others. In addition, anomaly detection approaches are extremely domain-specific, as noted in [Chandola et al., 2009] and [Ahmed et al., 2015]. What is considered an anomaly in a fraud detection system may not be of particular interest in an intrusion detection system. These facts make the growth of UEBA as a research area somewhat gangly, segmented, and awkward.

In short, when considering an anomaly detection system for a UEBA solution, think about:

1) What drives the detection of these anomalies? Is it just uncommon behavior? Risky behavior? What makes an example interesting? If you can’t describe these, anomaly detection is likely to underperform in your solution. The input data should be able to help explain what is strange or anomalous about a flagged instance.

2) Are your anomalies really point anomalies or is context important? Is a single action enough to say it was anomalous or do you need a collective anomaly (multiple actions)?

3) How quickly does your system or population change?

4) Can you directly involve the analyst in improving the system, enabling active learning (human-in-the-loop anomaly detection)?

Acknowledgments: Nociconist, Azmi Dwipranata, and Baboon Designs for diagram components.

Robert Pienta
AI/ML at Symantec

I'm currently a researcher at Symantec's Center for Advanced Machine Learning (CAML). My work fuses machine learning and visual analytics.