Security through Data Fusion: Threat-Driven Detection

There’s no one right way to develop and enhance your detection & response activities — the best approach for you is probably to start with your strengths. If your organization has a well-developed IR process, and has done some thinking about the threats that you face (due to mission, data held, business vertical, profile, or other factors), then you should start with threat and let that drive your detection portfolio.

Within the Security Through Data Fusion framework, that means exploring how “Reactions” (incident response and the arms race with attackers) feed back into Data, Features, and so on. This approach can be particularly valuable when you’re heavily resource-constrained, because it starts with a disciplined prioritization of the threats (intruder activity) and hazards (accidental security lapses) you face (much more on those in a future article).

Based on your understanding of threat:

  • What actions would these actors be most likely to take at each stage of the kill chain? What is their objective (typically some data of value)? Frameworks like MITRE’s ATT&CK (particularly the ATT&CK Navigator) can be useful for mapping a known threat actor to a set of typical actions. While attackers certainly can and do alter their actions, they have considerable investments in the tooling they already have, so they do (and sometimes must) repeat themselves.
  • Is there a natural choke point where a small number of controls or detection capabilities would greatly mitigate the risk?
  • If there’s no natural choke point, then start at the objective Entity (e.g. the critical data store) and build robust detection from there to a practical number of steps away from the objective (traversing backwards up the attack graph).
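
As a rough illustration of traversing backwards from the objective, here is a minimal Python sketch. The attack graph, the entity names, and the entities_to_monitor helper are all hypothetical; a real graph would come from your own architecture and control mapping. It walks the edges in reverse with a breadth-first search and stops at a chosen number of steps.

```python
from collections import deque


def entities_to_monitor(edges, objective, max_steps):
    """Return {entity: steps_from_objective} for every entity that can reach
    the objective within max_steps hops, walking the edges backwards."""
    # Reverse adjacency map: target -> set of sources that lead to it.
    predecessors = {}
    for source, target in edges:
        predecessors.setdefault(target, set()).add(source)

    distance = {objective: 0}
    queue = deque([objective])
    while queue:
        node = queue.popleft()
        if distance[node] == max_steps:
            continue  # do not expand past the practical limit
        for prev in predecessors.get(node, ()):
            if prev not in distance:
                distance[prev] = distance[node] + 1
                queue.append(prev)
    return distance


# Hypothetical attack graph: (a, b) means "access to a can lead to b".
ATTACK_EDGES = [
    ("internet", "web_app"),
    ("web_app", "customer_db"),
    ("internet", "phishing_email"),
    ("phishing_email", "employee_laptop"),
    ("employee_laptop", "admin_account"),
    ("admin_account", "customer_db"),
]

coverage = entities_to_monitor(ATTACK_EDGES, "customer_db", max_steps=2)
for entity, steps in sorted(coverage.items(), key=lambda item: item[1]):
    print(steps, entity)
```

Entities closest to the objective come out first, which gives one possible order in which to build detection when resources are limited.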

The key to answering these questions is to map out your architecture and your (more abstract but more useful) control structures (more on that when I write about Threat/Hazard Analysis).

Here are some specific things to consider:

  • Reactions → Impacts: Adaptive Detection

The attacker’s Tactics, Techniques and Procedures are the real security events that you’re trying to detect. Traditionally, this has meant using signatures to short-circuit from Data or Features → Impacts, but that approach (as we’ve discussed) means maintaining a large set of signatures, managing false positives, and so on. Signatures may still be useful, and are perhaps best outsourced to a vendor, provided you keep the ability to add local signatures, disable noisy built-in ones, and feed in local context (e.g. O/S per host).

In a blended approach, we should also look for opportunities to find malicious activity based on anomalies associated with the entities of interest. Understanding threat will help you decide what your critical data stores are. Focus on those first, asking yourselves questions like: if an attacker dumped this database, would we detect it (and if not, how can we detect it)?

From there, how would an attacker gain access to the database — through the front door (by exploiting the web application that it feeds) or the back (by compromising an employee and using their access as an administrator or developer)?

What about access from the side, by gaining direct access to the database through a misconfiguration or an unexpected path? Be thorough: since you’re starting at the objective, the avenues of approach should be limited. Set up automated validation of the security controls already in place.

Based on your (limited, objective-focused) attack graph, what entities would you need to monitor? How would you distinguish an ordinary anomaly (an employee accessing random documents when trying to learn a new subject) from a malicious anomaly (a compromised employee searching for credentials or administration documentation)?

  • Reactions → Relations: Threat-based Situation Assessment

A relation is encapsulated in one or more subject → action → object interactions between entities. Based on the above considerations, what relations would be considered dangerous, vs. merely anomalous?

One approach is to consider the typical “contact surface”, i.e. which subjects typically act on which objects (e.g. which users access which documents). By developing similarity metrics for these contact surfaces we can surface anomalies. Recommender models can be useful for this: if your team members didn’t “like” (access) this document, then your accessing it earns a low recommendation (a high anomaly score).

This would need to be enhanced with information about how critical the data in the object is to the prioritized threats, either because it’s a large data store with known contents, or because we’ve applied some process for tagging smaller, more dynamic data (e.g. NLP tagging across a document store, with particular tags being identified as critical).
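
To make the contact-surface idea concrete, here is a minimal Python sketch. It is only one possible reading of the recommender analogy: Jaccard similarity stands in for whatever similarity metric you choose, and the access log, the criticality weights, and the function names are hypothetical.

```python
def jaccard(a, b):
    """Similarity of two contact surfaces (sets of accessed objects)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def anomaly_score(user, document, access_log):
    """Score in [0, 1]. A high score means that users whose contact surface
    resembles this user's have not accessed the document."""
    mine = access_log.get(user, set())
    total = 0.0
    supporting = 0.0
    for other, docs in access_log.items():
        if other == user:
            continue
        weight = jaccard(mine, docs)
        total += weight
        if document in docs:
            supporting += weight
    if total == 0.0:
        return 1.0  # no similar users at all: treat as maximally unusual
    return 1.0 - supporting / total


def prioritized_score(user, document, access_log, criticality, default=0.1):
    """Scale the anomaly by how critical the object is (assumed 0..1 weights)."""
    return anomaly_score(user, document, access_log) * criticality.get(document, default)


# Hypothetical access history: user -> documents accessed so far.
ACCESS_LOG = {
    "alice": {"design_doc", "sprint_plan", "api_spec"},
    "bob": {"design_doc", "sprint_plan", "runbook"},
    "carol": {"design_doc", "api_spec", "runbook"},
    "dave": {"payroll", "admin_credentials"},
}
# Hypothetical criticality tags derived from the prioritized threats.
CRITICALITY = {"admin_credentials": 1.0, "payroll": 0.8}

print(anomaly_score("alice", "runbook", ACCESS_LOG))
print(prioritized_score("alice", "admin_credentials", ACCESS_LOG, CRITICALITY))
```

In this toy data, alice opening the runbook that her similar colleagues already use scores low, while alice opening admin_credentials scores high on both anomaly and criticality.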

  • Reactions → Entities: Threat-based Entity Modeling

If we know what the important subject and object types are for our prioritized threats, then we need to be able to model their static or slowly-changing characteristics (e.g. for a user the start of employment, team membership, access group memberships) and their more dynamic behavior (e.g. documents or data stores typically accessed, applications typically used).

What factors would we need to know to be able to notice changes, and to provide enough context to evaluate the security impact?
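
One way to picture such an entity model is the Python sketch below. The UserEntity class and its field names are hypothetical, chosen to mirror the examples above. It keeps the slowly-changing attributes apart from the learned behavioral baseline, and reports deviations without silently absorbing them into the baseline.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class UserEntity:
    # Static or slowly-changing characteristics.
    user_id: str
    employment_start: date
    team: str
    access_groups: frozenset
    # Dynamic behavior, learned from observed activity.
    usual_data_stores: set = field(default_factory=set)
    usual_applications: set = field(default_factory=set)

    def learn(self, data_store, application):
        """Fold one reviewed, benign observation into the baseline."""
        self.usual_data_stores.add(data_store)
        self.usual_applications.add(application)

    def deviations(self, data_store, application):
        """Describe how one observed action departs from the baseline."""
        changes = []
        if data_store not in self.usual_data_stores:
            changes.append(f"new data store: {data_store}")
        if application not in self.usual_applications:
            changes.append(f"new application: {application}")
        return changes


user = UserEntity(
    user_id="alice",
    employment_start=date(2020, 3, 1),
    team="payments",
    access_groups=frozenset({"developers"}),
)
user.learn("wiki", "browser")
print(user.deviations("customer_db", "psql"))
```

Keeping learn separate from deviations is a deliberate choice in this sketch: an attacker's first unusual access should not automatically become part of what is considered normal.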

  • Reactions → Features: Threat-based Feature Extraction

What features will enable you to model the critical entities you need to monitor? Purely volumetric measures are typically more useful if they have some security relevance already (e.g. number of login failures).

Some types of alert data could be better considered as features for a model (e.g. a user attempted to access a document that they did not have permissions for) because they are only security-relevant when part of a broader pattern of unusual activity.
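
As a small illustration of treating alert-like events as features rather than as alerts, the Python sketch below counts them per user. The event schema (a dict with "user" and "type" keys) and the event type names are assumptions made for the example.

```python
from collections import Counter, defaultdict

# Hypothetical event types that carry some security relevance on their own.
FEATURE_EVENT_TYPES = ("login_failure", "access_denied")


def extract_features(events):
    """Return {user: {event_type: count}} for the security-relevant types.
    Users with no such events are omitted."""
    counts = defaultdict(Counter)
    for event in events:
        if event["type"] in FEATURE_EVENT_TYPES:
            counts[event["user"]][event["type"]] += 1
    # Emit a fixed set of keys per user so rows can feed a model directly.
    return {
        user: {name: per_user[name] for name in FEATURE_EVENT_TYPES}
        for user, per_user in counts.items()
    }


EVENTS = [
    {"user": "alice", "type": "login_failure"},
    {"user": "alice", "type": "login_failure"},
    {"user": "alice", "type": "access_denied"},
    {"user": "bob", "type": "file_read"},
    {"user": "bob", "type": "access_denied"},
]

features = extract_features(EVENTS)
print(features)
```

A single access_denied for bob is not worth an alert by itself, but as a feature it can contribute to a broader pattern of unusual activity for that entity.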

  • Reactions → Data: Threat-based Collection

All the above considerations determine what data types we need to collect. More directly, though, threats can strongly suggest what data we need for visibility.

For example, if memory-resident malware is a significant risk, then we’ll need to be able to either collect memory images or search memory for these artifacts.

If ransomware is a high risk, then (in addition to an organized program of backups with restore testing) we’ll need information about (and preferably controls on) what binaries execute on endpoints.

These data types in turn can feed into the full pipeline of modeling entities and identifying security-relevant behavior.
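
The mapping from prioritized threats to required data can be kept as a simple table and checked against what is actually collected. The Python sketch below assumes hypothetical threat and data-source names based on the two examples above; it is not a complete catalogue.

```python
# Hypothetical mapping from threat scenario to the data needed for visibility.
REQUIRED_DATA = {
    # Either collected memory images or the ability to search live memory.
    "memory_resident_malware": {"memory_visibility"},
    # Information about which binaries execute on endpoints.
    "ransomware": {"process_execution_logs"},
}


def collection_gaps(prioritized_threats, collected):
    """Return {threat: [missing data types]} for threats lacking visibility."""
    gaps = {}
    for threat in prioritized_threats:
        missing = REQUIRED_DATA.get(threat, set()) - collected
        if missing:
            gaps[threat] = sorted(missing)
    return gaps


collected_today = {"process_execution_logs", "netflow"}
gaps = collection_gaps(["ransomware", "memory_resident_malware"], collected_today)
print(gaps)
```

Passing the threats in priority order keeps the most important collection gaps at the top of the result.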

Threat-driven approaches rely on a prioritized set of threat scenarios. The next article in this series will examine in some detail how to develop threat scenarios in a way that can drive your controls and detection.
