Security through Data Fusion: Situation Assessment

Now that we’ve covered the more familiar parts of Low-level Data Fusion, I’m sure you’re primarily interested in how to proceed to the higher-level objects (Relations, Impacts and Responses). Let’s talk about the next step in the climb up this ladder, which is Situation Assessment.

“Situation Awareness”, a state of mind, is having a mental model of where you are and what’s around you, so that you can make some predictions about what could happen next (ultimately to drive your actions). Situation Assessment is the process of creating Situation Awareness, and since we’re focused here on building systems to drive the data fusion process, we need to think about how to automate as much of Situation Assessment as we can.

While in a way everything that we’ve done up to this point is part of situation assessment, in the terminology of the framework we’ll call anything that reveals the Relations between entities “Situation Assessment”.


  • Data → Relations: Low-level Situation Assessment
  • Features → Relations: Feature-based Situation Assessment
  • Entities → Relations: Situation Assessment

For now, we’ll focus on the last one.

The key here is in using all the information that we’ve gathered in our entity database, combined with more dynamic data sources, to understand how entities are interacting with each other.

On the walk from Data → Features → Entities so far, while building our entity database, we focused on relatively static features: ones that change over hours or days, or never change at all. Entity relationships, however, are ever-shifting, so now we’ll need our most dynamic data sources: log events, network data, file events, even system calls.

We’ll need to make that same walk up the ladder Data → Features → Entities by carrying the extracted features along with the data, and linking them to entities in our database. We’re particularly interested in cases where entities interact. As an example, here’s a log message* where the features in a piece of data are identifiers for multiple entities:

  • Aug 3 09:04:59 ns1 sendmail[21396]: i73D4pE21396: ruleset=check_rcpt, arg1=<>, relay=[], reject=550 5.7.1 <>… Relaying denied. IP name lookup failed []

So we have an IP address (which we may be able to link to a host entity) attempting to send email to a userid (which we may be able to link to a user entity). In this case, the sending failed, which is one of the features that we should extract from the message.
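To make the extraction step concrete, here’s a minimal sketch in Python. The sample line below is hypothetical: I’ve filled in a relay address and a recipient with documentation values (192.0.2.10 and example.com) purely for illustration, and the names `MailRejectFeatures` and `extract_features` are my own, not part of any library. Real sendmail output varies, so treat the regular expressions as assumptions about this one message shape.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Hypothetical sample: the address and recipient are made-up documentation
# values, not data from a real log.
SAMPLE = (
    "Aug 3 09:04:59 ns1 sendmail[21396]: i73D4pE21396: "
    "ruleset=check_rcpt, arg1=<alice@example.com>, relay=[192.0.2.10], "
    "reject=550 5.7.1 <alice@example.com>... Relaying denied. "
    "IP name lookup failed [192.0.2.10]"
)

# Syslog-style header: timestamp, reporting host, process[pid], then the body.
HEADER = re.compile(
    r"^(?P<timestamp>\w{3}\s+\d+\s+[\d:]{8})\s+(?P<reporter>\S+)\s+"
    r"(?P<process>\w+)\[(?P<pid>\d+)\]:\s+(?P<body>.*)$"
)


@dataclass
class MailRejectFeatures:
    timestamp: str
    reporter: str              # identifier for the host that wrote the log
    relay_ip: Optional[str]    # identifier we may link to a host entity
    recipient: Optional[str]   # identifier we may link to a user entity
    outcome: str               # "rejected" or "unknown"


def extract_features(line: str) -> Optional[MailRejectFeatures]:
    """Pull entity identifiers and the outcome out of one log line."""
    header = HEADER.match(line)
    if header is None:
        return None
    body = header.group("body")
    relay = re.search(r"relay=\[([^\]]*)\]", body)
    rcpt = re.search(r"arg1=<([^>]*)>", body)
    rejected = re.search(r"reject=(\d{3})", body)
    return MailRejectFeatures(
        timestamp=header.group("timestamp"),
        reporter=header.group("reporter"),
        # An empty value means there is nothing to link, so store None.
        relay_ip=(relay.group(1) or None) if relay else None,
        recipient=(rcpt.group(1) or None) if rcpt else None,
        outcome="rejected" if rejected else "unknown",
    )
```

Calling `extract_features(SAMPLE)` gives us two identifiers plus the outcome in one record, which is exactly the shape we need before looking up entities in the database. When a field is empty in the log, the corresponding attribute is `None`, and that side of the relation simply can’t be linked.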

Since we’re interested in multiple entities, we will often be mixing entity types: in the enterprise, that means primarily relations between hosts and users. For example:

  • A user logs into a host
  • A user escalates to another privilege level on a host
  • A user logs into a single sign-on (SSO) authentication system to generate an authorization token
  • A client host connects to a service on a server host
  • A client host requests a resource on a server host in the name of a user, providing the user’s authorization token
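One way to hold relations like these is as typed, timestamped edges between entities. The sketch below is only an illustration under my own assumptions: the class names (`Entity`, `Relation`, `RelationStore`), the verb strings and the host and user names are all hypothetical, and a real system would likely sit on a graph or time-series store rather than an in-memory dictionary.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class Entity:
    kind: str  # e.g. "user" or "host"
    name: str


@dataclass(frozen=True)
class Relation:
    subject: Entity
    verb: str                  # e.g. "logs_into", "connects_to"
    target: Entity
    timestamp: float           # seconds since some epoch
    on_behalf_of: Optional[Entity] = None  # the user a host acts for, if any


class RelationStore:
    """Index relations by every entity that takes part in them."""

    def __init__(self) -> None:
        self._by_entity = defaultdict(list)

    def add(self, relation: Relation) -> None:
        participants = {relation.subject, relation.target, relation.on_behalf_of}
        participants.discard(None)
        for entity in participants:
            self._by_entity[entity].append(relation)

    def involving(self, entity: Entity, since: float = 0.0) -> List[Relation]:
        return [r for r in self._by_entity.get(entity, []) if r.timestamp >= since]


# Hypothetical usage: a login followed by a request made in the user's name.
alice = Entity("user", "alice")
ws01 = Entity("host", "ws01")
db01 = Entity("host", "db01")

store = RelationStore()
store.add(Relation(alice, "logs_into", ws01, timestamp=100.0))
store.add(Relation(ws01, "requests_resource", db01, timestamp=160.0, on_behalf_of=alice))
```

The `on_behalf_of` field is there because the last bullet involves three entities, not two: asking the store for everything involving `alice` returns both the login and the request her workstation made in her name.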

Now, so far we’ve focused almost exclusively on our own entities, but of course no network is an island (even airgapped networks are porous because of the “sneakernet”**). So, we need to take into account the entities in the outside world. This doesn’t mean that we need to start tracking every host, but at some granularity we should start to understand the outside world.

We need to think clearly about what things are considered Entities. In the enterprise, we’ve been talking about hosts and users. In the outside world, while hosts and users also make sense, we probably can’t track those reliably. However, a network could be tracked as an Entity, including building reputation models based on ownership and history, especially past malicious activity.
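As a rough sketch of what tracking a network as an Entity could look like, the Python below maps an address to the most specific known network and keeps a very simple history-based reputation. Everything here is an assumption for illustration: the `NetworkEntity` class, the owner names, the documentation address ranges, and especially the scoring formula (a smoothed fraction of benign events), which stands in for whatever reputation model you would actually build.

```python
import ipaddress
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class NetworkEntity:
    cidr: ipaddress.IPv4Network
    owner: str
    malicious_events: int = 0
    total_events: int = 0

    def record(self, malicious: bool) -> None:
        """Add one observed event to this network's history."""
        self.total_events += 1
        if malicious:
            self.malicious_events += 1

    def reputation(self) -> float:
        """Smoothed share of benign events: 0.5 with no history, nearer 0 is worse."""
        benign = self.total_events - self.malicious_events
        return (benign + 1) / (self.total_events + 2)


def find_network(networks: List[NetworkEntity], ip: str) -> Optional[NetworkEntity]:
    """Return the most specific known network containing ip, if any."""
    addr = ipaddress.ip_address(ip)
    matches = [n for n in networks if addr in n.cidr]
    return max(matches, key=lambda n: n.cidr.prefixlen, default=None)


# Hypothetical networks, using address ranges reserved for documentation.
known = [
    NetworkEntity(ipaddress.ip_network("198.51.100.0/24"), owner="ExampleNet"),
    NetworkEntity(ipaddress.ip_network("198.51.100.128/25"), owner="ExampleSub"),
]
```

With this in place, an outside address never has to become a host entity of its own: `find_network(known, "198.51.100.200")` resolves it to the `ExampleSub` network, and each event we see from it updates that network’s history.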

A hacking group, anywhere on the spectrum from beneficial to malicious, can also be an entity. It can be useful to track such groups by the Tactics, Techniques and Procedures (TTPs) they use, which is what Threat teams do (and we’ll come back to that later).

Is a file a useful Entity? A file is a chunk of semantically meaningful data (in other words a Feature), which may also be executable, but files are also chunks of data that get moved around the Internet. They represent resources that live on hosts, and users access them through hosts. There’s already a lot of effort spent on tracking files, relating them to each other (which ones share pieces of code), relating them to threat groups, and so on. Tracking files has proven useful in practice. So yes, I think it’s worthwhile to think of a file as an entity in this framework.
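If we do treat files as entities, they need an identity that survives being copied and renamed. A common choice is a content hash, and the sketch below assumes SHA-256 for that purpose. The `FileEntityIndex` class and the host names and paths are hypothetical, and this only covers exact copies; finding files that share pieces of code needs similarity techniques that are out of scope here.

```python
import hashlib
from collections import defaultdict
from typing import Set


class FileEntityIndex:
    """Track where each distinct file (by content hash) has been seen."""

    def __init__(self) -> None:
        self._sightings = defaultdict(set)

    @staticmethod
    def identify(content: bytes) -> str:
        # Identity comes from the content, not from the name or location.
        return hashlib.sha256(content).hexdigest()

    def record_sighting(self, content: bytes, host: str, path: str) -> str:
        file_id = self.identify(content)
        self._sightings[file_id].add((host, path))
        return file_id

    def hosts_with(self, file_id: str) -> Set[str]:
        return {host for host, _ in self._sightings.get(file_id, set())}


# Hypothetical usage: the same bytes under two names on two hosts.
index = FileEntityIndex()
first = index.record_sighting(b"payload", "ws01", "/tmp/a.bin")
second = index.record_sighting(b"payload", "ws02", "/home/bob/b.bin")
```

Both sightings resolve to the same file entity, so `index.hosts_with(first)` tells us every host where that file has turned up, which is the file-to-host relation we’re after.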

Moving from Relations to Impacts will require using our understanding of the entities in our environment to color the relations between entities based on entity state and history. More on that next time.

(*) Adapted from

(**) In case you haven’t heard the expression before, the “sneakernet” refers to people (wearing sneakers) moving data around physically, like plugging in a USB key they found in the parking lot, or plugging in their malware-infected phone to charge it.