The Feature Graph for Proactive Threat Detection

Weidong Yang
Kineviz
4 min read · Dec 8, 2023

In a previous article, I introduced a tiered architecture for detecting and combining specific patterns in big data streams.

The feature graph tier is a central element of that design within the graph environment. Let’s look at it in more detail and see how it works.

What is a feature graph?

Simply put, the feature graph consists of a set of predefined component graph patterns. A feature implies a connection: each one is an individual element typical of the patterns you’re interested in. In the same way that a smiley face can be built from simpler elements such as dots and segments of a circle, a complex threat pattern can be built from features.

We find that even complex threat patterns can be broken down into a manageable set of such features. Features typically fall into a couple of broad categories:

  • Features suggesting connections, such as shared IP, shared location, accounts created at the same time, accounts accessing the same resource or accessing the same channels.
  • Features suggesting anomalies. These include a variety of coordinated actions (such as flow of funds to a single destination from multiple sources), multiple failed attempts, or unnatural timing and trends.
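As a rough illustration of the two categories, here is a minimal Python sketch that derives one connection feature (accounts sharing an IP) and one anomaly feature (funds flowing to a single destination from multiple sources). The event tuples, function names, and thresholds are assumptions made for this example, not part of the architecture described here.

```python
from collections import defaultdict

# Hypothetical raw events; the tuple layouts are assumptions for illustration.
logins = [("alice", "10.0.0.1"), ("bob", "10.0.0.1"),
          ("carol", "10.0.0.2"), ("dave", "10.0.0.1")]
transfers = [("alice", "acct_x"), ("bob", "acct_x"),
             ("dave", "acct_x"), ("carol", "acct_y")]

def group_by_target(pairs, min_sources):
    """Map each target to its sorted sources, keeping targets with enough sources."""
    sources_by_target = defaultdict(set)
    for source, target in pairs:
        sources_by_target[target].add(source)
    return {target: sorted(sources)
            for target, sources in sources_by_target.items()
            if len(sources) >= min_sources}

def shared_ip_features(logins, min_accounts=2):
    """Connection feature: IPs used by at least min_accounts distinct accounts."""
    return group_by_target(logins, min_accounts)

def fan_in_features(transfers, min_sources=3):
    """Anomaly feature: destinations funded by at least min_sources distinct sources."""
    return group_by_target(transfers, min_sources)

shared_ips = shared_ip_features(logins)
fan_ins = fan_in_features(transfers)
print(shared_ips)  # {'10.0.0.1': ['alice', 'bob', 'dave']}
print(fan_ins)     # {'acct_x': ['alice', 'bob', 'dave']}
```

Both features reduce to the same grouping step, which is one reason a small set of features can cover many patterns.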

Once you capture features, they are saved in a graph database. The feature graph is organized according to a graph schema similar to the simplified one shown below, in which actor, event, resource, IP, and feature categories are connected through specified relationships.
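To make the idea of a schema concrete, the following sketch models the five node categories in plain Python and rejects edges that the schema does not allow. The relationship names (PERFORMED, FROM_IP, ACCESSED, INVOLVES) are hypothetical placeholders; the actual relationships are the ones defined in the schema diagram, which is not reproduced in this code.

```python
from dataclasses import dataclass, field

# Node categories named in the article.
NODE_CATEGORIES = {"Actor", "Event", "Resource", "IP", "Feature"}

# Hypothetical relationship types, assumed for illustration only.
ALLOWED_EDGES = {
    ("Actor", "PERFORMED", "Event"),
    ("Event", "FROM_IP", "IP"),
    ("Event", "ACCESSED", "Resource"),
    ("Feature", "INVOLVES", "Actor"),
    ("Feature", "INVOLVES", "IP"),
}

@dataclass
class FeatureGraph:
    nodes: dict = field(default_factory=dict)  # node id -> category
    edges: list = field(default_factory=list)  # (source id, relation, target id)

    def add_node(self, node_id, category):
        if category not in NODE_CATEGORIES:
            raise ValueError(f"unknown category: {category}")
        self.nodes[node_id] = category

    def add_edge(self, source, relation, target):
        # Both endpoints must already exist; a missing id raises KeyError.
        triple = (self.nodes[source], relation, self.nodes[target])
        if triple not in ALLOWED_EDGES:
            raise ValueError(f"edge not allowed by schema: {triple}")
        self.edges.append((source, relation, target))

graph = FeatureGraph()
graph.add_node("alice", "Actor")
graph.add_node("login_1", "Event")
graph.add_node("10.0.0.1", "IP")
graph.add_node("shared_ip_1", "Feature")
graph.add_edge("alice", "PERFORMED", "login_1")
graph.add_edge("login_1", "FROM_IP", "10.0.0.1")
graph.add_edge("shared_ip_1", "INVOLVES", "alice")
```

In a real deployment the graph database would enforce or at least document this schema; the in-memory class only shows the shape of the data.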

The feature graph is designed to be extensible. You can start with the building blocks of known threat patterns. Over time, you capture other interesting features, which later help identify more complex and emerging threats.

From features to threat patterns

We employ the feature graph to detect features as they arrive in a data stream. On its own, a single feature is not enough to raise an alert. But when component features connect in specific ways, the resulting graph pattern can clearly highlight an emerging threat before it has a chance to succeed.
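One simple way to picture features connecting is to ask which actors appear in every feature type that a pattern requires. The sketch below does exactly that with set intersection. The feature names, the `match_pattern` helper, and the idea of defining a pattern as a list of required feature types are assumptions for illustration; real patterns would be expressed as graph queries.

```python
from collections import defaultdict

# Hypothetical detected features: (feature type, actors involved).
detected = [
    ("shared_ip", {"u1", "u2", "u3"}),
    ("created_together", {"u1", "u2"}),
    ("failed_logins", {"u7"}),
    ("funds_fan_in", {"u1", "u2", "u3"}),
]

def match_pattern(detected, required_types, min_actors=2):
    """Return the sorted actors present in every required feature type.

    required_types must be non-empty. An empty list is returned when fewer
    than min_actors actors satisfy all required types.
    """
    actors_by_type = defaultdict(set)
    for feature_type, actors in detected:
        actors_by_type[feature_type] |= actors
    common = set.intersection(*(actors_by_type[t] for t in required_types))
    return sorted(common) if len(common) >= min_actors else []

# A single feature type is matched trivially; the combination is what narrows it down.
suspects = match_pattern(detected, ["shared_ip", "created_together", "funds_fan_in"])
print(suspects)  # ['u1', 'u2']
```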

Here’s an example where we look at patterns of usage over time. We are interested in people sharing IP addresses because the accounts might have been created by the same person (or by a group of people acting together). Each graph represents users sharing IPs on a specific date. On day one we see many users sharing IP addresses. That in itself may not be a problem. A few days later we look at the same group of people, and again many users are coming from the same IP. So now we see a pattern implying that there may be collaboration behind the scenes.

To explore that possibility, we bring the days together using the IP as a connection. Now we can see that a large group of people are actually connected with each other.
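The effect of bringing days together can be sketched as a connected-components search over a user-IP graph pooled across dates. The login data and the `connected_user_groups` helper below are hypothetical; a graph database would typically provide this traversal directly.

```python
from collections import defaultdict

# Hypothetical per-day observations of (user, ip).
daily_logins = {
    "day1": [("u1", "ip_a"), ("u2", "ip_a"), ("u3", "ip_b")],
    "day4": [("u2", "ip_c"), ("u4", "ip_c"), ("u3", "ip_b"), ("u5", "ip_d")],
}

def connected_user_groups(daily_logins):
    """Group users that are linked through shared IPs on any of the given days."""
    # Build a bipartite adjacency map between users and IPs, pooled across days.
    neighbors = defaultdict(set)
    for logins in daily_logins.values():
        for user, ip in logins:
            neighbors[("user", user)].add(("ip", ip))
            neighbors[("ip", ip)].add(("user", user))

    seen, groups = set(), []
    for start in sorted(neighbors):
        if start in seen:
            continue
        seen.add(start)
        stack, users = [start], set()
        while stack:  # depth-first walk of one connected component
            node = stack.pop()
            kind, name = node
            if kind == "user":
                users.add(name)
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    stack.append(nxt)
        groups.append(sorted(users))
    return sorted(groups)

groups = connected_user_groups(daily_logins)
print(groups)  # [['u1', 'u2', 'u4'], ['u3'], ['u5']]
```

On either day alone, u1 and u4 never share an IP. They only end up in the same group once both days are joined through u2.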

In this connected group there’s one user who has been flagged as a known bad actor. Previously we knew nothing about how the other users were connected over time, but now we can immediately use label propagation to flag all these users over multiple hops and add them to our watch list.
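A minimal way to picture this step is a breadth-first spread of the flag from the known bad actor, limited to a chosen number of hops. This is a simplified stand-in rather than a full label propagation algorithm as found in graph libraries, and the adjacency map, `propagate_flag` name, and `max_hops` parameter are assumptions for the example.

```python
from collections import deque

# Hypothetical user-to-user links, where a link means "shared an IP at some point".
shares_ip_with = {
    "u1": {"u2"},
    "u2": {"u1", "u4"},
    "u4": {"u2", "u6"},
    "u6": {"u4"},
    "u3": set(),
}

def propagate_flag(graph, seeds, max_hops):
    """Return {user: hop distance} for users within max_hops of any flagged seed."""
    hops = {seed: 0 for seed in seeds}
    queue = deque(seeds)
    while queue:
        user = queue.popleft()
        if hops[user] == max_hops:
            continue  # do not spread beyond the hop limit
        for neighbor in graph.get(user, ()):
            if neighbor not in hops:
                hops[neighbor] = hops[user] + 1
                queue.append(neighbor)
    return hops

watch_list = propagate_flag(shares_ip_with, ["u1"], max_hops=2)
print(watch_list)  # {'u1': 0, 'u2': 1, 'u4': 2}
```

Keeping the hop distance alongside each flagged user lets an analyst weigh direct connections more heavily than distant ones.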

This is a fairly straightforward example of finding connections among users sharing the same IP over time. Other resources can be shared as well. For example, creating accounts at the same time can be a signal to establish a connection. Or, accessing the same resource at the same time can suggest collusion among bad actors.
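The same approach applies to timing. As a sketch, accounts created within a short gap of one another can be grouped into a burst and treated as a connection feature. The timestamps, the one-minute gap, and the `creation_bursts` helper are illustrative assumptions; a suitable window depends on the traffic you normally see.

```python
from datetime import datetime, timedelta

# Hypothetical account creation times.
created = {
    "u1": datetime(2023, 12, 1, 9, 0, 0),
    "u2": datetime(2023, 12, 1, 9, 0, 40),
    "u3": datetime(2023, 12, 1, 9, 1, 10),
    "u4": datetime(2023, 12, 1, 15, 30, 0),
}

def creation_bursts(created, max_gap=timedelta(minutes=1), min_size=2):
    """Group accounts whose consecutive creation times are at most max_gap apart."""
    ordered = sorted(created.items(), key=lambda item: item[1])
    bursts, current = [], []
    for account, timestamp in ordered:
        # A gap larger than max_gap closes the current burst.
        if current and timestamp - current[-1][1] > max_gap:
            if len(current) >= min_size:
                bursts.append([name for name, _ in current])
            current = []
        current.append((account, timestamp))
    if len(current) >= min_size:
        bursts.append([name for name, _ in current])
    return bursts

bursts = creation_bursts(created)
print(bursts)  # [['u1', 'u2', 'u3']]
```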

For proactive investigation, threat patterns that are detected in the data stream can be rapidly explored and augmented as needed from the raw data. Once you’re able to visualize the connections as a graph, you can start to see hidden patterns that are very difficult to detect just by looking at the original data. This makes it possible to identify new features of interest and to visualize new patterns of attack that are being set up.

Conclusion

Features are used to build patterns that signal a known threat, also stored as graph data. Pre-calculated features…

  • Let you immediately visualize simple patterns in big data streams,
  • Provide a way to build more complex threat patterns quickly,
  • Drastically reduce cost and improve speed for threat detection.

This architecture lets us efficiently detect threat patterns as they emerge, and explore the graph for new variations and strategies as they occur.

Weidong is an entrepreneur, scientist, programmer and artist. He founded Kineviz and Kinetech Arts.