Graph Strategy for Proactive Threat Response

Weidong Yang
Kineviz
Nov 28, 2023

Patterns of events that signal a threat can often be found after the fact in an enterprise’s big data archive. The patterns are typically complex, involving many different entities and connections over time. But they are hard to detect because the events of interest are sparsely embedded in the data stream. In addition, they are highly dynamic because bad actors regularly change strategies to avoid detection.

With a traditional monitoring approach, a big data flow typically goes to a data lake with a BI dashboard implemented to raise alerts as needed.

This is effective for detecting known threats, such as activity of known bad actors, or spotting statistical outliers. However, incidents are seen as isolated events and not parts of a connected pattern, and this makes it hard to put events in the proper context or to visualize patterns over time. Worse yet, the response is essentially reactive: patterns that haven’t previously been seen to cause a problem are not flagged.

We need to be far more proactive in identifying probable threats, ideally in near real time. Expressing threat events as connected graph patterns seems to be a natural solution. Unfortunately, importing entire big data streams into a single graph database is prohibitive in both cost and performance, especially since threat-related events represent such a minuscule part of the entire data flow.

Crafting a graph-enabled big data solution

To deliver near-real-time threat detection, we’ve built a tiered solution that draws on the respective strengths of graph technology and tabular big data.

Instead of pulling an entire data stream into a graph database, big data is connected to a graph environment in which we define, store, and detect threat patterns.

To start with, we store pre-defined component patterns, or features, that may be associated with a threat. Examples are users with a shared IP address or shared location, accounts created at the same time, and so on. The feature graph tier continuously monitors the data stream, and extracts and stores component patterns of interest into the graph database. Pre-defined features drastically reduce cost and improve speed of threat detection. In terms of storage requirements, a thousand-fold reduction can be achieved without sacrificing effectiveness. This means, for example, that 10 terabytes of raw data requires only about 10 gigabytes of graph storage.
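To make the idea of a pre-defined feature concrete, here is a minimal Python sketch of one of the examples above, users with a shared IP address. It is an illustration under stated assumptions, not the production pipeline: the event tuples, the `SHARED_IP` label, and the function name `extract_shared_ip_edges` are all hypothetical, and a real feature tier would process a continuous stream rather than a small in-memory list.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical login events from the data stream: (user_id, ip_address).
events = [
    ("alice", "10.0.0.1"),
    ("bob", "10.0.0.1"),
    ("carol", "10.0.0.2"),
    ("dave", "10.0.0.1"),
    ("carol", "10.0.0.3"),
]

def extract_shared_ip_edges(events):
    """Return one SHARED_IP edge per pair of users seen on the same IP."""
    users_by_ip = defaultdict(set)
    for user, ip in events:
        users_by_ip[ip].add(user)

    edges = set()
    for ip, users in users_by_ip.items():
        # Only IPs used by two or more users produce an edge; everything
        # else stays in the raw data and never reaches the graph.
        for a, b in combinations(sorted(users), 2):
            edges.add((a, b, "SHARED_IP", ip))
    return edges

shared_ip_edges = extract_shared_ip_edges(events)
```

Only the extracted edges would be written to the graph database. Events that match no feature are left in the data lake, which is where the storage reduction comes from.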

Then we combine features into more complex threat patterns. A feature by itself is not enough to raise an alert. But when features connect in specific ways, patterns emerge that do warrant attention. The threat patterns are also stored in the graph and, when detected, are configured to trigger appropriate alerts.
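As a rough sketch of how features might combine into a threat pattern, the Python below flags a pair of users only when two different features connect them. The specific rule (shared IP plus accounts created at the same time), the feature labels, and the function name `match_threat_pattern` are assumptions chosen for illustration; real threat patterns would typically be expressed as graph queries and can be far more elaborate.

```python
from collections import defaultdict

# Hypothetical feature edges already stored in the graph:
# (user_a, user_b, feature_type).
feature_edges = [
    ("alice", "bob", "SHARED_IP"),
    ("alice", "bob", "CREATED_TOGETHER"),
    ("bob", "dave", "SHARED_IP"),
    ("carol", "erin", "CREATED_TOGETHER"),
]

# Assumed pattern: a pair is suspicious only if BOTH features link it.
REQUIRED_FEATURES = frozenset({"SHARED_IP", "CREATED_TOGETHER"})

def match_threat_pattern(feature_edges, required=REQUIRED_FEATURES):
    """Return user pairs connected by every feature in `required`."""
    features_by_pair = defaultdict(set)
    for a, b, feature in feature_edges:
        pair = tuple(sorted((a, b)))  # treat edges as undirected
        features_by_pair[pair].add(feature)
    return sorted(
        pair for pair, found in features_by_pair.items() if required <= found
    )

alerts = match_threat_pattern(feature_edges)
```

Here bob and dave share an IP address but nothing else, so they raise no alert. Only alice and bob, who are linked by both features, match the pattern.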

With the feature graph, pattern search is fast and nimble. We still need tables to store big data so that we can access events related to a feature that may signal a possible emerging threat. But now we’re keeping only the most important data in active memory. The raw data is only accessed for highly focused investigation. This is cheap and fast since only small amounts of relevant historical events are needed at any given time.
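The split between the two tiers can be sketched as follows. In this Python example an in-memory SQLite table stands in for the big data store, which is purely an assumption for the sake of a runnable illustration; the table name `raw_events`, its columns, and the helper `fetch_raw_events` are hypothetical. The point is that raw events are queried only for the handful of users a graph pattern has already flagged.

```python
import sqlite3

# Stand-in for the tabular big data tier (assumption: a real deployment
# would query a data lake or warehouse, not SQLite).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE raw_events (user_id TEXT, event_type TEXT, ts TEXT)")
conn.executemany(
    "INSERT INTO raw_events VALUES (?, ?, ?)",
    [
        ("alice", "login", "2023-11-01T10:00"),
        ("bob", "transfer", "2023-11-01T10:05"),
        ("carol", "login", "2023-11-01T10:06"),
        ("alice", "transfer", "2023-11-01T10:07"),
        ("dave", "login", "2023-11-01T10:09"),
    ],
)

def fetch_raw_events(conn, user_ids):
    """Fetch raw events only for users flagged by a graph pattern."""
    user_ids = list(user_ids)
    if not user_ids:
        return []
    placeholders = ",".join("?" for _ in user_ids)
    query = (
        "SELECT user_id, event_type, ts FROM raw_events "
        f"WHERE user_id IN ({placeholders}) ORDER BY ts"
    )
    return conn.execute(query, user_ids).fetchall()

# Users flagged by a hypothetical threat pattern in the feature graph.
flagged_users = ["alice", "bob"]
related_events = fetch_raw_events(conn, flagged_users)
```

Because the query is scoped to the flagged users, the amount of raw data touched stays small no matter how large the underlying table grows.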

For visualization and exploratory analysis, we embed GraphXR, our browser-based graph visualization platform. This highly visual approach to connected data encourages rapid ad-hoc discovery of hidden patterns, connections, and anomalies. We can connect it to graph as well as SQL databases, and its code-free tools empower subject matter experts to investigate patterns of connection over time and geospatial location. A notebook extension enables fast creation of reproducible workflows for data access, analytics, and both static and animated display.

Let’s look at an example focused on the flow of transferred funds over time. Here we see a classic pattern in which a group of connected users are sending funds over many hops to a single ultimate destination.
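A pattern like this can also be searched for programmatically. The Python sketch below is one simple way to do it, not a description of how GraphXR works: it treats any account that receives but never sends as a candidate ultimate destination, then walks the transfers backwards to count how many users feed into it and over how many hops. The transfer data, the thresholds, and the function names are hypothetical.

```python
from collections import defaultdict, deque

# Hypothetical transfers: (sender, receiver).
transfers = [
    ("u1", "u4"), ("u2", "u4"), ("u3", "u5"),
    ("u4", "u6"), ("u5", "u6"), ("u6", "dest"),
    ("x1", "x2"),  # an unrelated one-hop transfer
]

def upstream_hops(transfers, target):
    """Map each user who feeds `target` to their shortest hop distance."""
    senders_of = defaultdict(set)
    for sender, receiver in transfers:
        senders_of[receiver].add(sender)

    hops = {target: 0}
    queue = deque([target])
    while queue:  # breadth-first search against the direction of flow
        node = queue.popleft()
        for sender in senders_of.get(node, ()):
            if sender not in hops:
                hops[sender] = hops[node] + 1
                queue.append(sender)
    del hops[target]
    return hops

def find_funnels(transfers, min_users=3, min_hops=2):
    """Return destinations fed by many users over multiple hops."""
    senders = {s for s, _ in transfers}
    receivers = {r for _, r in transfers}
    funnels = {}
    for sink in sorted(receivers - senders):  # receives but never sends
        hops = upstream_hops(transfers, sink)
        if len(hops) >= min_users and max(hops.values(), default=0) >= min_hops:
            funnels[sink] = hops
    return funnels

funnels = find_funnels(transfers)
```

With this sample data only `dest` qualifies: six users feed it across up to three hops, while `x2` receives a single direct transfer. Note that this sketch would miss a ring of accounts with no final sink, which would need a different pattern.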

We can lay out the display to show an added timeline dimension, which clarifies even further the nature of the possible threat.

Animating the flow of transactions within the network is a particularly effective way of showing which users are sources of funds, and which are acting as a bridge between the source and ultimate destination.
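The roles that the animation makes visible can be approximated with a very small rule, sketched here in Python. The rule is an assumption for illustration: a user who only sends is a source, a user who both receives and sends is a bridge, and a user who only receives is a destination. The sample transfers and the function name `classify_roles` are hypothetical.

```python
# Hypothetical transfers within one suspicious network: (sender, receiver).
transfers = [
    ("u1", "u4"),
    ("u2", "u4"),
    ("u4", "u6"),
    ("u6", "dest"),
]

def classify_roles(transfers):
    """Split users into sources, bridges, and destinations."""
    senders = {s for s, _ in transfers}
    receivers = {r for _, r in transfers}
    return {
        "sources": sorted(senders - receivers),       # send only
        "bridges": sorted(senders & receivers),       # receive and forward
        "destinations": sorted(receivers - senders),  # receive only
    }

roles = classify_roles(transfers)
```

A classification like this could drive node coloring in a visualization, so that sources, bridges, and the ultimate destination stand out at a glance.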

Conclusion

Graph feature detection, visualization, and exploratory analytics, when coupled with live access to the raw data in a data lake, spotlight specific component patterns hidden in torrents of big data and enable exploration and extension of those patterns in the context of past activity. This practical tiered strategy has proven capable of delivering proactive threat detection in ways that BI-big data solutions simply can’t match. It now becomes possible to create more nuanced dashboards that deliver not only statistics, but also connected patterns and the context in which they occur.

Weidong is an entrepreneur, scientist, programmer and artist. He founded Kineviz and Kinetech Arts.