Using graph models to make the case for greater cyber security incident collaboration

Nicolas Kseib
Published in TruSTAR Blog
4 min read · Mar 14, 2016

When the renowned mathematician Euler first used graph theory in 1736 to solve the problem of the seven bridges of Königsberg, he was trying to solve a pretty straightforward problem. Little did he know that he would be responsible for unleashing applied graph theory on some of the most wicked problems in the universe, from physics to chemistry and, yes, social networks.

At TruSTAR we are constantly trying to make incident exchange and collaboration easier for security operators and analysts. To that end, the TruSTAR data science team is developing a semantically unified data model capable of supporting incident responders and threat analysts in their daily operations. A number of efforts are underway to define what a semantically linked data model for cyber incidents and threat data looks like; standards like STIX and CybOX are currently leading the way. We are leveraging these industry-driven efforts, but our motivation is to define a data model that lets us easily enrich cyber incidents by extending and expanding the connections between data entities.

To demonstrate the advantages of graph models in the context of incident collaboration, we conducted an experiment that emulated the recent hospital ransomware attack. In our experiment, we collected incident and threat data from a number of open source feeds such as Malware Domain List (we will not link to that website, as it contains malicious URLs) and MalwareMustDie (article 1 and article 2).

Figure 1: Simple graph data model.

In a cyber incident sharing context, figure 1 shows how the data can be represented in a graph model. A record node represents information collected from a number of different sources, including user-reported incidents and paid or open source threat data feeds. These nodes can have many properties, as listed in figure 3: the time the record was created, the sector, etc. Another type of node is the indicator; indicator nodes are related to records through the relationship contains. The properties of an indicator node are its type (IP, URL, SHA1, SHA256) and its value. The contains relationship has only time as a property. Furthermore, when two record nodes contain the same indicators, they are implicitly correlated with each other.
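
As a rough illustration of the model described above, here is a minimal Python sketch. It is not TruSTAR's implementation: the class names, the `add` and `correlated_pairs` helpers, the timestamps, the sector value and the IP address (taken from the RFC 5737 documentation range) are all assumptions made for the example. Only the node types, their properties and the “clodo.ru” URL come from the article.

```python
from collections import defaultdict
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Indicator:
    type: str   # "IP", "URL", "SHA1" or "SHA256"
    value: str


@dataclass
class Record:
    title: str
    created: str   # time the record was created (placeholder format)
    sector: str
    # "contains" edges: indicator -> time property of the relationship
    contains: dict = field(default_factory=dict)

    def add(self, indicator, time):
        """Attach an indicator to this record through a 'contains' edge."""
        self.contains[indicator] = time


def correlated_pairs(records):
    """Return {(title_a, title_b): shared indicators} for records that overlap."""
    by_indicator = defaultdict(list)
    for rec in records:
        for ind in rec.contains:
            by_indicator[ind].append(rec.title)
    pairs = defaultdict(set)
    for ind, titles in by_indicator.items():
        for i, a in enumerate(titles):
            for b in titles[i + 1:]:
                pairs[tuple(sorted((a, b)))].add(ind)
    return dict(pairs)


# Hypothetical data: timestamps, sectors and the IP are made up.
url = Indicator("URL", "clodo.ru")
ip = Indicator("IP", "192.0.2.10")

ransomware = Record("Ransomware", "2016-03-01T00:00:00Z", "healthcare")
ransomware.add(url, "2016-03-01T00:00:00Z")
ransomware.add(ip, "2016-03-01T00:00:00Z")

torte = Record("ELF Linux/Torte infection", "2016-02-01T00:00:00Z", "cloud")
torte.add(url, "2016-02-01T00:00:00Z")

pairs = correlated_pairs([ransomware, torte])
print(pairs)
```

The correlation is never stored as an edge of its own. It falls out of two records pointing at the same indicator node, which is what "implicitly correlated" means here.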

How is that model useful in practice?

Enrichment!!

  1. In our scenario, an analyst from Hospital A has uploaded an incident report titled “Ransomware”, shown in figure 2. The report contains two indicators of threat activity: an IP and a URL. He now wants to find all other data nodes that have indicators in common with the report he submitted.
  2. After the report is submitted, the graph is updated. The graph in figure 2 shows that another report, titled “ELF Linux/Torte infection”, shares the URL indicator “clodo.ru” with the submitted report. Additionally, an open source document titled “MMD-0050–2016” and two external intelligence feeds have indicators in common with the submitted report.
Figure 2: Graph data model for “Ransomware” correlations.
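
The lookup in the two steps above, finding every record that has an indicator in common with the submitted report, can be sketched as a set intersection. This is a simplified illustration under stated assumptions, not the query TruSTAR runs. The function name `records_sharing_indicators` and the "Unrelated report" entry are hypothetical, the IP is a placeholder from the RFC 5737 documentation range, and the article does not say which indicator “MMD-0050–2016” shares, so that link is assumed.

```python
def records_sharing_indicators(graph, title):
    """Map each other record to the indicators it shares with `title`."""
    submitted = graph[title]
    shared = {}
    for other, indicators in graph.items():
        if other == title:
            continue
        common = submitted & indicators
        if common:
            shared[other] = common
    return shared


# Record title -> set of (type, value) indicators.
# Only "clodo.ru" comes from the scenario; everything else is illustrative.
graph = {
    "Ransomware": {("URL", "clodo.ru"), ("IP", "192.0.2.10")},
    "ELF Linux/Torte infection": {("URL", "clodo.ru")},
    "MMD-0050–2016": {("IP", "192.0.2.10")},   # assumed shared indicator
    "Unrelated report": {("IP", "192.0.2.99")},  # hypothetical, no overlap
}

matches = records_sharing_indicators(graph, "Ransomware")
for other, common in sorted(matches.items()):
    print(other, sorted(common))
```

A graph database would answer the same question by following contains edges from the report to its indicators and back out to other records. The dictionary version only makes the logic visible.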

Even more Enrichment!!

  1. Our analyst wants to find out whether any of the other reports contain indicators that can enrich his understanding of the situation and add more context. He double-clicks on the incident report “ELF Linux/Torte infection.” The incident he originally uploaded was about a ransomware attack. The correlated report was previously shared by cloud provider X and indicates that its servers are being used to host ransomware programs.
  2. This report is also correlated with another report, titled “Warning of Mayhem shellshock attack”, shared earlier by cloud provider Y. That report mentions the same hacker group hosting this ransomware. The indicator relating the reports shared by the two cloud providers is also a URL, related to a cloud hosting company.
  3. Looking at the report titled “Warning of Mayhem shellshock attack”, we can see that it contains many IPs that did not appear in the previous reports, thus enriching the incident.
  4. The analysts from the hospital and the two cloud providers can now connect and take further mitigative action on the additional IPs and domain names. By collaborating, the hospital can also learn more about the attackers’ capabilities in exploiting the vulnerabilities in its systems.
Figure 3: Final obtained graph.
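
The multi-hop enrichment walked through above amounts to a breadth-first traversal that alternates between records and indicators. The sketch below is one possible way to express it, not the method used in the experiment. The function name `enrich`, the `max_hops` parameter, the URL “cloud-host.example” and all IP addresses (RFC 5737 documentation range) are assumptions made for illustration. Only the report titles and “clodo.ru” come from the article.

```python
from collections import deque


def enrich(records, start, max_hops=2):
    """Breadth-first walk record -> indicator -> record.

    Returns (hops, new_indicators): hops maps each reachable record title to
    its distance from `start`; new_indicators are those absent from `start`.
    """
    # Invert the mapping so each indicator points at the records containing it.
    by_indicator = {}
    for title, indicators in records.items():
        for ind in indicators:
            by_indicator.setdefault(ind, set()).add(title)

    hops = {start: 0}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if hops[current] == max_hops:
            continue  # do not expand beyond the hop limit
        for ind in records[current]:
            for neighbour in by_indicator[ind]:
                if neighbour not in hops:
                    hops[neighbour] = hops[current] + 1
                    queue.append(neighbour)

    new_indicators = set()
    for title in hops:
        new_indicators |= records[title] - records[start]
    return hops, new_indicators


# Placeholder indicators except "clodo.ru"; the linking URL is hypothetical.
records = {
    "Ransomware": {("URL", "clodo.ru"), ("IP", "192.0.2.10")},
    "ELF Linux/Torte infection": {("URL", "clodo.ru"),
                                  ("URL", "cloud-host.example")},
    "Warning of Mayhem shellshock attack": {("URL", "cloud-host.example"),
                                            ("IP", "192.0.2.21"),
                                            ("IP", "192.0.2.22")},
}

hops, new_indicators = enrich(records, "Ransomware", max_hops=2)
print(hops)
print(sorted(new_indicators))
```

With `max_hops=1` the walk stops at the first cloud provider's report. Allowing a second hop is what surfaces the additional IPs from the second provider's report, which mirrors the analyst's second click in the scenario.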

This was a simple exercise to demonstrate the value of extending, expanding and enriching connections between data entities. A graph model makes this demonstration more intuitive because it shows how data shared by different players can be connected, and so it returns immediate value to the collaborator. The same analysis can be applied to much more complex scenarios, all with the objective of understanding potential attack vectors and taking mitigative action against them. Watch this space as we start releasing more techniques to enhance and enrich the data model and the analytics it can enable.

Disclaimer: No security operators were harmed in the making of this article. All the data shared were obtained from open source feeds.
