Suppose you’re hunting for threats on your network, and you find a suspicious process using wmic.exe. You dive into it a little more and find that the associated user account rarely uses this command and that, in this particular case, the command was used to execute a file remotely from another host.
If you’re using ATT&CK, you might immediately recognize this process as a potential instance of an adversary using Windows Management Instrumentation (WMI) for Execution. To confirm that this is indeed a malicious instance, you can leverage ATT&CK to guide your threat hunting, prioritizing your hunting based on techniques you think the adversary might be using in conjunction with WMI. But what techniques would an adversary use in conjunction with WMI?
This is a common request we get when talking about ATT&CK: given evidence of one technique, what other technique(s) should I look for? Of course, the best answer to this question is "it depends." Given evidence of one technique, the best technique to hunt for depends on your own environment, threat model, and risks. But as ATT&CK has been growing, we’ve decided to revisit the question and see if we could provide defenders with, at the very least, a place to start as they prioritize their hunting.
Using Semantics to Find Related Techniques
Generally, there are two ways to figure out which techniques are related to each other. The first is using semantic modeling. Under this paradigm, we explicitly model what an adversary needs in order to execute a technique and what they gain after executing it. In our example of WMI, we can say that the attacker must be using existing credentials to use WMI and likely performed one of the Credential Access techniques to obtain those credentials prior to executing that WMI command. This makes a lot of sense, so after seeing WMI, it’s a much better use of our time to look for Credential Dumping as opposed to, say, DLL Side-Loading.
We’ve spent a fair amount of time modeling techniques as part of our work on our automated adversary emulation system, CALDERA. In addition to encoding how to execute techniques, we’ve also encoded the requirements and consequences for executing techniques into CALDERA. This allows CALDERA to chain techniques together to achieve goals during its operations. For more information, you can check out CALDERA’s documentation, our Black Hat Europe 2017 presentation, or some of the papers we’ve published on the topic (here, here, and here).
Unfortunately, semantic modeling is hard. Coming up with the right model and explicitly encoding the requirements and consequences for each technique is very time consuming. To make it even more difficult, many techniques are described at different levels of abstraction. For example, Credential Dumping can be executed in a variety of ways (via the Security Accounts Manager, from LSASS memory, Kerberoasting, etc.), whereas the Bash History technique under Credential Access is very prescriptive in describing how the adversary behaves.
In CALDERA, we’ve bypassed this problem by explicitly encoding procedures as instances of techniques, encoding one specific way to execute that technique. This works for us on the offensive side of the picture — where we plan forward to achieve a goal and need only have at least one successfully working procedure — but having an incomplete model can leave significant gaps in our defensive capabilities as we try to backtrack through the steps the adversary may have taken.
Taking a Data-Driven Approach
Instead of using semantics, in this blog post we’ll leverage a data-driven approach. With this approach, we’ll put aside the underlying meaning of a technique and instead look at how often techniques are leveraged together — assuming our data is, in fact, representative of the underlying meaning.
In a perfect world, we’d have access to reports explicitly referencing how attackers use techniques in conjunction with each other. Here, an ideal report would say that every time adversary X used technique 1, they followed it with technique 2, then technique 3, and so on. Alternatively, we might have access to our own internal data with this kind of information, combining reported threat intelligence with our own purple team logs. However, very few reports are that explicit regarding the adversary’s lifecycle, and this kind of internal data can be limited and/or sensitive.
In lieu of perfect data, we can leverage the ATT&CK corpus of knowledge to obtain results. For example, consider APT29, who has been attributed as using Pass the Hash, Software Packing, Domain Fronting, and others. We might look at a list of techniques like this and say that two techniques are correlated if we’ve seen multiple groups using them together.
In theory, this approach makes a lot of sense, but in practice it doesn’t work well. First, linking groups to techniques isn’t always perfectly clear in ATT&CK, as there’s a bit of subjective interpretation when analyzing and creating threat reports. For example, we don’t have reports of APT29 using Data from Network Shared Drive, but they have used CosmicDuke, which is capable of executing that technique.
While we could extend our notion of correlation to say that if a group and its associated software use two techniques, then those techniques are associated, we’d still miss a bigger problem: group and software attributions range over all of an adversary’s campaigns. If an adversary used technique 1 in one intrusion and technique 2 in another, that’s weak evidence that the two techniques are related. Moreover, if a defender mitigates technique 2, nothing says the adversary won’t fall back to technique 3; we may simply never have seen the relationship between techniques 1 and 3 publicly reported. Threat reporting, which is where these relationships are built, tends not to capture this well, as reports usually give a snapshot in time of what the adversary did, as opposed to what they will do in every situation.
Instead, we should start at the source: threat reports. Since many threat reports summarize individual campaigns, it stands to reason that if two techniques appear in the same report, they were likely used as part of the same campaign, and thus may be correlated. There are, of course, many caveats here, including the comments above about quality and structure, as well as whether each report covers one particular intrusion (where this assumption is reasonably accurate) or multiple intrusions over time (where accuracy decreases). Nonetheless, this way of capturing the data works much better, as we can more clearly enumerate where our potential biases lie (i.e., in specific threat reports as opposed to threat reports in general).
This kind of analysis is easy to do in ATT&CK because of the extensive threat reporting our team has analyzed. Each time we find a threat report attributing a group or software as having executed a technique, we cite that report and create a link. This knowledge itself is codified in not only the ATT&CK wiki, but also the ATT&CK STIX bundle. We’ve explicitly encoded links from groups and software to techniques by using “relationship” objects. The example here links the APT34 threat group to the Remote Desktop Protocol technique, citing the actual report that this was mentioned in.
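As a rough sketch, such a "relationship" object looks something like the following. Note that the identifiers and citation here are illustrative placeholders, not real ATT&CK STIX IDs:

```python
# A hedged sketch of an ATT&CK STIX "relationship" object linking a group
# to a technique. The UUID suffixes, report name, and URL are placeholders
# for illustration only; real IDs come from the ATT&CK STIX bundle.
relationship = {
    "type": "relationship",
    "relationship_type": "uses",
    "source_ref": "intrusion-set--11111111-aaaa-bbbb-cccc-000000000001",   # the group (e.g., APT34)
    "target_ref": "attack-pattern--22222222-aaaa-bbbb-cccc-000000000002",  # the technique (e.g., Remote Desktop Protocol)
    "external_references": [
        {
            "source_name": "Example Vendor Report",  # the threat report that documented this use
            "url": "https://example.com/report",
        }
    ],
}

print(relationship["relationship_type"])
```

The key point for our analysis is the `external_references` list: each relationship carries citations back to the threat reports that justify it, which is what lets us tie techniques back to individual reports.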
Before diving into our data and some analysis, we should note that many of the things reported in and extracted from threat reports can be subjective. While we’ve done our best to minimize this subjectivity, the gold standard for this type of analysis would be using real data from your own network. In lieu of that, however, we present our analysis as both a starting point and inspiration to help motivate future research, and to help others understand the rich relationships we can explore with ATT&CK.
Getting the Data
These relationship objects offer easy-to-parse structure, and we can write simple Python scripts to iterate through them. Doing this, we can link techniques to the reports in which they’re mentioned. Figure 2 provides an example where we use the ATT&CK Navigator to visualize all of the techniques reported here. Once we have that, we then look at each report, incrementing a counter for each pair of techniques mentioned together in that report. In the end, we have a two-dimensional matrix whose rows and columns are techniques, where entry i,j is the number of times we saw technique j referenced in a report that also referenced technique i.
With 219 techniques in ATT&CK, it can be hard to dive into the relationships between all technique pairs, so instead we’ll focus on a small subset, visualized below in Figure 3.
Figure 3 shows a subset of the matrix, where each box represents the number of times we’ve seen two techniques together. There have been five reports that mentioned both WMI and Commonly Used Port; eight reports that mentioned WMI alongside Credential Dumping; zero reports that mentioned both WMI and Peripheral Device Discovery; etc. Looking at the bottom right hand corner, we see WMI has been mentioned alongside itself a total of 23 times — corresponding to the 23 times we’ve seen WMI reported on.
This display of numbers helps us see and compare how often techniques have been reported with each other — and might provide a good starting place for us to begin hunting for techniques, given our suspicions that another technique may have been used. For example, looking at WMI, we see that PowerShell has had 14 overlapping reports, as opposed to just one for Shortcut Modification. If I saw something indicative of WMI on a host, it’d probably be a better use of my time to look for PowerShell instead of Shortcut Modification.
What if we wanted to visualize all techniques in relation to WMI? As an array, that’d be way too large, but fortunately we can use the recently released ATT&CK Navigator.
In Figure 4, we have a nice, easy-to-read heatmap that shows other techniques as they relate to WMI. As before, we see PowerShell is graded highly, but we also see that Registry Run Keys/Start Folder, Obfuscated Files or Information, and Scheduled Task are popular, among others. What’s nice is that by using the Navigator, we can dynamically generate files for each technique. Another cool one is Credential Dumping, visualized in Figure 5:
Interestingly, Credential Dumping is much less concentrated; it has a good spread across many other techniques in commonality. Contrast that with Communication Through Removable Media, visualized in Figure 6:
Admittedly, Communication Through Removable Media is not reported on as often as Credential Dumping (only five times as opposed to 76), but its overlap is relatively concentrated and includes the other removable-media techniques: Replication Through Removable Media, Peripheral Device Discovery, Data from Removable Media, and Exfiltration Over Physical Medium.
A Second Round of Analysis
Looking again at Figure 3, consider the relationship between Commonly Used Port and PowerShell — six reports have referenced both techniques. Similarly, User Execution has five references that mention it alongside PowerShell. At first glance, these seem like similar frequencies, but in a larger context they’re different — Commonly Used Port has been referenced 36 times in total, as opposed to only 13 for User Execution. So, the “6” between Commonly Used Port and PowerShell might not be as significant as the “5” between User Execution and PowerShell.
With this in mind, there’s a better way to create our matrix: each i,j entry should instead be the percentage of reports mentioning technique i that also mention technique j. Put this way, PowerShell was seen in about 17% of the reports citing Commonly Used Port, whereas PowerShell was seen in about 38% of the reports citing User Execution.
In Figure 7 above, we reshape Figure 3 so that each box shows the percentage of the row technique’s reports that also mention the column technique. Now we can see the relative significance of each relationship within each row. With this chart, we can better understand some of the comparisons and co-occurrences of techniques. Below in Figure 8 we do the same, but now for randomly selected techniques that each have at least 20 references.
But Wait, There’s More!
Actually, there’s a lot more! This blog post only scratches the surface of some of the complex data analysis you can do leveraging ATT&CK. Sure, we have percentages, but can we normalize the data for better results? Can we start clustering techniques based on who uses them and what techniques they’re paired with? What’s the probability I’ll see technique A given I’ve already seen techniques B and C? We’re going to take a break now, but we hope to dive into these topics in future posts.
You Can Help!
This kind of data analysis is fun, helpful, and a great place to start from, but what’s here is far from complete and some of the results should be taken with a grain of salt. Part of the challenge in deploying this type of analysis is that we often lack reliable data about how an adversary would act at any point in time. Not only is the data we have now a rough approximation of what’s known, it’s also only an approximation of what’s known publicly and what we’ve been able to map back to ATT&CK. Ultimately, to make this analysis more accurate — and more useful for defenders — we’ll need more data. For those of you reading this who want to help make it better, I’d encourage you to read Katie’s blog posts here and here, which provide great examples of how ATT&CK can be used to advance cyber threat intelligence and help increase the accuracy of the underlying data.