ATT&CK Evaluations Carbanak and FIN7: How to Get Started with the Results and Navigate the New Content

Frank Duff
MITRE-Engenuity
Apr 20, 2021

The results and emulation plan for our 2020 ATT&CK® Evaluations for Enterprise are now available on the ATT&CK Evaluations website. This evaluation emulated behaviors inspired by Carbanak and FIN7, threat actors that are often conflated but that available threat intelligence indicates are distinct groups, each leveraging the Carbanak malware. A total of 29 vendors participated in the ATT&CK Evaluations, up from 12 participants for APT3 (2018 evaluation) and 21 for APT29 (2019 evaluation). Additionally, 17 vendors participated in a new protections-focused variant that explores how solutions block adversary activity across the post-compromise lifecycle. We are thrilled to have worked with all these vendors and are excited to share the output with the global community.

Both Carbanak and FIN7 have a well-documented history of widespread impact. Carbanak is credited with the theft of a cumulative $300M (though some estimates are much higher) from hundreds of banks across 30 countries. FIN7 is credited with the theft of more than 15 million customer credit card records from victims spanning over a hundred US companies (across 47 states) as well as international markets.

In this post, I discuss a number of changes and updates to our results. In addition to the aforementioned protections evaluation, we included Linux systems in our detection scenarios. The navigation of the results has changed to pull together all data for each vendor, rather than dividing by each round. We have also added a number of high-level summaries around the results, including a breach summary for each day of the Carbanak and FIN7 round. All these enhancements come with our traditional round updates to include Carbanak and FIN7 information in the technique comparison tool, participant comparison tool, and release of our plan and tools in the Center for Threat-Informed Defense Emulation Plan Library.

Welcoming in Protections

This is the first round in which we offered an optional protections-oriented extension to our historically detections-only evaluations. All detection evaluations required tools to have protections disabled or in alert-only mode. For this new extension, we allowed protections to be enabled, though within a defined scope. No insights gained from the detections portion were used to influence changes to vendors' protection setups.

When we designed this protection evaluation, which was our most requested extension to our methodology, we didn’t want to just be the next AV test. We wanted to stay in the spirit of assumed breach and defense in depth, exploring a range of ATT&CK behaviors and the protections that addressed them.

To achieve this style of test, where we still want to explore the next activity even if an adversary activity was blocked, we had to augment our detections methodology to run more atomically. Building on the methodology used to execute the detections evaluation, we engineered five test cases for each of Carbanak and FIN7 (10 total). Each test case starts in a benign system state representative of our hypothetical user or system administrator, and participants were not allowed to block the activities required to get the system into this state. For example, vendors had to allow common command shell and PowerShell usage, connections over SMB, and other activity covered by the rules of engagement.

After the system was set to its benign state for the test case, we would execute our adversary emulation step by step to determine if and when that test would fail. We would then work with the vendor to identify the root cause of the failure (and whether it was indeed explicitly blocked by their tool). We could then document the evidence and proceed to the next test case.

We also had to devise a new way of displaying the results that would clearly explain when and how each block occurred and the associated ATT&CK technique(s). We decided to list all techniques associated with the test case and call out the techniques that were blocked (purple text and a callout). We mark the steps the red team was still able to complete before the block (no block, black text), and we mark all techniques that could not be tested after the block as not applicable (N/A, grey text). It is possible that other blocks for those grey N/A steps could exist for that vendor, but we were unable to test them.

Protection Test Results Example
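As a rough illustration of that marking logic, here is a minimal sketch in Python; it is not our actual tooling, and the technique names and blocked position below are hypothetical examples, not real results.

```python
# Hypothetical sketch of the result-marking logic for a protection test case.
# Technique names and the blocked position are illustrative, not real results.

def mark_test_case(techniques, blocked_index=None):
    """Label each technique in an ordered protection test case.

    techniques    -- ATT&CK techniques exercised by the test case, in order
    blocked_index -- position of the first technique the tool blocked, or None
    """
    results = []
    for i, technique in enumerate(techniques):
        if blocked_index is None or i < blocked_index:
            label = "None (black text)"      # red team completed this step before the block
        elif i == blocked_index:
            label = "Blocked (purple text)"  # explicitly blocked by the tool
        else:
            label = "N/A (grey text)"        # could not be tested after the block
        results.append((technique, label))
    return results

# Example: a hypothetical test case blocked at the third technique.
case = ["T1059.001 PowerShell", "T1105 Ingress Tool Transfer", "T1003 OS Credential Dumping"]
for technique, label in mark_test_case(case, blocked_index=2):
    print(f"{technique}: {label}")
```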

The Inclusion (or Exclusion) of Linux

This is the first round in which we have included non-Windows systems in our evaluations. The addition of Linux this round was meant to be a small subset of intel-supported activity that would begin to highlight vendor capabilities in this space. While the detection evaluations were all executed in a similar fashion, vendors could opt out of the steps that focused on Linux detection capabilities. For vendors who opted out, you will see their total sub-step count drop from 174 to 165 (i.e., 5.A.7–5.A.11 and 5.B.1–5.B.7 will be listed as N/A).

Carbanak+FIN7 Evaluation Environment

While some of these steps do have a network component that would provide some visibility into the tested behavior, we took the approach of drawing a clear line around the Linux portion. For this reason, N/A should not be read as a "None" without further exploration. Rather, N/A means the sub-step was out of scope, for whatever reason the vendor had for not participating in that portion of the evaluation.

Changing Focus from Rounds to Participants

In this Carbanak and FIN7 release, we have updated our site to allow for end users to explore solutions over time rather than round-by-round. You can still look at a specific round, but we now have a new vendor summary screen that offers a summary of performance across all rounds, allows you to explore ATT&CK Evaluations results in a Finder-style visualization, and then shows a variety of graphs to give an idea of over-arching detection distribution. You can then dive into any of the results to see the same level of detail we have always provided.

Vendor Overview Page

We have also consolidated all vendor information into this top-level view, as we recognize many users come to the site to understand the full potential capabilities of a single solution and explore data across rounds. This new view will allow users to access this information more easily. If the latest (or any) round is of primary interest, you can filter by that round and still dive into the results. If you want to explore changes and the evolution of product performance over time, or details about specific adversary behaviors, you can see that too.

Introducing Round and Breach Summaries

We have heard you. The results in their raw form are dense, and getting started can be very difficult. New with this release is the round summary table, which can be used to explore a variety of metrics on the underlying data. No single metric should be viewed in isolation or as a static score, and these metrics do not imply that higher or lower values are "better." This is our first attempt, so we welcome feedback and fully expect to evolve what data we capture and expose; our goal is to give readers some quick insights before they dive deeper into the data.

Evaluation Summary Chart

In this first attempt, we use a few new terms, which I will explain here. We looked far and wide at what the community calls different detection concepts, and while these terms might be new for some and more familiar to others, we feel it necessary to define them in our context.

· Detection — any information, raw or processed, that can be used to identify adversary behavior.

· Detection Count — the sum of all raw (telemetry) and processed (analytic) detections that met our detection criteria. A sub-step can have more than one detection.

· Telemetry — any raw or minimally processed detection (e.g., process start, file create)

· Telemetry Coverage — the number of sub-steps where telemetry was available

· Analytic — any processed detection, such as a rule or logic applied to telemetry (e.g., ATT&CK technique mappings or alert descriptions)

· Analytic Coverage — the number of sub-steps where 1 or more analytics were available

· Visibility — the number of sub-steps where an analytic or telemetry was available

Again, these are not the only metrics you should look at to determine how a solution performed, or which solution is right for you, but we hope these metrics give a high-level understanding that makes the ATT&CK Evaluation results more approachable.
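To make those definitions concrete, here is a minimal sketch of how the coverage metrics could be computed from per-sub-step detection records; the data structure, detection categories, and values are hypothetical, not actual evaluation data.

```python
# Hypothetical per-sub-step detections: each sub-step maps to a list of
# detections, where each is either "Telemetry" or an analytic category
# (e.g., "General", "Tactic", "Technique"). Values are illustrative only.
substep_detections = {
    "1.A.1": ["Telemetry", "Technique"],
    "1.A.2": ["Telemetry"],
    "1.A.3": [],                     # no detection for this sub-step
    "1.A.4": ["General", "Tactic"],
}

ANALYTIC_TYPES = {"General", "Tactic", "Technique"}

detection_count = sum(len(d) for d in substep_detections.values())
telemetry_coverage = sum(1 for d in substep_detections.values() if "Telemetry" in d)
analytic_coverage = sum(1 for d in substep_detections.values() if ANALYTIC_TYPES & set(d))
visibility = sum(1 for d in substep_detections.values() if d)  # telemetry or analytic

print(f"Detection Count:    {detection_count}")     # 5 detections total
print(f"Telemetry Coverage: {telemetry_coverage}")  # 2 sub-steps
print(f"Analytic Coverage:  {analytic_coverage}")   # 2 sub-steps
print(f"Visibility:         {visibility}")          # 3 sub-steps
```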

Another major change for this round is the inclusion of breach summaries for each day of the evaluation. We understand that how a detection is presented to a user (context) can be just as important as the content. In previous rounds, we captured some of these nuances via detection modifiers, such as how a chain of suspicious events is connected/correlated as well as how higher-priority alerts/indicators are tagged. For this round, instead of detection modifiers, we have included at the beginning of each vendor's results a short description of the vendor's Alerting and Correlation Strategies, as well as a gallery of images that highlight how the tool summarizes the full breadth of events for that day.

Breach Summary Example

Visualizing the Results

For those of you who like visuals, in the APT29 release we began including graphs that attempt to show how detections were distributed over the steps of the emulation. We made some refinements to the graphs, which hopefully improve their usability. All graphs now split between the different scenarios to better show that each scenario is independent of the other.

The Distribution by Step graph allows you to quickly assess where the majority of detections occurred. Did detections group around a few key stages, like initial breach or expanding access? It should be noted that the number of sub-steps per step is not constant, so detection count alone cannot answer this definitively. The Distribution by Sub-step graph looks at each individual sub-step. If a vendor had a large number of credential dumping alerts skewing their total detection count, this is more noticeable in this view. The first sub-step for every step is labeled so that you can more easily identify which steps had a large or small number of sub-steps, providing greater context to the Distribution by Step graph.

Distribution of Total Detections by Step and Sub-step
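As a rough sketch of the aggregation behind these graphs (the sub-step IDs and counts below are hypothetical), detections can be rolled up from sub-steps into steps, which also makes the varying number of sub-steps per step visible:

```python
from collections import defaultdict

# Hypothetical detection counts per sub-step; IDs follow the step.variant.substep
# format used in the results (e.g., 1.A.1), and the values are illustrative only.
detections_per_substep = {
    "1.A.1": 3, "1.A.2": 1, "1.A.3": 0,
    "2.A.1": 2, "2.A.2": 5,
    "3.A.1": 1,
}

per_step_total = defaultdict(int)
per_step_substeps = defaultdict(int)
for substep_id, count in detections_per_substep.items():
    step = substep_id.split(".")[0]          # "1.A.1" -> step "1"
    per_step_total[step] += count
    per_step_substeps[step] += 1

for step in sorted(per_step_total):
    print(f"Step {step}: {per_step_total[step]} detections "
          f"across {per_step_substeps[step]} sub-steps")
```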

A new view on our data is the Detection Type Frequency by Sub-step graph. This graph makes it easier to visualize which sub-steps had which detection types. For instance, for the telemetry coverage referenced above, you can see which sub-steps had coverage and which did not; the same goes for any of the other detection types. A unique aspect of this graph, compared to the others, is that it does not show how many detections of a given type exist at a sub-step unless you roll over the graph. To that end, this graph is used to explore coverage rather than counts.

Frequency Diagram of Detection Types by Sub-step

But where are the high-level protections summaries? We have focused on the detection data at this point, building on three rounds of lessons learned and feedback. Now that the protection results are published, we can begin our retrospective and listen to the community to figure out which aspects of the data they find most valuable. As I said at the top, this is just the start of our analysis, and we welcome your input.

Out with the Matrix, in with the Explorer

We have always had a hard time integrating ATT&CK Evaluation results into the Matrix view of ATT&CK. ATT&CK's sub-techniques provided another opportunity for depth but proved to be a tipping point for our user experience. A clean, intuitive way of traversing tactics, techniques, sub-techniques, procedures, and results in the Matrix form without oversimplifying our results proved infeasible. So, we went back to the drawing board and tried to create a new way of exploring the ever-growing ATT&CK knowledge base and ATT&CK Evaluation results. The Matrix may make a return at some point, but for now, the Explorer provides a more functional way to navigate round data.

To use this interface, simply select a round, then select a tactic, technique, and sub-technique, and observe the results. In some cases, if a sub-technique does not exist, the interface will just skip that column. For any of those selectable elements, there is a link icon that will allow you to explore all results for that tactic, technique, etc. When in those views, you can also toggle between rounds to explore other data available for that vendor. I will also note that we have mapped (where possible) previous round data to the equivalent sub-techniques, to make for a more fluid experience.

New ATT&CK Evaluation Results Navigation

Like the other aspects of this site, we have some ideas on how to continue to enhance this new visualization. Please provide any feedback you have on what you like or don’t like about this new visual to help shape those ideas.

More to Come

We have much more to say on these evaluations and will be releasing additional content in the near future. In the meantime, I hope you enjoy the updates we have made to the site and have success exploring the new Carbanak+FIN7 results. In the coming days, remember to look at our analysis and the analysis done by others, and consider what it means for you. There is no winner. There is no one way of looking at the data, and no golden metric. There are 29 participants, each with their own strengths, weaknesses, and story to tell.

If you are interested in participating in a future ATT&CK Evaluation, please reach out to evals@mitre-engenuity.org. The next round of ATT&CK Evaluations for Enterprise, featuring Wizard Spider and Sandworm, is now open for joining. The Call for Participation closes on May 28, 2021.

© 2021 MITRE Engenuity. Approved for Public Release. Document number AT0016

Frank Duff (@FrankDuff) is the Director of ATT&CK Evaluations for MITRE Engenuity, providing open and transparent evaluation methodologies and results.