Getting Ready for the Carbanak and FIN7 Evaluations

Published in MITRE-Engenuity, May 20, 2020

On February 20, 2020 we announced the next round of ATT&CK Evaluations for Enterprise based on Carbanak and FIN7, which included a public call for intel. We received and processed the intel contributions, and updated our technique scope. On April 21, we released the emulation plan and results of 21 endpoint detection vendors who participated in our APT29 evaluation. We have leveraged the lessons learned to adjust the Carbanak+FIN7 detection categories. The changes to both the technique scope and detection categories are now posted on our site.

Updated Technique Scope

Before we dive into what’s changing with our result format, I’ll start with the update on our technique scope. We noted previously in the APT29 plan that Kaspersky, Microsoft, and SentinelOne provided intel that allowed us to better understand not only what APT29 does in an intrusion, but also how they behave in more detail. With this enhanced insight, we created a more accurate methodology that enabled the evaluations to more fully capture the true capabilities of the products tested. As we did for APT29, we included an open call for intel in our Carbanak and FIN7 announcement. We received similar feedback to inform the Carbanak and FIN7 emulation plan and adjusted the technique scope to better reflect our understanding of their tradecraft. Our vetting process begins with a comparison between the contributions and the intel available in the public space. In the next step, we further enrich the intel with deductions from our team as we bridge intel gaps. Once we are confident in the outcomes of our vetting process and our understanding of them, we adjust our technique scope. The updated technique scope is now posted on the ATT&CK Evaluations website.

The adjustments addressed the overall evaluation scope. There were no changes to the subset of techniques that comprises the Linux technique scope. Additionally, as you may have noted from the APT29 scope and results, there may still be deviations in the final implementation. The technique scope is meant to be a superset of what will be tested, and some techniques might not be included in the final emulation plan.

Carbanak+FIN7 Scope (Truncated). Full version available here.

On a related note, the ATT&CK team released the sub-techniques beta on March 31st, and one of the most frequent questions we receive is how the announcement of sub-techniques impacts the scope of the evaluation. The short answer? It doesn’t. Given that sub-techniques are still in the beta stage, we chose not to redefine the scope or create the expectation that participants needed to rush into integrating sub-techniques. When discussing or presenting the techniques within the evaluation, we will bring in sub-techniques as appropriate. For example, in Step 16.A of our APT3 evaluations we performed “brute force”. The sub-technique would actually have been “password spraying”. During the evaluation, a detection that accurately described the behavior as either brute force or password spraying would receive a “technique” detection and show as such in any results view.
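The labeling rule in the example above can be sketched as a small check. This is only an illustration: the `is_technique_detection` helper and the hand-written `SUB_TECHNIQUES` mapping are hypothetical names introduced here, and the sketch assumes the listed sub-technique is the one the red team actually performed. It is not how the evaluation team records results.

```python
from typing import Dict, Set

# Hypothetical mapping for this sketch: technique name -> sub-technique names.
# Only the pairing mentioned in the text is included.
SUB_TECHNIQUES: Dict[str, Set[str]] = {
    "brute force": {"password spraying"},
}


def is_technique_detection(label: str, emulated_technique: str) -> bool:
    """Return True if the vendor's label names the emulated technique
    or one of its known sub-techniques (case-insensitive)."""
    technique = emulated_technique.strip().lower()
    accepted = {technique} | SUB_TECHNIQUES.get(technique, set())
    return label.strip().lower() in accepted


print(is_technique_detection("Password Spraying", "Brute Force"))
print(is_technique_detection("Credential Dumping", "Brute Force"))
```

Under these assumptions, either label earns the same “technique” result, which is why the sub-techniques beta does not change the scope.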

Revised Detection Categories

Detection Categories received an overhaul between the APT3 evaluations and the APT29 evaluations. This revamp was primarily based on our finding that it was hard to talk authoritatively about what happened in the background (the logic), but we could talk about what was seen by the analyst (the context). While we tried to capture both in the APT3 evaluations, we decided to focus on context in the APT29 evaluation categories by tying them to ATT&CK (Tactic and Technique) and pairing them with the less specific General category. We maintained the Telemetry category to reflect the existence of raw data that could be used to detect the activity. Based on the feedback received from vendors, this worked well. Because there is now significantly more clarity as to why a detection received one category over another, we are going to keep these categories for the upcoming Carbanak+FIN7 evaluation.

While the main categories are mostly staying the same, there are some changes to the detection categories to address confusion with the results. The first adjustment addresses the blurry line between None, None with Notes, Host Interrogation, and Telemetry. We have decided to simplify things. None will only be used when the vendor was not able to meet the required detection criteria; there will no longer be notes or screenshots to confuse things. Host Interrogation, which used to be a modifier to None, will now be included as Telemetry, as long as the data is available natively within the product’s user interface. This is to acknowledge that as capabilities have evolved, the line between Host Interrogation and Telemetry has blurred.

There is an additional new category related to None, and that is Not Applicable. This category specifically focuses on whether a capability existed to meet the minimum detection criteria. The update is primarily due to Linux and protections being included in this round of evaluations, even though they might not apply to all vendors being evaluated. None indicates a miss. Not Applicable indicates that the vendor did not have capabilities to address the technique under test. Prior to the evaluation, vendors will indicate where agents are deployed to set their tailored scope for the evaluation.
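The distinctions above can be summarized as a short decision sketch. Everything here is hypothetical and for illustration only: the `Observation` fields and the `categorize` function are invented names, and treating Host Interrogation data that is not native to the product’s user interface as None is an assumption drawn from the wording above, not a published rule.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Category(Enum):
    NOT_APPLICABLE = "Not Applicable"
    NONE = "None"
    TELEMETRY = "Telemetry"
    GENERAL = "General"
    TACTIC = "Tactic"
    TECHNIQUE = "Technique"


@dataclass
class Observation:
    # All field names are hypothetical, chosen for this sketch only.
    agent_deployed: bool                  # vendor declared a capability in scope
    context: Optional[Category] = None    # General, Tactic, or Technique, if shown
    raw_data_available: bool = False      # raw data that could detect the activity
    via_host_interrogation: bool = False  # data obtained by interrogating the host
    native_in_ui: bool = True             # data available natively in the product UI


def categorize(obs: Observation) -> Category:
    # No capability for the technique under test: not a miss.
    if not obs.agent_deployed:
        return Category.NOT_APPLICABLE
    # Context seen by the analyst maps to the ATT&CK-tied categories.
    if obs.context in (Category.GENERAL, Category.TACTIC, Category.TECHNIQUE):
        return obs.context
    if obs.raw_data_available:
        # Assumption: Host Interrogation counts as Telemetry only when native.
        if obs.via_host_interrogation and not obs.native_in_ui:
            return Category.NONE
        return Category.TELEMETRY
    # Capability existed but the detection criteria were not met: a miss.
    return Category.NONE


print(categorize(Observation(agent_deployed=False)).value)
print(categorize(Observation(agent_deployed=True)).value)
```

The key point the sketch tries to capture is that the capability check comes first, so a vendor without a Linux agent is never scored as missing a Linux technique.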

The Alert modifier, which is applied to a General, Tactic, or Technique detection, was designed to articulate when a vendor added a visual cue to a detection to draw the analyst’s attention to malicious behavior. This contrasts with labeling data with ATT&CK-related content without indicating that it is malicious, an approach that can reduce alert fatigue. In practice, we found that whether something was an alert or an enrichment depended only on which view was being shown by the analyst. The visual indicator component was also very difficult to define. A low-severity indicator still provides some indication, and hence counts as an alert, even when the activity could very well be benign.

The problem with the Correlation modifier was a little different. While connecting events to prior suspicious or malicious events is key to understanding the full breadth of adversary activity, during the evaluations this ended up being an exercise in futility as implemented, and a test of getting the right screenshot to effectively show correlation. This problem is exacerbated by multiple tool user interfaces, not all of which clearly show correlation, and by large and complex event trees, which make tracking a detection back to a parent alert difficult.

In the cases of Alert and Correlation, relying on our understanding of the product, ensuring that the right screenshot was taken, or that the vendor was providing the right feedback to justify a change, is not what we want from our evaluations. What we want to do is talk about the uniqueness of vendor approaches to these important aspects of endpoint detection products. For example, with alerts, do they have static tiers of alerts, or do they have a constantly changing severity that leverages the correlation concept? Do they enrich data without alerts, or is every detection treated as an alert? For correlation, does the tool require the analyst to stitch together activity based on a unique identifier, does it show alerts within an event tree, or does it roll up all activity under a single alert?

We reflected on what we were trying to capture with these modifiers, and whether there was another way to represent them. Our new approach aims to explain the vendor’s alert and correlation strategy, rather than tracking modifiers technique by technique. How and what will be tracked for this new format is still being refined, but the broad use of these modifiers will not be implemented during the Carbanak and FIN7 round.

Other changes include the removal of the innovative modifier, which we decided not to use in the APT29 evaluations. We also removed the MSSP category, as previously noted in our Carbanak and FIN7 announcement. Finally, Configuration Change has been expanded to include a data source qualifier, which was previously conflated into the detection qualifier.

Added Protection Categories

We expect to learn a lot about ATT&CK protections during our Carbanak and FIN7 evaluations. To avoid the confusion caused by the drastic change to categories between the APT3 evaluations and the APT29 evaluations, we decided to start simple with protection categories. If needed, we can adjust to more effectively discuss additional protection qualities. At the outset, we’ll focus on whether vendors don’t have any capability (Not Applicable), have the capability but didn’t block or did not provide a reason why a block occurred (None), prevented the red team from executing (Blocked), or prevented the red team from executing but required the user to take an action (Blocked, User Consent). We expect to ultimately provide additional details to improve the understanding of the results, but this will serve as a common starting point to build from.
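The four starting protection categories can be read as a simple decision order, sketched below. The function name and its boolean parameters are hypothetical, introduced only to make the ordering explicit; the actual evaluation criteria may be applied differently.

```python
from enum import Enum


class Protection(Enum):
    NOT_APPLICABLE = "Not Applicable"
    NONE = "None"
    BLOCKED = "Blocked"
    BLOCKED_USER_CONSENT = "Blocked, User Consent"


def categorize_protection(
    has_capability: bool,
    blocked: bool,
    reason_provided: bool,
    user_action_required: bool,
) -> Protection:
    # Parameter names are assumptions made for this sketch.
    if not has_capability:
        return Protection.NOT_APPLICABLE
    # Capability exists, but there was no block or no stated reason for it.
    if not blocked or not reason_provided:
        return Protection.NONE
    # The red team was stopped, but only after the user took an action.
    if user_action_required:
        return Protection.BLOCKED_USER_CONSENT
    return Protection.BLOCKED


print(categorize_protection(True, True, True, False).value)
```

Note that, as described above, a block without an explanation still falls under None in this reading.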

Data Sources

Another addition we are excited about is the inclusion of data sources in our detections. As our Red Team Lead, Jamie Williams, points out in his latest blog, there can be a significant difference in what created a detection. Was a remote file copy detected because of the network event, the command line issuing the copy command to the remote destination, or the file creation at the destination? Was an alert generated from a process event, or did it fuse multiple data sources together to give additional context?

We hope that by expanding on what data sources are driving detections, we can shed light on how these products detect and how the detections differ from one another, and enable users to better understand whether the results are applicable to their environment. For example, does a user know that a data source will not work in their environment because of potential data volume? If so, they can filter that content out and compare more applicable results.
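A minimal sketch of that kind of filtering follows. The `Detection` record, its field names, and the sample rows are all made up for illustration and are not real evaluation results or the published result format. The sketch also assumes the conservative choice of dropping a detection if any of its data sources is unavailable; a user could just as reasonably keep detections that still have one usable source.

```python
from dataclasses import dataclass
from typing import FrozenSet, Iterable, List


@dataclass(frozen=True)
class Detection:
    # Field names are hypothetical; the published result format may differ.
    step: str
    category: str
    data_sources: FrozenSet[str]


def applicable_detections(
    detections: Iterable[Detection], unavailable: Iterable[str]
) -> List[Detection]:
    """Keep only detections that rely on none of the unavailable data sources."""
    excluded = {source.lower() for source in unavailable}
    return [
        d for d in detections
        if not ({source.lower() for source in d.data_sources} & excluded)
    ]


# Made-up records for illustration only; not real evaluation results.
sample = [
    Detection("remote file copy", "Technique", frozenset({"network"})),
    Detection("remote file copy", "Telemetry", frozenset({"command-line"})),
    Detection("remote file copy", "General", frozenset({"network", "file create"})),
]

kept = applicable_detections(sample, unavailable=["network"])
print([d.category for d in kept])  # ['Telemetry']
```

With network data excluded, only the command-line detection remains, which is the comparison a user in that environment would care about.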

What’s next for the Carbanak and FIN7 ATT&CK Evaluations?

The Call for Participation remains open until May 29, 2020. The evaluations will start in August and run through mid-December. The results are expected in early 2021. As previously stated, ATT&CK Evaluations are now run out of MITRE Engenuity, MITRE’s new tech foundation for public good. Under this effort we recently announced ATT&CK Evaluations for Industrial Control Systems (ICS), our first major expansion into a new evaluation domain. For more information on Carbanak+FIN7 evaluations or ATT&CK Evaluations for ICS, please contact us.