ATT&CK-based Product Evaluations: Frequently Asked Questions

Frank Duff
Published in MITRE ATT&CK®
Nov 8, 2018

This post was originally published August 14, 2018 on mitre.org

Background

Since MITRE released ATT&CK in May 2015, the community has used it to enable better communication between red teamers, defenders, and management. Defenders use ATT&CK for tabletop exercises, assessments, and hands-on evaluations. The security community uses it to perform testing that reveals capabilities and gaps in networks and products alike. What makes ATT&CK so appealing for testing is that it is based on known threats rather than just the hypothetical. Additionally, the matrix visualization provides an excellent scorecard for capturing evaluation results.

ATT&CK has been embraced by both the public and private sectors because it offers a way of stating what tools can do. Organizations are asking vendors to map their capabilities to ATT&CK, and vendors, in turn, are using ATT&CK as a common language to describe and communicate those capabilities.

Vendors are using ATT&CK to articulate their capabilities, but there has been no neutral authority to evaluate their claims. MITRE’s new ATT&CK-based product evaluations fill this void. Since announcing the evaluations in March, we have been busy engaging with vendors and community members interested in hearing more about our approach. Now that we have announced the first cohort and evaluations have started, we want to take the opportunity to answer some frequently asked questions to ensure everybody understands our transparent testing process.

Frequently Asked Questions

Who is participating?
The first cohort: Carbon Black, CrowdStrike, CounterTack, Endgame, Microsoft, RSA, SentinelOne
Rolling admissions: Cybereason, FireEye

Are vendors paying MITRE to perform evaluations?
Yes. Paid evaluations are new to MITRE. There has been significant demand for unbiased ATT&CK evaluations, and we needed a formal process to open up evaluations to the security vendor market. Participating companies understand that all results will be publicly released, which is true to MITRE’s mission of providing objective insight.

What will vendors get out of this?
Vendors get a third-party evaluation of their ATT&CK detection capabilities. These evaluations are not ATT&CK certifications, nor are they a guarantee that you are protected against the adversary we are emulating (in this case, APT3), because adversary behavior changes over time. The evaluations give vendors insight into, and confidence in, how their capabilities map to ATT&CK techniques. Equally important, because we are publicly releasing the results, we enable their customers, and potential customers, to understand how to use their tools to detect ATT&CK-categorized behaviors.

How do these evaluations relate to ATT&CK?
ATT&CK-based evaluations are built on the publicly available information captured in ATT&CK, but they are separate from the ongoing work to maintain the ATT&CK knowledge base. The team that maintains ATT&CK will continue to accept contributions from anyone in the community. The ATT&CK knowledge base will remain free and open to everyone, and vendor participation in the evaluations has no influence on that process.

How is ATT&CK used for evaluations?
The evaluations use adversary emulation, a way of testing “in the style of” a specific adversary that allows us to select a relevant subset of ATT&CK techniques to test. To generate our emulation plans, we take public threat intelligence reporting, map it to ATT&CK, and then determine a way to replicate the behaviors. The first emulated adversary is APT3. We plan to offer new emulations approximately every six months, each complementing previous evaluations. The next group we plan to emulate has yet to be announced.
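
To make the emulation-plan concept concrete, below is a minimal, hypothetical sketch of how a behavior drawn from public APT3 reporting might be mapped to an ATT&CK technique and a replication procedure. The plan structure, field names, and commands are illustrative assumptions rather than MITRE’s actual emulation plan format; only the technique IDs are real ATT&CK identifiers.

    # Hypothetical emulation-plan entries: each step ties a behavior described in
    # public threat reporting to the ATT&CK technique it exercises and to one
    # concrete procedure a red team could use to replicate it. The structure is
    # illustrative only; MITRE's actual plan format may differ.
    emulation_plan = [
        {
            "step": 1,
            "description": "Enumerate local accounts on the compromised host",
            "technique_id": "T1087",           # ATT&CK: Account Discovery
            "technique_name": "Account Discovery",
            "procedure": "net user",           # one way to execute the technique
            "source": "public APT3 reporting",
        },
        {
            "step": 2,
            "description": "List running processes",
            "technique_id": "T1057",           # ATT&CK: Process Discovery
            "technique_name": "Process Discovery",
            "procedure": "tasklist /v",
            "source": "public APT3 reporting",
        },
    ]

    # Summarize which techniques the plan exercises.
    for step in emulation_plan:
        print(f"Step {step['step']}: {step['technique_id']} "
              f"({step['technique_name']}) via '{step['procedure']}'")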

How does the evaluation process work?
The ATT&CK-based evaluations are based on a four-phased approach:

  1. Setup
    The vendor will install their tool on MITRE’s cyber range. The tool will be deployed for detect/alert only — preventions, protections, and responses will not be used for this phase.
  2. Evaluation
    During a joint evaluation session, MITRE adversary emulators (“red team”) will execute an emulation in the style of APT3, technique by technique. The vendor being tested will provide the personnel who review tool output to detect each technique (“blue team”). MITRE will also provide personnel to oversee the evaluation, facilitate communication between red and blue, and capture results (“white team”). For the purposes of the evaluation, the red team will be granted access to the host, mimicking an adversary that has already gained initial access to the environment.
  3. Feedback
    Vendors are provided an opportunity to offer feedback on the preliminary results, but the feedback does not obligate MITRE to make any modification to the results.
  4. Release
    MITRE will publicly release the evaluation methodology and results of the tool evaluations.

When will results be released?

Update: Results will be made available on or around November 14, 2018.
We had hoped to release the first cohort results in October, ahead of ATT&CKcon (October 23–24, 2018). Vendors who participate in the subsequent rolling admissions will have their results released as their evaluations complete. We started with an initial cohort to maximize fairness, giving a group of vendors an equal opportunity to have their results released at the same time.

How will you be scoring vendors?
We aren’t going to score, rank, or rate vendors. We are going to look at each vendor independently, evaluating their ability to detect ATT&CK techniques, and publishing our findings.

Will you be using the ATT&CK matrix stoplight chart?
The stoplight chart, which uses red, yellow, and green to indicate the level of detection, has been used since ATT&CK’s creation because it is a simple way to show how ATT&CK is useful. While a stoplight chart can show coverage and gaps, we will not be using this visualization because it is not granular enough to convey our results. We will be testing each technique in a variety of ways (so-called procedures), and how a tool detects each procedure may vary greatly. We are developing a new visualization to better convey the subtleties of detecting each individual technique.

How are you addressing false positives?
While we understand the importance of minimizing false positives, they are often tied to environment noise. Without a good source of emulated noise in our testing environment, we won’t address false positives directly, but rather address them indirectly in a couple of ways:

  1. Vendors are required to define how they configured their capabilities. With that provided configuration and the evaluation’s results as a baseline, users can then customize detections to reduce false positives in their unique environment.
  2. Because we are not rating or scoring detections, there is no benefit for vendors to fire an alert on every detection. Detections include alerts as well as telemetry data that provides context on the adversary action. Given that many techniques in ATT&CK are actions sysadmins or ordinary users would perform, many end users may not want every technique execution to result in an alert.

We focus purely on articulating how the tool performs detection, and we’ll leave it to each organization to determine how the tools operate in their specific environment.

Does this mean that all detections are created equal?
Well, sort of. We will not make any judgments about one detection being better than another. However, we will distinguish between different types of detection and articulate those differences, which include:

  • The vendor captured some data that is accessible to an end user and would allow a user to identify the red team activity, but no alert was produced.
  • The vendor detected suspicious behavior but does not provide context or detail as to why the detection occurred.
  • The vendor detected suspicious behavior and provides an ATT&CK “technique” level description of the activity.

We will focus on articulating behavioral detection, consistent with the spirit of ATT&CK’s focus on detecting adversary behavior rather than indicators of compromise that detect a specific tool or artifact. That said, if suspicious activity is identified by a hash, IP address, tool name, etc., we will capture that in our detection notes.
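
As a rough illustration of how these distinctions might be recorded per procedure, here is a hypothetical sketch in which each tested procedure is labeled with one of the detection categories described above. The category names, record fields, and example values are assumptions for illustration only, not the format MITRE will publish.

    from enum import Enum

    # Hypothetical detection categories mirroring the distinctions above; the
    # published results may use different names and finer granularity.
    class Detection(Enum):
        NONE = "no detection"
        TELEMETRY = "data accessible to the user, but no alert produced"
        GENERAL_BEHAVIOR = "suspicious behavior flagged, without context as to why"
        TECHNIQUE_LEVEL = "activity described at the ATT&CK technique level"

    # Example per-procedure records for a single, unnamed tool (illustrative values).
    results = [
        {"technique": "T1087 Account Discovery", "procedure": "net user",
         "detection": Detection.TELEMETRY,
         "notes": "command line captured in process telemetry; no alert raised"},
        {"technique": "T1057 Process Discovery", "procedure": "tasklist /v",
         "detection": Detection.TECHNIQUE_LEVEL,
         "notes": "alert mapped the activity to Process Discovery"},
    ]

    for record in results:
        print(f"{record['technique']} ({record['procedure']}): "
              f"{record['detection'].value}")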

Didn’t some vendors already do this with MITRE?
MITRE has a long history of evaluating technology, and using ATT&CK to guide evaluations is no different. Several vendors have previously been evaluated against a different APT3 emulation. This round of evaluations has been substantially reworked to ensure the evaluation process is identical for every participant and beneficial to both the vendors and the consumers of the final reports. We have refined the methodology, emulation tooling, and reporting, making this test quite different from those performed previously.

If you have additional questions, concerns, or feedback, please reach out to the team.


Frank Duff (@FrankDuff) is the Director of ATT&CK Evaluations for MITRE Engenuity, providing open and transparent evaluation methodologies and results.