First Round of MITRE ATT&CK™ Evaluations Released

Frank Duff
Nov 29, 2018


We have just published the first seven MITRE ATT&CK™ evaluations on our new website. We have created an open and transparent methodology for evaluating vendor capabilities based on real-world adversary behaviors found in ATT&CK. Our goals are to help vendors improve their capabilities, as well as arm defenders with a deep technical understanding of these capabilities. Our new website makes the methodology as transparent as possible for both vendors and users. We want everyone to understand what was evaluated, how it was evaluated, and what the results were. Explaining the evaluations with this detail provides critical context to the results and allows them to be useful.

Evaluating Capabilities with MITRE ATT&CK and Adversary Emulation

The MITRE ATT&CK knowledge base helps you understand how adversaries behave, what they’re trying to do, and how they’re trying to do it. We use ATT&CK to create our threat-based evaluation methodology to inform the public about how a vendor’s capabilities detect different threat behaviors. ATT&CK provides a common language for the evaluator, the vendors, and those interested in the results, whether they are responsible for implementing these solutions in their environment or making purchasing decisions.

While we aspire to test across the entirety of ATT&CK, the number of actions required to test all techniques and the many possible variations in technique implementation (i.e., procedures) make doing so impractical. Additionally, certain techniques are complex and cannot be implemented in a lab environment. We need a way to select a subset of techniques to define test criteria, so our evaluations focus on techniques used by a known threat group, an approach we refer to as adversary emulation.

Adversary emulation takes techniques publicly attributed to a single adversary group and then chains them together into a logical series of actions that are inspired by the group’s past behavior. To generate our emulation plans, we use public threat intel reporting, map techniques in the reporting to ATT&CK, chain together the techniques, and then determine a way to replicate the behaviors.
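The chaining step described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the actual Round 1 emulation plan: the `Step` type, the `build_plan` helper, the tactic ordering, and the sample procedures and source labels are all hypothetical. The technique IDs are real ATT&CK identifiers, used here purely as placeholders.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Step:
    technique_id: str    # ATT&CK technique ID the reporting was mapped to
    technique_name: str
    tactic: str          # ATT&CK tactic the technique serves
    procedure: str       # how the behavior would be replicated (made up here)
    source: str          # placeholder label for the public report


def build_plan(reported, tactic_order):
    """Chain mapped techniques into a sequence that follows tactic_order.

    sorted() is stable, so steps that share a tactic keep their input order.
    """
    rank = {tactic: i for i, tactic in enumerate(tactic_order)}
    return sorted(reported, key=lambda step: rank[step.tactic])


# Assumed ordering for illustration; a real plan follows the group's reported behavior.
TACTIC_ORDER = ["Discovery", "Credential Access", "Lateral Movement"]

# Techniques as they might be collected from reporting, in no particular order.
reported = [
    Step("T1021", "Remote Services", "Lateral Movement",
         "log on to a second host with harvested credentials", "report-B"),
    Step("T1087", "Account Discovery", "Discovery",
         "enumerate local and domain accounts from the command line", "report-A"),
    Step("T1003", "Credential Dumping", "Credential Access",
         "dump credentials from memory", "report-A"),
]

plan = build_plan(reported, TACTIC_ORDER)
for number, step in enumerate(plan, start=1):
    print(f"{number}. {step.technique_id} {step.technique_name}: {step.procedure}")
```

Keeping the source label next to each step preserves the link between an emulated behavior and the public reporting it came from.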

Round 1 (APT3/GOTHIC PANDA Evaluation) Techniques in Scope

We chose to emulate the threat group commonly known as APT3/GOTHIC PANDA for our initial evaluation because there is substantial public reporting on their post-exploit behavior, enough to create a suitable emulation plan for evaluations. Their publicly known post-exploit behavior relies on harvesting credentials, issuing on-keyboard commands (versus Windows API calls), and using programs already trusted by the operating system (living off the land). Conversely, they are not known to use elaborate scripting techniques, leverage exploits after initial access, or use anti-EDR capabilities, such as rootkits or bootkits.

Collaboratively Working with Vendors

We work collaboratively with vendors to articulate how their capabilities can detect adversary behavior using the common language of ATT&CK. These evaluations are not a competitive analysis, so you will not find scores, rankings, or ratings. Instead, we show how each vendor approaches threat detection in their own way. MITRE’s impartiality and unique vantage point as a non-profit enable us to approach vendors as partners. We help them focus on detecting adversary behaviors, as well as communicate their capabilities to their customers.

We approach the evaluations with a collaborative, “purple-teaming” mindset, and we think this allows us to better articulate what a vendor’s capability can do than if we left them out of the process. During the evaluation, MITRE and the vendor are in open communication. We announce the techniques as they are executed, and the vendor can ask us for details about how the procedures were implemented. The vendor then shows us their detections and describes their process so that we can verify each detection. Since our goal is to capture different detection methods, we may even suggest to the vendor how their capability might have detected the behavior. The open communication during the evaluation allows the vendor to better understand their capabilities and limitations, and motivates future improvement.

Releasing a Transparent and Open Methodology

Impartiality and transparency are essential components of MITRE’s mission. In this spirit, we’ve made our methodology available to everyone. This provides critical context to the detections we document, where specific implementation details and timing matter. We strive to make these evaluations measurable and repeatable, making them reliable and useful to assess how a capability has improved over time.

The testing methodology, which is centered around ATT&CK and adversary emulation, encourages vendors to create capabilities that more effectively address known threats. We hope our objective and open testing based on ATT&CK will assist vendors in advancing capabilities and help drive the entire Endpoint Detection and Response (EDR) market forward. We will continue to evolve the methodology and content to ensure a fair, transparent, and useful evaluation process.

Using the Results

Our ATT&CK Evaluations results are detailed and may be different from other evaluations you’ve seen, so you will likely use our results differently. Our evaluations look at each vendor’s capabilities within their own context, while doing so in a way that is consistent across vendors. Our evaluation results describe how product users could detect specific ATT&CK instantiations under perfect circumstances with knowledge of the adversary and without environmental noise. The results serve as an example of what a capability could do, although we realize the environment in which we test isn’t entirely realistic.

Direct comparison between vendor capabilities is complicated, and we encourage anyone using our results to consider factors we didn’t evaluate. Our evaluations are narrowly focused on the technical ability to detect adversary behavior. Decision makers should weigh other considerations as they decide which tool best fits their needs, such as cost of ownership, sophistication of your Security Operations Center, environmental noise, integration with other tools, user interface, and security policies. One product may not fit every need, and products can address different needs in different ways.

We realize many people will want to use our raw results to develop scores, rankings, or ratings. Should you decide to do this, we encourage you to consider each detection independently and rank it on how useful it would be to meet your unique requirements. Though we categorized detections, not all detections in the same category are created equal; some detections may be more useful to you than others. We’ve given you a head start by evaluating the technical capabilities, but this only provides a piece of the story, and we encourage you to consider the additional factors described above.
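One way to rank detections independently, rather than by category alone, is sketched below. Everything here is an assumption for illustration: the `Detection` type, the category labels, the sample notes, and the weights in `example_usefulness` are made up and do not reflect MITRE's published categories or any vendor's results. The point is only that the scoring function belongs to you and encodes your own requirements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Detection:
    technique_id: str   # ATT&CK technique the detection relates to
    category: str       # hypothetical category label
    note: str           # made-up description of what was observed


def rank_detections(detections, usefulness):
    """Score each detection on its own and return (score, detection) pairs, best first."""
    scored = [(usefulness(d), d) for d in detections]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical requirement: this team cares most about credential theft.
PRIORITY_TECHNIQUES = {"T1003"}


def example_usefulness(detection):
    # Hypothetical weights for a team that prefers alerts over raw data.
    base = {"alert": 3, "enriched": 2, "raw-data": 1}.get(detection.category, 0)
    bonus = 1 if detection.technique_id in PRIORITY_TECHNIQUES else 0
    return base + bonus


detections = [
    Detection("T1003", "raw-data", "process access was logged"),
    Detection("T1087", "alert", "rule fired on account enumeration"),
    Detection("T1003", "alert", "rule fired on credential access"),
    Detection("T1021", "enriched", "remote logon tagged with context"),
]

ranked = rank_detections(detections, example_usefulness)
for score, detection in ranked:
    print(score, detection.technique_id, detection.category, detection.note)
```

In this sketch the two detections labeled "alert" receive different scores, because the scoring function looks at each detection's relevance to the team's priorities and not only at its category.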

Our evaluations and methodology can assist organizations as they make critical decisions about which vendor capabilities best suit their needs. By looking at the detection abilities of these products and weighing their unique constraints, organizations may be able to “down-select” products that appear to best meet their requirements. We encourage using our methodology to test capabilities in your environment. This allows false positives, environmental noise, user interface, and operational impact to be considered in a way that is tailored to your organization.

Evolving Vendor Capabilities

ATT&CK Evaluations are a “point in time” representation of vendor detection capability. We note what we can detect during the evaluation, but this does not mean that the vendor does not have additional capabilities that could assist in the detection of the tested behaviors. Vendors will continue to improve their tools and their coverage. This is expected and is a desired outcome of the evaluations. Our evaluations are a starting point for describing vendor ATT&CK detection capabilities.

ATT&CK Evaluations Detection Types

Our results are also limited by how we define detections. We use detection categories to talk about detections in a common way across vendors. That said, the way these detection categories are manifested in vendor capabilities is not uniform. Some vendors offer ATT&CK-specific rules, some seek to provide base telemetry and enrich the behavior, while some seek to provide the data and rely on the analyst to make further sense of it. As a result, not all detections within a category are created equal, and one category is not inherently better than the others. Organizations should weigh what is important for their use case and how vendors represent the data to identify which solutions might fit their needs.

What’s Next for ATT&CK Evaluations?

We are extremely excited to release our initial ATT&CK Evaluations. The initial evaluations included seven vendors, representing those that signed up before the June 30, 2018 cohort deadline. We welcome feedback on how you are using the results and methodology, as well as your suggestions for improvements. With your feedback we will continue to evolve the content and evaluations to better suit the public’s needs. As with ATT&CK, ATT&CK Evaluations is built for and enhanced by the community. We welcome threat intelligence that can inform our future emulation plans and potential emulation targets.

In the coming weeks, we will begin publishing our “rolling admission” evaluations. Rolling admissions allow vendors that were not part of the initial seven-vendor cohort released today to participate in our evaluations. The methodology for the rolling admissions is the same as for this initial release, and the results will be posted as they are completed. We hope vendors will continue to participate in our evaluation process so that the community can make more informed decisions on what solutions to buy and how to use them in their environment. We will also announce our plans for 2019, including the selected adversary for Round 2 emulation.

We hope you find the website useful and that it enables you to make better-informed decisions on purchasing, implementing, or creating these essential capabilities. If you have any comments, feedback, or requests, please reach out to us. We look forward to continuing to evolve to address the needs of the community.



Frank Duff (@FrankDuff) is the Director of ATT&CK Evaluations for MITRE Engenuity, providing open and transparent evaluation methodologies and results.