How to Leverage ATT&CK® Evaluations Results + Announcing Our First Ever Community Advisory Board

Ashwin Radhakrishnan
Published in MITRE-Engenuity
Apr 14, 2022

Introduction

Now that the dust is settling from the results we recently published for the Wizard Spider and Sandworm Enterprise Evaluation round, we wanted to take the opportunity to provide more insight into how to analyze the data we’ve published. Though the emulation and results analysis processes are standardized to minimize variability between evaluations, each participant can choose portions of the test based on their product’s capabilities. For this scenario, vendors had the option to include or exclude steps involving Linux hosts, and could add an optional final day of protections testing on top of the base detections evaluation.

By definition, we are not a product category (EDR, XDR, etc.) evaluation offering. We are a capability (Detections, Protections, etc.) evaluation offering. We help answer the question: what are end-users getting from these products? Efforts in the market to homogenize our results and plot them on graphs and other visuals are not surprising, but they are often ineffective at drawing meaningful, actionable conclusions from the raw data. The question remains: as an end-user looking to make purchasing decisions, how do I wade through the interpretations from vendor participants to find the tool that is right for my organization? The purpose of this post is to answer that question, with some direct advice toward the end from the team who conducted the Evaluations themselves.

Some Background

To get the most out of the rest of this blog post, I would suggest reading the following content first:

  1. Dissecting a Detection: An Analysis of ATT&CK Evaluations Data (Sources) Part 1 of 2
  2. Actionable Detections: An Analysis of ATT&CK Evaluations Data Part 2 of 2
  3. Making Sense of ATT&CK Evaluations Data: Who Really Won and How to Avoid Common Pitfalls

Following the framework presented by those pieces should immediately help answer the question posed above, but here we will explore and extend these ideas with more recent context.

ATT&CK Evaluations: Our Philosophy

For the community at large, here are a few guiding principles of our Evaluation:

  1. MITRE Engenuity will never rank vendors in the ATT&CK Evaluations or stack them up against each other because we believe that an in-depth analysis is far more effective than simply looking at “scores” or “rankings.” There is no such thing as a one-size-fits-all security posture, and that is even more true when constructing a threat-informed defense that will actually defend you against the adversaries we emulate.
  2. There are no “winners” in the ATT&CK Evaluations, and vendor participants have been given guidance to avoid making claims to that effect. Our team has extended the courtesy of reviewing content before it is published. If you happen across an interpretation of the data that implies a “winner” or “ranking,” it has not been endorsed, approved, or confirmed by our team.
  3. We do not encourage “apples to oranges” comparisons of our Evaluations results, such as comparing our Detection and Protection results against each other. Though the emulation and results themselves are structured and non-variable, there are countless nuances that arise from both the opt-in nature of the Evaluation and differences in the various product suites. Aggregating them into a uniform analysis or ranking is likely to produce flawed conclusions, especially because the Protections Evaluation and the Linux portion were opt-in, not required by MITRE Engenuity. These delineations are clearly observable in the data we published, and consumers of our data are encouraged to leverage our Vendor Comparison Tool to compare achievements between vendors.

In the spirit of those guidelines, our team has reiterated the above to specific vendors, especially in cases of messaging that misrepresented the testing and data. Our guidance was for those vendors to remove content from websites, webinars, and other marketing materials. Our intent is to help the cyber community better understand the Evaluation and what the results imply. We want to apologize to the vendors who participated in this test and are dealing with the negative fallout from this marketing hype. We also want to thank the 25+ vendors who followed our guidance and accurately represented their own results.

Some Pointers from the Team That Collected the Data

The reason I joined the Evaluations team at the beginning of this round was twofold. First, I am a MITRE ATT&CK® Framework fanatic. The only way to truly defend against adversaries, protect data, and provide for privacy is to formulate a strategic security program that holistically covers every corner of your attack surface. In my opinion, the MITRE ATT&CK Framework is the best way to do that, especially because it is embedded throughout the features and functionality of security products. The Framework therefore works well to translate the operational aspects of security operations into the business problems that provide the funding for those operations. Second, I wanted to be part of the incredible team responsible for planning, executing, and improving ATT&CK Evaluations. The complexity of running consistent and objective evaluations across 30 vendors is astounding, and the team responsible for conducting the evaluations should be held in the highest regard. They should also be deferred to when looking at results. To use the oldest marketing cliché in the book: don’t take my word for it; here is what the experts say about leveraging the publicly available results.

Some of our external stakeholders — both on the end-user and vendor sides — believe that the ATT&CK Evaluations team should be selecting the “best product” from the evaluation. In no uncertain terms, that is an impossible endeavor. Any attempt to “pick a winner” takes away from the core intent of why we conduct these evaluations in the first place. As Jamie Williams, Principal Adversary Emulation Engineer, describes,

“If we could responsibly tell you what product(s) were best for you, we would. The answer to that question starts with understanding your needs, and Evals provides one valuable input towards finding that answer.”

The end goal of our evaluation is to be one data point (of hopefully many) toward building a threat-informed defense that works for your organization specifically.

As an end-user, what do I need to add to my security program? The honest answer to this question starts with understanding that different organizations require different strategies to protect against adversaries. For instance, a smaller regional bank in the southwestern United States may face a completely different set of threats than a large tech company in the Bay Area, even if they are targeted by the same adversary. Even if they do face the same threats, their teams may be composed differently, and one is more likely than the other to favor a build-over-buy strategy in its security program. All of these traits, and many more than I could list, are the reasons it is important to dive into the data to understand how the results are relevant to your specific strategy. The same factors apply to the vendor participants’ team makeup: we do not restrict the headcount or expertise of the people each vendor brings to the evaluation. Brent Harrell, Lead Offensive Security Engineer, explains,

“Organizations leveraging ATT&CK Evaluations data should understand that, in some cases, the high detection rates were achieved by a team of product experts and developers hunting for specific events across multiple days. It’s my opinion that the real value of the data lies in the types of data sources leveraged by the products rather than the raw detection rates in the test environment. Comparing how the various products do their job lets organizations find the one that best fills a gap in their defenses, be it an intuitive user interface for a small team to triage event logs or API hooking for deeper protection.”

So to explicitly answer the question above, you should determine what to add to your security program by diving deeper into the data to see what fits your specific needs, especially focusing on the data sources listed in the results.
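As a rough illustration of that advice, here is a minimal sketch of how a team might weigh data sources: it intersects the data sources a product relied on during the Evaluation with the telemetry your environment already produces. The vendor names and data source sets below are illustrative placeholders, not actual Evaluation results.

```python
# Illustrative sketch only: the vendor names and data source sets below are
# placeholders, not actual ATT&CK Evaluations results.

# ATT&CK data sources your environment already feeds into your SIEM/EDR
our_telemetry = {"Process Creation", "Command Execution", "File Modification"}

# Data sources two hypothetical products relied on for their detections
vendor_data_sources = {
    "VendorA": {"Process Creation", "Network Traffic", "WMI Objects"},
    "VendorB": {"Command Execution", "File Modification", "API Monitoring"},
}

for vendor, sources in vendor_data_sources.items():
    overlap = sources & our_telemetry   # visibility you already have
    net_new = sources - our_telemetry   # visibility the product would add
    print(f"{vendor}: overlaps on {sorted(overlap)}, adds {sorted(net_new)}")
```

A product whose detections lean heavily on telemetry you already collect may add less than one that brings genuinely new visibility, which is exactly the kind of nuance a raw detection percentage hides.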

As a former product manager of security products, I can attest to how a great user experience for any security solution leads to more efficient and effective outcomes. As a community, we have the benefit of some of the best UI/UX in the enterprise solution space. One of the most distinctive aspects of our ATT&CK Evaluations is that we have a deep collection of all the screenshots captured during our evaluation. If you care about the adversaries we emulated, you should care equally about the way you and your team would experience defending against that adversary using the vendor’s product suite. Jake Spinney, Senior Offensive Security Engineer, explains,

“MITRE Engenuity doesn’t assign or endorse detection or prevention scores for ATT&CK Evaluations because this data alone is insufficient for making a determination on which product is best for your team. In my opinion, the best data from Evals are the screenshots. These paint a much bigger picture for security teams on how these products look and function during an attack by an emulated, real-world threat: What does an event for this substep look like? Does it make sense? Does it provide enough or too much information to act on? Can your team effectively use this product to stop threats in your network? No percentage can answer those questions in meaningful ways to any individual organization. Coming from a consulting background prior to joining the Evals team, scoring these metrics would be like grading art at a museum.”

To extend the metaphor slightly, as curators at our museum, our goal is to showcase the art, not editorialize by giving our opinion on the screenshots we’ve captured.

To view both the data sources and screenshots for any specific vendor, follow these steps:

  1. Navigate to the Evaluation Overview Page.
  2. Navigate to a vendor participant’s Overview Page by clicking its logo. You will see a header that says [Vendor] Overview at the top right of the page. This page is a consolidated view where you can find results for all Evaluation Rounds in which the vendor has participated.
  3. Select the “Wizard Spider + Sandworm (2022)” link. If you’d like to drill down into results from previous rounds, click that round in this step instead.
  4. The page you are taken to will allow you to toggle between the results for different Evaluation Rounds and each Scenario within that round. If an option is grayed out, the vendor did not participate in that round or portion of the Evaluation.
  5. The following is just some of the content you should focus on:
    a) Overarching screenshots that pertain to the Scenario at large. These serve as a fantastic way to get a big picture view of how a tool looks and feels, as applicable to the specific Scenario.
    b) Deeper information that pertains to specific sub-steps within the Evaluation. To view this information:
    i. Scroll down to the table under the overarching screenshots.
    ii. Hover over the “Detection Type” or “Detection Note” areas for the sub-step (1.A.1, 1.A.2, etc.) you’d like to drill down into.
    iii. When you click the drilldown, a modal will pop up to the right of the table showing the Criteria, Data Sources, and individual screenshots relevant to that sub-step.
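
If you would rather work with the raw results outside the web interface, the following is a minimal sketch of how you might tally detection types and data sources from a downloaded results file. The file name and field names (“substeps”, “detection_type”, “data_sources”) are assumptions for illustration, not the actual schema of the files we publish.

```python
# Minimal sketch, assuming a hypothetical JSON export of per-vendor results.
# The file name and field names here are illustrative, not the published schema.
import json
from collections import Counter

with open("vendor_results.json") as f:      # hypothetical downloaded export
    substeps = json.load(f)["substeps"]     # assumed: a list of sub-step records

# Tally how the product registered each sub-step and which data sources it used
detection_types = Counter(s["detection_type"] for s in substeps)
data_sources = Counter(ds for s in substeps for ds in s.get("data_sources", []))

print("Detection types observed:", dict(detection_types))
print("Most frequently cited data sources:", data_sources.most_common(5))
```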

We also view this as a fantastic opportunity to pull the community forward as a whole. Reductions of our results to mere percentages take away from our opportunity to learn from these Evaluations and to understand the behavioral activity of the adversaries who always seem to be a step ahead of us. To that end, we really hope that the community looks into our emulation plan, step by step, to better inform their defensive postures. Lex Crumpton, Lead Cybersecurity Engineer — SOC & Blue Team, explains,

“ATT&CK Evaluations is an excellent way to test a vendor’s defensive posture and our team’s understanding of adversarial activity. Just like in a real cyber-attack scenario, no one solution is perfect. We as a cybersecurity community learn as a collective how to improve. Not only should you look at the holistic picture, but you should also pay attention to the details each step in this process provides. There is always something new to learn about how to improve ourselves for the betterment of the community.”

In conclusion, it’s important to remember that our industry would disappear altogether if there were a “best product” out there. If there were a one-size-fits-all solution, breaches would disappear, and we’d all happily be out of a job. As far as security operations are concerned, we like to emphasize that we want better alerts, not necessarily more alerts. “Better” is subjective, which is why we believe these evaluations are so valuable. Brendan Manning, Senior Cybersecurity Engineer, explains,

“We encourage you to dig deeper than the upfront numbers and also utilize resources such as data sources and screenshots to help determine what product(s) are best suited for your organization. All organizations have varying capabilities and needs. No product is perfect or ‘the best’ for every organization.”

Overall, we are in this fight together, and it is our responsibility to go the extra mile and use every avenue we can to fight against the bad guys. We hope that the ATT&CK Evaluations data is one of those avenues on your path to building an effective defense. Again, we encourage everybody to use the Vendor Comparison Tool to do side-by-side comparisons between vendors and to use the raw results for their own analysis.

What’s Next?

As I described in my last post, we have a number of exciting new offerings being released in the coming quarters. You’ll see the Trials Evaluations: Deception results at the end of the month and an announcement about our Managed Services Evaluations vendor participants sooner than that. Both are net-new offerings that we are excited to bring to the market, and we are eager to continue to innovate and provide value to the information security community.

We are also announcing our first ever Community Advisory Board. Our commitment to innovation and consistency requires deeper interaction, in a more structured format, with the community we are looking to benefit. Our goal is to learn how we can make the data we present more accessible to the community at large. Though there are guardrails we need to operate within to maintain objectivity, consistency, and substantive value, there are likely improvements we can implement to give the community better access to our data. In true product management fashion, we’d like to gather those requirements from the folks who benefit from our offering: the infosec community at large. The only stipulations for joining this Advisory Board are that you have a role in information security and are not currently employed by one of our vendor participants. If you are interested in joining our Community Advisory Board, please fill out this form. Based on interest, we will be exploring different strategies to gather feedback and deliver solutions that fulfill the requirements we gather. Remember, we are ALL in this together, and our team at MITRE Engenuity is committed to continuing to deliver cybersecurity solutions in the interest of the public good.

© 2022 MITRE Engenuity LLC. Approved for public release. Document number AT0031
