The MITRE ATT&CK Evaluation Needs To Evolve
MITRE Engenuity just published the results of their fourth Enterprise ATT&CK Evaluation. As I’ve done for the past four years, I’ve published source code for analyzing the results on my personal GitHub page (https://github.com/joshzelonis/WizardSpider-Sandworm) and am using this personal blog to discuss my thoughts on what buyers should be looking at based on this and previous years’ scoring.
MITRE’s annual evaluation has moved the industry forward.
To demonstrate how much the MITRE evaluation has impacted the industry, I’m going to start with a metric frequently called “visibility”: the percentage of events for which the product could show that it had collected telemetry. Looking back on the report I published on the very first ATT&CK Evaluation, the highest visibility score among vendors in the initial cohort was 77%. Over the next six months, a number of vendors (who had access to the tests from the initial cohort) were evaluated against the same series of tests. Notably, with this additional information, F-Secure achieved the highest visibility score in the evaluation at 89%. To remove the advantage of seeing the test before being evaluated, MITRE has since limited participation each year to a single cohort. Still, I think this is an incredibly useful data point for market maturity: four years ago, even though many companies knew what was being evaluated, no one could hit 90% visibility. This year, twelve of the 30 participating vendors scored over 90% visibility, and nine of them scored over 95%.
Visibility isn’t the metric enterprise buyers should be looking at.
Visibility is a great way to understand what a product is collecting, which in turn tells you how effective it would be for tasks such as threat hunting and investigation. Unfortunately, it tells you nothing about the product’s analytic capabilities. For that, you need a second metric… analytic detections. MITRE has three detection categories that are generally useful here: General, Tactic, and Technique. By dividing the number of steps that received at least one of these detection types by the total number of steps, you get insight into each product’s event classification engine. Interestingly, six vendors (Bitdefender, Check Point, Malwarebytes, Microsoft, Palo Alto Networks, and SentinelOne) were able to classify all the data they were collecting: their analytic detection score matched their visibility score.
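As a rough illustration of how these two ratios relate, here is a minimal Python sketch. It assumes a hypothetical in-memory list of substeps, each tagged with the set of detection categories the vendor received. The `Substep` class and the function names are illustrative only; they are not MITRE’s published results format or the analysis code in the GitHub repository linked above. In this sketch every analytic detection also counts toward visibility, so analytic coverage can never exceed visibility, and a vendor that classified everything it collected shows two equal numbers.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for one evaluated substep: the set of detection
# categories a vendor received for it, e.g. {"Telemetry", "Technique"}.
# This is NOT MITRE's actual results schema.
@dataclass
class Substep:
    detections: set[str] = field(default_factory=set)

# The three analytic detection categories discussed above.
ANALYTIC = {"General", "Tactic", "Technique"}

def visibility(substeps: list[Substep]) -> float:
    """Fraction of substeps with any detection other than "None"."""
    seen = sum(1 for s in substeps if s.detections - {"None"})
    return seen / len(substeps)

def analytic_coverage(substeps: list[Substep]) -> float:
    """Fraction of substeps with at least one General, Tactic, or Technique detection."""
    hits = sum(1 for s in substeps if s.detections & ANALYTIC)
    return hits / len(substeps)

# Illustrative made-up data, not real evaluation results.
results = [
    Substep({"Telemetry", "Technique"}),
    Substep({"Telemetry"}),
    Substep({"Telemetry", "Tactic"}),
    Substep({"None"}),
]
print(visibility(results))         # 0.75
print(analytic_coverage(results))  # 0.5
```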
It’s time to raise the bar again.
We’re four years in. The MITRE ATT&CK Framework has become the industry standard for describing cyberattacks, encompassing 188 Techniques and 379 Subtechniques. Why are we still talking about General and Tactic detections?
I’ve added one metric to my analysis this year, and I believe it is the most important one for enterprise buyers to look at… technique detections. Looking at the ratio of correct technique detections to total steps demands a higher level of precision from the product’s analytics. Just as General and Tactic detections have outlived their usefulness, the visibility metric was really important four years ago, but the products you’re considering in 2022 are probably collecting all the data you need. I would submit that technique detections are the only metric needed this year for evaluating the participants.
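The technique detection ratio can be expressed as a short calculation. This is a hedged sketch under assumed data structures: the `Detection` record, its `config_change` flag, and `technique_rate` are hypothetical names, not part of MITRE’s results or the repository linked above. The optional flag shows one way to approximate out-of-the-box performance by ignoring Technique detections that only appeared after a configuration change.

```python
from dataclasses import dataclass

# Hypothetical record for one detection a vendor received on a substep.
# Field names are illustrative assumptions, not MITRE's actual schema.
@dataclass(frozen=True)
class Detection:
    category: str                # e.g. "Telemetry", "General", "Tactic", "Technique"
    config_change: bool = False  # True if it only appeared after a configuration change

def technique_rate(substeps: list[list[Detection]],
                   exclude_config_changes: bool = False) -> float:
    """Fraction of substeps with at least one qualifying Technique detection."""
    def qualifies(d: Detection) -> bool:
        if d.category != "Technique":
            return False
        # Optionally drop detections that required a configuration change.
        return not (exclude_config_changes and d.config_change)

    hits = sum(1 for dets in substeps if any(qualifies(d) for d in dets))
    return hits / len(substeps)

# Illustrative made-up data: one list of detections per substep.
results = [
    [Detection("Telemetry"), Detection("Technique")],
    [Detection("Telemetry"), Detection("Tactic")],
    [Detection("Technique", config_change=True)],
    [],
]
print(technique_rate(results))                               # 0.5
print(technique_rate(results, exclude_config_changes=True))  # 0.25
```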
My hope for next year…
My MITRE wishlist is threefold:
- Please eliminate configuration changes from this evaluation. Enterprise buyers need to know what they are getting out of the box. If the experts these companies have on site can’t get the product configured correctly, what hope do their customers have?
- Add subtechnique detections and remove General and Tactic detections. The market is evolving to meet the requirements of this evaluation; help push it further.
- Make a call. Every year it’s a vendor circus when these results are published. I seriously don’t think my washing machine has a spin cycle as good as many of these marketing teams. The industry needs MITRE to make a call, even if it’s just putting protection scores on one axis and detection scores on the other.