MITRE Engenuity ATT&CK® Evaluations: Managed Services — OilRig (2022) and the Top 10 Ways to Interpret the Results

Ashwin Radhakrishnan
Published in MITRE-Engenuity
Nov 9, 2022 · 12 min read

We are proud to announce the completion of our inaugural ATT&CK® Evaluations: Managed Services round. There is a lot to unpack in this Evaluation, and this content should provide some much-needed context for you to consume the results effectively. With as much gusto as I can convey through a blog post, I urge every individual who seeks to apply the information gained from this round’s results to take the time to read this blog before tackling the output. The results are far more effective with context, and we aim to provide as much as possible in this post.

The Big Reveal: OilRig

Unlike previous ATT&CK® Evaluations: Enterprise rounds, in the Managed Services Evaluation we employed a “black box” (formerly referred to as “closed book”) methodology and did not announce the adversary to the public or Evaluation participants at the outset. We will cover more about the general process in later sections, but for now, we’d like to announce that this Evaluation emulated behaviors inspired by OilRig, a threat actor with operations that align with strategic objectives of the Iranian government.

OilRig has a well-documented history of widespread impact, with campaigns around the globe directed against financial, government, energy, chemical, telecommunications, and other sectors. The group continues to evolve its tactics, and leverages a combination of proprietary malware, customized versions of publicly available tools, and off-the-shelf, multi-purpose software. We selected OilRig based on their defense evasion and persistence techniques, their complexity, and their relevance across industry verticals.

This round focused on OilRig’s use of custom web shells and defense evasion techniques. OilRig has demonstrated their sophistication in campaigns through organized resource development, unique data exfiltration methods, and use of customized toolsets to persistently access servers. While OilRig may leverage more common techniques compared to other threat actors, the group’s distinctive characteristics are rooted in their diverse arsenal of backdoors.

Our Emulation Plan combined relevant portions of a variety of OilRig campaigns. In order to support a new methodology for evaluating service providers, we sought to pace the Emulation Plan over the course of a five-day work week. Additionally, for the first time in an ATT&CK Evaluations round, we simulated benign user noise within the target environment, which was particularly relevant due to the black box nature of this Evaluation.

At a high level, the OilRig Emulation Plan commenced with a pre-positioned email containing a link to download a macro-enabled document armed with our initial implant. From there, the plan used well-known techniques to pivot from machine to machine, discovering new hosts and dumping credentials along the way. Through this activity, our Red Team eventually reached the SQL server to exfiltrate database information.

In this general process, the Emulation Plan reflected techniques and sub-techniques that are highly relevant in the real world. For instance, spearphishing was used for initial access due to its prevalence in known attacks. Likewise, web shells, lateral movement via RDP (Remote Desktop Protocol) tunnels and PsExec, pass-the-hash, credential dumping, and exfiltration over email are all still relevant and used by adversaries today. The technique scope was executed across multiple machines, making the Evaluation particularly relevant for enterprise environments that are targeted by similar behavior from threat actors operating in the wild.
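
If you want to rough out your own view of this technique scope before pulling up the official ATT&CK Navigator layer on the Overview Page, a few lines of Python can generate a bare-bones layer file. A minimal sketch follows, assuming the standard Navigator layer field names (`techniqueID`, `score`, `comment`) and omitting version metadata; the technique IDs are my own illustrative mapping of a few behaviors named above, not the Evaluation’s official scope.

```python
import json

# Illustrative mapping only: a few behaviors described above paired with the
# ATT&CK technique IDs they commonly correspond to. The official Technique
# Scope layer published with the Evaluation is the authoritative source.
techniques = {
    "T1566.002": "Phishing: Spearphishing Link",
    "T1505.003": "Server Software Component: Web Shell",
    "T1021.001": "Remote Services: Remote Desktop Protocol",
    "T1550.002": "Use Alternate Authentication Material: Pass the Hash",
    "T1003": "OS Credential Dumping",
}

# Bare-bones Navigator layer; version metadata is omitted, so Navigator may
# prompt to upgrade the layer on import.
layer = {
    "name": "OilRig round - my techniques of interest (sketch)",
    "domain": "enterprise-attack",
    "description": "Hand-built subset for a first-pass gap analysis",
    "techniques": [
        {"techniqueID": tid, "score": 1, "comment": name}
        for tid, name in techniques.items()
    ],
}

with open("oilrig_subset_layer.json", "w") as f:
    json.dump(layer, f, indent=2)
```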

Components of this Evaluation Release

Many of our community members are made aware of these results through the collateral (marketing, direct sales, etc.) prepared by participants, but the results that they reference are merely the tip of the iceberg. Below is a list of the release’s individual components:

  • Updates to the Overview Page — a summary of the entire Evaluation. This 10,000-foot view provides a great way to understand what was accomplished in this joint research project.
    - ATT&CK Description — added context about OilRig.
    - Emulation Notes — high-level notes about the Emulation Plan.
    - Operational Flow — a diagram to describe the general operational flow of the Emulation Plan with some associated qualitative content to describe the scenario.
    - Technique Scope — a MITRE ATT&CK Navigator representation of the specific techniques used in this emulation.
    - Environment — the Microsoft Azure environment leveraged in this Evaluation with some qualitative text to describe the infrastructure.
    - Reporting Status — qualitative descriptions of the Reporting Statuses awarded for each step/sub-step within the Emulation Plan.
    - Additional Resources — ways to understand more about this Evaluation.
  • Results — each participant’s results from ATT&CK® Evaluations: Managed Services — OilRig.
    - Archive — on each participant’s results page, you can find an archived file of all the content that was sent to MITRE Engenuity in this Evaluation.
    - Demo Participant — for this inaugural round, we have also included a demo participant to help walk through the experience for the end user. Stay tuned; we will be publishing a walkthrough of the demo participant’s results in our next blog post.
  • Technical Collateral on GitHub — technical collateral that can be leveraged to understand, recreate, and defend against the technical details of this Evaluation.
    - Emulation Plan — the steps and sub-steps used in this Evaluation.
    - Binaries — the specific technical artifacts that you can leverage to recreate this emulation in your own environment.
    - Infrastructure Setup — the technical details of the infrastructure utilized for each service provider that participated in this Evaluation.
    - YARA Rules — signatures to help detect the activity described in the Emulation Plan.
    - Caldera Port (to be released) — an automated implementation of the Emulation Plan for use in your own environment.

As demonstrated above, there are many elements to this release. Use the combination of resources most relevant to you, depending on the depth and breadth you are looking to reach with your analysis.

How was this Evaluation Conducted?

The following are goals for this and all our Evaluations:

  1. Empower end-users with objective insights into how to use specific commercial security capabilities to address known adversary behaviors.
  2. Provide transparency around the true capabilities of security products and services to address known adversary behaviors.
  3. Drive the security vendor community to enhance their capability to address known adversary behaviors.

These goals serve as our north stars when we are designing the methodology of our Evaluations. For this Evaluation, MITRE Engenuity employed a five-week process to evaluate each of the service providers who participated:

  • Setup Phase (Weeks 1–3) — service providers worked with MITRE Engenuity to accomplish the activities required for execution.
    - Upon gaining access to the Microsoft Azure range hosted by MITRE Engenuity, service providers had the opportunity to deploy the tools necessary for the Evaluation.
    - In Week 3, the service provider had an opportunity to share how to access the content they would report for this Evaluation. This session was held with MITRE Engenuity, who acted as the pseudo-customer during execution.
    - MITRE Engenuity performed multiple checks in Week 3 to ensure that the tools were deployed properly and were compliant with guidelines for this Evaluation.
  • Execution Phase (Weeks 4–5) — the MITRE Engenuity Red Team executed the Emulation Plan while the service providers worked with the MITRE Engenuity Execution Team.
    - The emulation began in Week 4 — within business hours — at an unannounced time and cadence. Although service providers received information on the environment they were monitoring, they did not receive any information on when attacks would occur, what techniques were in scope, or which adversary MITRE Engenuity was emulating.
    - After the emulation had concluded, the service provider had 24 hours to send the final package of content to MITRE Engenuity for consideration. This package (and the content that was collected by the Execution Team) served as the superset of what could possibly be considered for results.
    - In Week 5, MITRE Engenuity presented information on the emulated adversary and the associated details of the plan.

Reported vs Unreported

If there is one thing that you take away from this blog, it should be this:

We did not expect (nor do we believe it is inherently valuable for) each service provider to report every technique/sub-technique evaluated in the Emulation Plan.

Therefore, any analysis that reflects counts or ratios of reported techniques vs the “total count” is antithetical to the purpose of this Evaluation. We are emphasizing this stance in the published results by not providing summaries of these counts as we have previously done in Enterprise Evaluation results. As we dive into the results interpretation, it is incredibly important to understand this distinction.

So, what qualifies as Reported versus Unreported? The threshold we used throughout this Evaluation, across each Emulation Plan sub-step, was the presence of context. Did the service provider include enough context to describe the emulated activity? If MITRE Engenuity believed the answer was yes, the service provider was awarded a “Reported” Status for that specific sub-step.

An “Unreported Status” for a specific sub-step could result from the following:

  • The service provider believed the activity was not of significant importance and reporting to the customer would not add value.
  • The service provider believed the activity was implied or could be assumed through other content which they provided.
  • The service provider did not provide additional context to meet our threshold to be considered “Reported.”
  • The service provider missed or misinterpreted the activity.

Note: We use the “Not Applicable” status when we could not execute the step/sub-step, in which case the results are not applicable to the Evaluation.

How do I Interpret These Results?

Before we get into the fun stuff, let us clarify the role that ATT&CK Evaluations intends to play in the security space.

  • ATT&CK Evaluations are a starting point. We hope that these results serve as one of many data points in your commercial capabilities assessment. Especially if you are interested in building a threat-informed defense, ATT&CK Evaluations can help you determine what behaviors are actually addressed.
  • There are no winners. Any claim by a participant that they “won” is an intrinsically flawed statement. As we walk through our Top 10, it should become clear why we cannot award a winner.
  • This round was not a product Evaluation. There are inherent differences between service offerings and security products. It is, therefore, extremely important to ensure you do not reduce these complex results to what we typically include in the summaries of our Enterprise Evaluations.
  • Before starting any analysis of technique coverage, it is important to determine which techniques are most relevant to your organization based on the adversary groups and threats that your organization faces. This information is outside the scope of ATT&CK Evaluations; once you have this list, you can start analyzing the results for those techniques within ATT&CK Evaluations (a minimal sketch of this filtering step follows this list). You can use a variety of sources to build this subset of techniques, whether the MITRE ATT&CK Framework itself, public/commercial threat reporting and analysis, or your own cyber threat intelligence.
  • Not all techniques are created equal. A service provider reporting on Process Discovery might not have the same value as a service provider reporting on Credential Dumping due to the severity of the action.
  • Not all procedures are created equal. Process Discovery (T1057) via Command and Scripting Interpreter (T1059) can be detected with most process monitoring. Process Discovery via Native API (T1106) would require API monitoring. A service provider could have reported one, but not the other.
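
As a concrete illustration of the workflow above (decide which techniques matter to you first, then read only those results), here is a minimal Python sketch. The input layout is hypothetical: I assume a `results.json` export containing `substeps` records with `techniqueID` and `reportingStatus` fields; the actual published results and archive use their own formats, so adapt the field names to whatever export you are working from.

```python
import json

# Hypothetical export format, for illustration only. Example record:
#   {"substep": "1.A.1", "techniqueID": "T1566.002", "reportingStatus": "Reported"}
with open("results.json") as f:
    substeps = json.load(f)["substeps"]

# Techniques that your own threat intelligence says matter to your organization.
relevant = {"T1566.002", "T1505.003", "T1003", "T1021.001"}

# List (rather than count) the sub-steps that touch your relevant techniques,
# so each one can be reviewed in context.
for s in substeps:
    if s["techniqueID"] in relevant:
        print(f'{s["substep"]:>8}  {s["techniqueID"]:<10}  {s["reportingStatus"]}')
```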

With that framework in mind, here are the top 10 ways a savvy security practitioner or leader should interpret these results.

  1. Look at the top-level Reporting Statuses of the service provider you are analyzing. This will serve as your first level of analysis, showing how the service provider performed against the Emulation Plan at the highest level possible.
  2. Understand how the service provider would be relevant to your organization based on current gaps in your security controls. The following factors contribute to why techniques should not be assessed as absolutes, and why counts and ratios are inherently flawed.
    a) For instance, the first technique (highlighted in Step 1.A.1) used in the Emulation Plan is Phishing. If a service provider did not report on that technique per the results, it may not be relevant to your organization if you have recently invested in an email security product. Alternatively, if you believe that phishing is a technique that needs better coverage in your organization, you may want to pay special attention to a service provider that reported it in this Evaluation.
    b) While a malicious action may have multiple techniques associated with it, you may be primarily concerned with the malicious action being reported at all. For example, an implant communicating with its command-and-control server could be evaluated for Application Layer Protocol, Encrypted Channel, and Non-Standard Port. In this scenario, you may not be concerned with reports on all three techniques, so long as the service provider has reported the command-and-control traffic.
  3. Look at how service providers presented their findings to their customers. Some provided links to alerts within a console that could be used to analyze threats and submit questions; in this scenario, the customer may be able to use the alert to pivot and find additional adversary behavior. Others used a more narrative-based approach, with tickets or emails sent to the customer explaining what happened without linking to alerts within a console. Every method is valuable in the right light; the trick is to determine which resonates most with your team.
  4. Determine if the service provider correctly attributed the adversary. This was not a requirement by MITRE Engenuity in this Evaluation, but adversary attribution could be helpful to your organization as you seek to explain your world to non-cybersecurity leadership. We often hear that adversary attribution allows SOC Analysts to summarize the technical aspect of incidents more effectively to their executive leadership.
  5. Find out if the service provider recommended mitigations for the activity they detected. This was also not in scope for this Evaluation, but some service providers included insight into how the activity should be remediated. Actual mitigations/remediations were not allowed in this Evaluation due to the impact they would have had on the execution of our Emulation Plan.
  6. Deeper analytics may be possible based on the data in the archive. For instance, you may be able to check timestamps in images to gauge response time for some results. This was not stipulated in the MITRE Engenuity methodology, but the output in some service providers’ results may support this kind of analysis. In future iterations of the Managed Services Evaluation, we are looking for fair and objective ways to include mean time to respond (MTTR) and similar metrics (a rough sketch of this kind of timestamp analysis follows this list). The archive is the best way to get the content you need to perform these deeper analytics, because it contains all the content the service provider offered MITRE Engenuity during the Evaluation, even if it did not meet the threshold for a “Reported” status.
  7. Look at the report lengths and determine if they fit the requirements of your team. This is completely preference-based, but the published results provide in-depth insight into each service provider’s approach.
  8. Determine if the language in the reports is relatable for you and your team. Again, this is completely subjective, but ultimately the language used by a services offering must resonate with your team for you to get value out of leveraging the service.
  9. Review the archived information and determine the scope of the Evaluation at large. You will not only be able to understand the volume of information captured during the Execution Phase, but you may also find that the results of the service provider you are analyzing may meet your own threshold for reporting, even if they did not meet MITRE Engenuity’s (rather rigorous) threshold for reporting. By reviewing the archive, you can get a more accurate picture of the scope of engagement with the specific service provider and determine where their level of communication falls on your spectrum of “sufficient” versus “superfluous.”
  10. Look at the releases published by the participants. Once you have reviewed the results and have a few service providers in mind, we recommend reviewing the associated releases published by these service providers regarding this Evaluation. They are often more detailed and dive deeper into the nuances of their own results.
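
Item 6 above mentions using timestamps to gauge response time. A rough sketch of that kind of analysis is below; the sub-step labels and timestamps are entirely hypothetical, and in practice the execution times would come from the Emulation Plan materials while the reporting times would come from screenshots, tickets, or emails in the archive.

```python
from datetime import datetime, timedelta

# Entirely hypothetical data: when an emulated activity ran versus when the
# service provider's report about it was observed.
observations = [
    ("1.A.1", "2022-06-06T09:15:00", "2022-06-06T09:42:00"),
    ("2.B.3", "2022-06-06T11:05:00", "2022-06-06T13:20:00"),
    ("5.C.2", "2022-06-08T14:30:00", "2022-06-08T15:02:00"),
]

deltas = []
for substep, executed, reported in observations:
    delta = datetime.fromisoformat(reported) - datetime.fromisoformat(executed)
    deltas.append(delta)
    print(f"{substep}: reported {delta} after execution")

# A crude "mean time to report" across the sampled sub-steps.
print("Average:", sum(deltas, timedelta()) / len(deltas))
```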

Next Steps

This joint research project was the first of its kind and required a massive collaborative effort from each participant and MITRE Engenuity. I want to take this opportunity to explicitly thank the ATT&CK Evaluations team who has worked diligently over the past year to deliver this release. In the interest of iteratively improving on our process, we will be meeting with our Community Advisory Board members and our participants to understand the good, the bad, and the ugly of this Evaluation with the intention of announcing Round 2 early next calendar year. We love feedback, so please do not hesitate to share it with us through the proper channels. Stay tuned and we appreciate your participation!

A Big Thank You to Our Participants

In this inaugural Evaluation round, we had the privilege of working with the following organizations: Atos, Bitdefender, BlackBerry, BlueVoyant, Critical Start, CrowdStrike, Microsoft, NVISO, OpenText, Palo Alto Networks, Rapid7, Red Canary, SentinelOne, Sophos, Trend Micro, and WithSecure. Since there is inherent risk in anything inaugural, we cannot overstate our appreciation of our participants for working with the MITRE Engenuity team in this joint research project.

Note: Although Trend Micro participated and completed testing for this inaugural round, after an unintended situation Trend Micro promptly and responsibly disclosed to MITRE Engenuity that their team had found sensitive information. Based on the agreement between MITRE Engenuity and Trend Micro, MITRE Engenuity did not publish Trend Micro’s results. Please contact Trend Micro directly for more information.

© 2022 MITRE Engenuity, LLC. Approved for Public Release. Document Number AT0038
