Inaugural ATT&CK Evaluations for ICS Release: TRITON

Otis Alexander
MITRE Engenuity
Jul 19, 2021

The results for our first round of MITRE Engenuity ATT&CK® Evaluations for Industrial Control Systems (ICS) are now available on the ATT&CK Evaluations website. This evaluation emulated behaviors inspired by the events associated with the TRITON malware attack against a petrochemical facility in Saudi Arabia, and it marks ATT&CK Evaluations' first major expansion into a new evaluation domain. Five vendors in the ICS detection market participated in this initial round: Armis, Claroty, Dragos, the Institute for Information Industry, and Microsoft. We are thrilled to have worked with this group of participants and are very excited to share the output with the global community.

Given that this is a new offering, with a very different technology type, we wanted to take this opportunity to address some questions that we think are important and have heard voiced in the community. While some readers might be familiar with our ATT&CK Evaluations for Enterprise, we hope this post provides transparency around this new evaluation type to better understand the process, rationale, and results.

What Are ICS Evaluations and Why Are They Important?

Products in the ICS detection market often use different approaches to detect attacks against ICS. For instance, some may lean heavily on behavioral analytics while others rely more on anomaly detection. This can make it difficult for end users to gain objective insight into products' true capabilities and how to use them. Short of rigorously testing each product themselves, many organizations have few avenues for understanding how products truly perform beyond demos and marketing materials.

There are various approaches to assessing whether a product in this space is right for your organization: you could look at the product's ICS protocol coverage, its ease of use and interoperability with other products you own, its cost, and even its maturity. These can all be important factors in choosing a product, but they do not speak directly to what it is able to detect and how the detection occurs.

Two important questions that consumers of these products should be asking are which tactics, techniques, and procedures (TTPs) a product can detect, and how much context those detections give the analyst to understand the events they are seeing. These are the questions that ATT&CK Evaluations seeks to address.

The goal is to foster a community that is better informed about the true performance of the evaluated products' capabilities to detect known adversary behavior. We accomplish this by providing publicly available results at the end of every evaluation. As with our Enterprise Evaluations, our process centers on building and executing an adversary emulation plan for a known threat, though in this case we use a representative ICS environment to conduct the evaluation. During a collaborative evaluation, the vendor acts as the defender, while we play the roles of adversary and proctor. We also act as a guide for the vendor, helping them understand the adversary actions and ensuring that they capture the right behaviors.

[Figure: Evaluation Environment]

What Happens to Results and How to Analyze Them?

After the data has been collected and a feedback period has concluded, the results are released to the public. In doing this, we ensure that our work is not limited to collaboration with specific vendors, but also enables end users to make more informed decisions on which products to buy and how to leverage the ones they already have.

Given the multitude of ways to analyze the captured data, we do not declare a winner; there is no single way of looking at it. We released a blog post last week on this topic in the context of our Enterprise Evaluations, and much of it applies to these evaluations as well, so it is worth a read.

When reviewing the results of an evaluation in search of a solution, some things to consider are which techniques should have detections and whether those detections are usable for you. Keep in mind that a number of other details are outside the scope of this evaluation (e.g., environmental needs, false positives, etc.) but are still worth considering when selecting the product that best suits your needs.

The ICS Evaluation serves as a rich data set that helps provide visibility into how products detect behaviors as described by ATT&CK for ICS. To facilitate the examination of this data we have a number of tools, such as the participant comparison tool and technique comparison tool, that allow you to conduct your own side-by-side analysis of vendors.
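
If you prefer to slice the data yourself outside of those tools, a side-by-side view is straightforward to assemble from the published results. The sketch below is a minimal, hypothetical example: it assumes the results have been exported as a per-vendor mapping of technique IDs to detection categories, and the vendor names, technique IDs, and category values shown are placeholders rather than actual evaluation data.

```python
# Hypothetical sketch of a side-by-side comparison. This is not the actual
# Evaluations tooling or data schema; vendor names, technique IDs, and
# detection categories below are invented placeholders.
results = {
    "VendorA": {"T0855": ["Technique"], "T0881": ["None"]},
    "VendorB": {"T0855": ["Telemetry"], "T0881": ["General"]},
}

def compare(results):
    """Print each technique alongside the detection categories each vendor produced."""
    techniques = sorted({t for per_vendor in results.values() for t in per_vendor})
    for technique in techniques:
        row = {vendor: per_vendor.get(technique, ["No data"])
               for vendor, per_vendor in results.items()}
        print(technique, row)

compare(results)
# T0855 {'VendorA': ['Technique'], 'VendorB': ['Telemetry']}
# T0881 {'VendorA': ['None'], 'VendorB': ['General']}
```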

Why Are We Focusing on Detection Solutions?

One reason we chose to evaluate this class of security products is so that we could leverage the existing ATT&CK Evaluation methodologies that have been proven over multiple rounds of Enterprise Evaluations. The Enterprise Evaluations currently focus on a fairly well-established endpoint detection market (e.g., EDR, EPP, XDR, etc.). In the ICS space, usage of these types of solutions is not as prevalent. However, the passive and active detection solutions that collect data over the network serve a similar purpose to their Enterprise counterparts and are a fairly mature offering in the ICS space. Therefore, many of the methods used in Enterprise Evaluations are directly transferable to ICS Evaluations.

Another reason that we chose to look at the detection solution space is because security controls that fall into the categories of network segmentation/segregation, boundary protection, and firewalls, rely heavily on configuration-based decisions by the network owner. This makes it much harder to evaluate these types of products and provide consistent and valuable results to the community. We are always looking for ways in which we can effectively evaluate these technologies, and welcome feedback from the community.

Our evaluation of detection solutions in the ICS space does not represent a recommendation for every organization to adopt one of these products. Each organization is at a different level of maturity with regard to having and following a security program. We do, however, see the benefit of organizations maintaining visibility into their environments and keeping a record of events for detection, incident response, and forensic purposes. Hopefully these evaluations help shine a light on these solutions so that, as organizations mature, these products (and others like them) can take a more substantial role in securing their networks.

Why Did We Pick the TRITON Attack for the First Round?

Recently, a lot of attention has been placed on the impact that ransomware has had on critical infrastructure. While these attacks are very concerning and provide valuable lessons about the reliance of operational technology (OT) operations on business systems, they do not often target the ICS themselves. In most cases, OT operations are affected tangentially through a break in business continuity, or through the proactive shutdown of processes out of an abundance of caution. It is not clear whether many of the detection solutions would be in a position to detect these types of attacks against business systems.

The TRITON attack, however, falls squarely into the realm of adversary activity that should be detected by a solution that is focused on the ICS technology domain. It is one of a limited number of publicly identified malware families that have targeted ICS. TRITON is consistent with Stuxnet and Industroyer in that it had the capability to prevent safety and protection mechanisms from executing their intended function, resulting in a physical impact.

What is truly interesting about the incident response associated with the 2017 TRITON malware attack against the Saudi Arabian petrochemical company is how long it took responders to realize that it was a malicious attack. Our belief is that the presence of a detection solution focused on the ICS space would have sped up and better informed the response. Even without a SOC capability to support real-time analysis, a properly configured and deployed detection solution can at the very least serve as a forensic record that greatly aids incident response. A key question for the community is how many incidents that were attributed to human error, hardware failures, and faults were actually manifestations of adversary activity. Without at least a reliable forensic record, we will never know.

For these reasons, we believe that the TRITON attack is a worthy real-world incident to emulate for round one of ATT&CK Evaluations for ICS.

What Was and What Wasn’t Evaluated

Each vendor solution was evaluated to see what it was able to detect and how the detection occurred. It is worth stating that evaluating how a detection occurred is not completely straightforward. From the vendor's perspective, the goal of this evaluation is to detect the adversary emulation executed against the evaluation environment, and each vendor will inevitably have its own way of accomplishing this task. The logic behind a detection is not always shared with the end user and in many instances can be viewed as a trade secret by the vendors. Therefore, when we look at how a detection occurred, we instead focus on the context the detection provides to the analyst. Each detection the vendor provides for an executed adversary emulation step is assigned one or more detection categories, which represent the amount of context provided to the analyst. A detailed explanation of what these detection categories mean, and examples of how they are used, can be found here.

[Figure: Detection Categories Used for the ICS Evaluation]

False positive rate, false negative rate, and detection rate are important metrics for evaluating the performance of a detection capability. These metrics can be invaluable in highlighting the tradeoffs between solutions: one solution may be very “noisy” and classify normal actions as malicious, while another may miss unacceptable amounts of malicious activity. These metrics, while important, were not collected as part of the results of this evaluation. One of the key reasons we decided not to include them is that it is hard to calculate them objectively and consistently across multiple vendor solutions. What one vendor calls an alert, another may call a notification or even a deviation. The complexity only increases as you add metrics such as severity to the mix. Therefore, although we acknowledge that these metrics are valuable for comparing capabilities and are important to overall detection research, we opted not to include them in this round's results.
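
For readers who want the definitions behind those terms, the rates are simple ratios over a confusion matrix once ground truth is known. The snippet below is illustrative only; the counts are invented and do not correspond to any data collected in this evaluation.

```python
# Illustrative only: textbook definitions of the rates discussed above,
# computed from invented confusion-matrix counts. No such figures were
# collected or published as part of the ICS Evaluation.
def rates(tp, fp, tn, fn):
    return {
        "detection_rate": tp / (tp + fn),       # share of malicious activity detected
        "false_negative_rate": fn / (tp + fn),  # share of malicious activity missed
        "false_positive_rate": fp / (fp + tn),  # share of benign activity flagged
    }

print(rates(tp=42, fp=5, tn=950, fn=8))
# {'detection_rate': 0.84, 'false_negative_rate': 0.16, 'false_positive_rate': 0.005...}
```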

What was Unique About the ICS Evaluations?

While we retain key parts of the process, an ICS Evaluation imposes a unique set of requirements, slightly different from those of an Enterprise Evaluation.

Personnel responsible for supporting and maintaining ICS have traditionally been hesitant, if not outright opposed, to using endpoint detection and protection solutions on platforms that support critical ICS applications. Installing agents such as these on critical systems may increase the risk of accidental disruption to operations. In addition, while deploying agents on embedded ICS endpoints has been explored in various venues, most prominently in the research community, there is not a mature product space associated with this type of technology. The same concerns about the risk of accidental disruption, or worse, are amplified at this level. Additionally, including technologies like this could invalidate a platform's certification.

Because of these and various other reasons, the ICS detection market has mainly focused on passive network detection technologies. These products can collect network data passively and, in some cases, offer active capabilities to query devices in a sanctioned way. An additional feature of some of these products is the capability to collect host-based data from a client. This feature allows products in this space to leverage some of the same data that endpoint detection solutions rely upon.
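
As a rough illustration of what passive collection can look like at its simplest, the sketch below watches mirrored traffic for EtherNet/IP (CIP) sessions. It assumes a SPAN or mirror port on an interface named eth0 and uses the open-source scapy library; real products go far beyond port matching and deeply parse the ICS protocols themselves.

```python
# A minimal sketch of passive network collection, assuming a SPAN/mirror
# port on "eth0" and the scapy library. Real ICS detection products perform
# deep protocol parsing rather than simple port matching.
from scapy.all import sniff, TCP

ENIP_PORT = 44818  # EtherNet/IP explicit messaging (carries CIP)

def handle(pkt):
    # Surface any TCP traffic to or from the EtherNet/IP port for later analysis.
    if pkt.haslayer(TCP) and ENIP_PORT in (pkt[TCP].sport, pkt[TCP].dport):
        print(pkt.summary())

# store=False keeps memory usage flat during long passive captures.
sniff(iface="eth0", prn=handle, store=False)
```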

These characteristics of the market most notably affect the placement of the security solutions in relation to the evaluation environment, the mechanisms for transferring data sources to the security solutions, and our ability to test vendors in parallel against the same environment. The vendors do not have any direct access to the environment during the learning and execution phases of the evaluation. The one exception to this rule is for vendors with an approved active polling capability; these features were tested outside of the execution phase so as not to taint the network data being collected by the other participants. Since network data is collected passively and host-based data is transferred to participants, we were able to test all the security solutions in parallel over the five-day execution period.

A defining feature of many detection products in the ICS space is the ability to create a baseline of “normal” network traffic and use this baseline to alert on deviations from an established norm. Because of this, we allotted a three-week time span before the execution phase of the Evaluations for the security solutions to “learn” the environment and create a baseline to aid in detection. During this learning phase we kept user activity to a minimum but performed various engineering actions to make sure the solutions captured traffic and events associated with standard activities seen in these environments.
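
Conceptually, this baselining boils down to recording what the network looks like during the learning phase and flagging anything new afterward. The sketch below reduces that idea to flow tuples purely for illustration; the hostnames and protocol labels are invented, and real products model much richer features such as protocol function codes, tag values, and timing.

```python
# A minimal sketch of baseline-and-deviate detection, assuming traffic is
# summarized as (source, destination, protocol) flows. Hostnames and protocol
# labels are invented for illustration.
def learn_baseline(flows):
    """Record every flow tuple observed during the learning phase."""
    return set(flows)

def detect_deviations(baseline, flows):
    """Flag any flow that was never seen while the baseline was built."""
    return [flow for flow in flows if flow not in baseline]

baseline = learn_baseline([
    ("engineering-ws", "plc-1", "cip"),
    ("historian", "plc-1", "cip"),
])

alerts = detect_deviations(baseline, [
    ("engineering-ws", "plc-1", "cip"),            # known flow, no alert
    ("unknown-host", "safety-plc", "tristation"),  # new flow, alert
])
print(alerts)  # [('unknown-host', 'safety-plc', 'tristation')]
```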

Lastly, it is worth noting that although TRITON was leveraged against Triconex controllers, we chose to build our environment using Allen Bradley controllers and software. This forced us to implement TRITON-like behaviors using a different protocol than the one that was used in the real attack. This really highlights the flexibility of adversary emulation, where we seek to avoid getting stuck in one implementation of malware and instead focus on its underlying behaviors. This also allows us to broaden the applicability of our evaluation to other platforms in the ICS technology domain.
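
One way to picture this separation of behavior from implementation is to describe each emulation step by its underlying action and keep the protocol-specific procedure swappable. The sketch below is purely hypothetical; the behavior description, protocol labels, and procedure names are placeholders, not the actual emulation plan.

```python
# Hypothetical illustration of keeping an adversary behavior separate from
# any single protocol implementation. The behavior text, protocol labels,
# and procedure names are placeholders, not the actual emulation plan.
EMULATION_STEP = {
    "behavior": "Download a modified program that suppresses the safety function",
    "implementations": {
        "tristation": "download_program_via_tristation",  # Triconex, as in the real attack
        "cip": "download_program_via_cip",                # Allen-Bradley, as in the evaluation
    },
}

def select_procedure(step, protocol):
    """Pick the protocol-specific procedure for the same underlying behavior."""
    return step["implementations"][protocol]

print(select_procedure(EMULATION_STEP, "cip"))  # download_program_via_cip
```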

More to Come

We have much more to say on these evaluations and will be releasing additional content soon to give a more in-depth analysis of the results and our process. In the meantime, we hope you have success exploring the new TRITON results. Remember, there is no winner. There is no one way of looking at the data, or golden metric. Each vendor has their own strengths, weaknesses, and story to tell.

If you have any feedback or are interested in participating in a future ATT&CK Evaluation, please reach out to evals@mitre-engenuity.org.

© 2021 MITRE Engenuity. Approved for Public Release. Document number AT0019.
