The Best Way to Detect Threats In the Cloud?

Anton Chuvakin
Published in Anton on Security
6 min read · Jul 20, 2022


Let’s continue our fun conversation on threat detection in the cloud that we started in “Who Does What In Cloud Threat Detection?” and “How to Think about Threat Detection in the Cloud” and continued somewhat in “Detection as Code? No, Detection as COOKING!” and “Does the World Need Cloud Detection and Response (CDR)?”

Many organizations, and the industry at large, still struggle to define what good looks like in threat detection in general. Here we have a narrower need: to define what good looks like in cloud threat detection. So we want to formulate rough and imperfect dimensions of good threat detection for public cloud computing.

Rough Guidelines for Good Cloud Threat Detection

Certainly, good threat detection must aim at realistic threats that affect public cloud environments. Many organizations are hugely challenged here, and recent data indicates that cloud environments face a befuddling mix of old and new threats (from 1980s-style password guessing to 2020s container escapes). Simply replicating data center threat detection controls that worked well on-premises is not likely to lead to greatness, or even goodness, in detection. The cloud brings new layers of security exposure and control (and huge security advantages), so including those in detection planning is crucial to success.

Naturally, good threat detection in the cloud takes into account cloud computing properties such as scaling, ephemeral nature, APIs, and focuses on the criticality of the identity layer in the cloud.

While there are organizations with minimal cloud footprints, cloud threat detection controls will generally have to deliver higher effectiveness with regard to "false positives," given the high and growing telemetry volumes. After all, a 1% false positive rate for megabytes of data may have been acceptable. The same percentage would not work for automated detection controls on petabytes of data, especially in our world where humans skilled in detection are scarce.
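To make the scale argument concrete, here is a back-of-the-envelope sketch. All the numbers (event size, false positive rate, triage time) are illustrative assumptions, not measured figures:

```python
# Back-of-the-envelope: why a fixed false-positive rate stops scaling.
# Every constant below is an illustrative assumption.

EVENT_SIZE_BYTES = 500          # assumed average size of one log event
FP_RATE = 0.001                 # assume 0.1% of events are flagged incorrectly
TRIAGE_MINUTES_PER_ALERT = 5    # assumed analyst time per false positive

def daily_false_positives(data_bytes_per_day: float) -> float:
    """False alerts per day for a given daily telemetry volume."""
    events = data_bytes_per_day / EVENT_SIZE_BYTES
    return events * FP_RATE

for label, volume in [("1 GB/day", 1e9), ("1 TB/day", 1e12), ("1 PB/day", 1e15)]:
    fps = daily_false_positives(volume)
    analyst_days = fps * TRIAGE_MINUTES_PER_ALERT / 60 / 8  # 8-hour shifts
    print(f"{label}: {fps:,.0f} false alerts/day, ~{analyst_days:,.0f} analyst-days to triage")
```

Even with generous assumptions, the triage load grows linearly with telemetry volume, which is exactly why the same false positive rate that was tolerable on-premises becomes untenable at cloud scale.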

As we mentioned many times, the source data types for detection in the cloud are changing, or at least their importance is being rebalanced. In the long run, network-focused detection controls that rely on packet capture and traffic metadata will likely decrease in importance due to the growth of encryption and bandwidth. This means that ultimately logs, observability data, and various detection techniques that rely on backplane access by the cloud provider will become more important.

Similarly, the role of endpoint telemetry — such as EDR — will wane as organizations adopt cloud-native technologies and endpoints start to disappear. Today, many organizations deploy agents in cloud environments, at least on virtual machines and sometimes on containers. These agents are expected to decline in popularity as more organizations shift to microservices and even more SaaS. Even where servers remain, we predict growth in agentless approaches given their lower operational overhead for both the server workloads themselves and for security teams.

While we are on the topic of doing detection from the cloud provider backplane, the question of durability of detection controls against attacker interference comes up. Agent-based controls deployed in the compromised environment, such as EDR, have always had this as a potential risk. In the cloud, we have a chance to instrument detection controls with a much smaller chance of attacker interference, such as VMTD (Virtual Machine Threat Detection).

Naturally, there is also a degree of rebalancing between detection controls seen as auxiliary and those seen as primary, compared to on-premises environments. For example, everybody highlights the role of identity in the cloud, so people treat identity activity logs as a primary source for threat detection, while, say, flow data becomes auxiliary for many.
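As a minimal sketch of treating identity activity as a primary signal, consider a detection over identity audit log events. The event schema, field names, and action names below are invented for illustration; real cloud audit logs (CloudTrail, Cloud Audit Logs, etc.) use different shapes:

```python
# Illustrative only: a tiny detection over hypothetical identity audit events.
# Flags a principal performing a sensitive IAM action from an IP address
# never before seen for that principal. Real audit log schemas differ.
from collections import defaultdict

SENSITIVE_ACTIONS = {"SetIamPolicy", "CreateServiceAccountKey"}  # assumed names

def detect_sensitive_identity_actions(events):
    """events: iterable of dicts, assumed sorted by timestamp.
    Returns alerts for sensitive actions from previously unseen IPs."""
    seen_ips = defaultdict(set)
    alerts = []
    for ev in events:
        principal, action, ip = ev["principal"], ev["action"], ev["source_ip"]
        # Only alert once a baseline of known IPs exists for the principal.
        if action in SENSITIVE_ACTIONS and seen_ips[principal] and ip not in seen_ips[principal]:
            alerts.append({"principal": principal, "action": action, "ip": ip})
        seen_ips[principal].add(ip)
    return alerts
```

This is exactly the kind of rule that puts identity logs in the primary seat: no packet capture or endpoint agent is involved, only the identity-layer audit trail.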

Finally, while I want detections to always be transparent (and for ML, at least explainable), others disagree. Let's just settle on the angle everybody seems to agree on: detections had better be trusted by whoever needs to use them.

Pros/cons on Where Cloud Threat Detection is Done

So, let’s turn this into pros and cons for where we are doing detection, using the framework from this post.

What advantage does a cloud provider (CSP) have in developing / operating detection systems and content?

  • Processing proximity: a CSP has no network egress/backhaul costs for moving raw data and signals into detection systems. This provides an economic benefit as well as latency and durability benefits: fewer hops in a system mean faster end-to-end detection, fewer dropped signals, and fewer unexpected costs.
  • Better systems integration: a CSP has a unique opportunity to plug into related and supporting infrastructure for detecting threats with signals that are not otherwise externalized for privacy/security/performance reasons.
  • Advance warning/early detection: a CSP can work with internal teams to develop detection of embargoed vulnerabilities and issues; it can also rely on the cloud provider's broad threat awareness.
  • Access to upstream teams at the cloud provider: CSP threat detection teams can work with internal teams to modify systems to produce additional signals needed for detection, such as adding annotations to a log or embedding an LSM (Linux Security Module) in a Linux distribution.
  • Higher resilience against attackers and detection control invisibility: CSP instrumentation is invisible to both customer developers and adversaries.

What advantages does a third party vendor have in developing/operating detection systems and content?

  • Broader applicability: a CSP developing for its own cloud builds a very particularized piece of software adapted to the quirks of that cloud, and working on third-party technologies requires a mindset shift. A third party gains a broader view by developing for all clouds.
  • Better multi-cloud coverage: it is widely accepted that a more unified view of detection activities and better coverage across CSPs is better achieved by a third party, not one of the CSPs (however, we are very far from a "one detection rule for multiple clouds" situation anyway).
  • Perceived trust advantage via separation of duty: some cloud users prefer to separate the environment operator from the detection operator as they consider this approach to be more trustworthy in principle.
  • Superior agility: some believe that a smaller, focused startup will develop detections faster and will address customer vertical needs with more agility.
  • Less reliability pressure: it’s accepted, if not acceptable, that third party tools may introduce performance issues. It is unacceptable for a CSP to cause a performance issue in their own garden.
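The multi-cloud point above is easiest to see in code: before one detection rule can span clouds, someone must normalize differently shaped audit events into a common schema, and that is the work a third party ends up doing. The field names below are simplified stand-ins for real CloudTrail-like and Cloud Audit Log-like formats, not the actual schemas:

```python
# Illustrative normalization of two hypothetical CSP audit-event shapes
# into one common event, so a single rule can cover both clouds.

def normalize(raw: dict) -> dict:
    """Map a cloud-specific raw event into a common {actor, action} shape."""
    if raw.get("cloud") == "aws":     # simplified CloudTrail-like shape
        return {"actor": raw["userIdentity"], "action": raw["eventName"]}
    if raw.get("cloud") == "gcp":     # simplified Cloud Audit Log-like shape
        return {"actor": raw["principalEmail"], "action": raw["methodName"]}
    raise ValueError("unknown cloud")

def rule_admin_grant(event: dict) -> bool:
    """One rule, applied uniformly after normalization."""
    return event["action"] in {"AttachRolePolicy", "SetIamPolicy"}
```

Even this toy version shows why "one detection rule for multiple clouds" stays distant: the normalization layer has to be maintained per cloud, per service, per log format.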

What advantages does a customer have in developing/operating detection systems and content?

  • Precise threat model: naturally, customers know their threats best; in theory, this means they can develop the best detections (in practice, many lack the detection engineering skills needed).
  • Business / vertical knowledge: similarly, customers also possess superior knowledge of their business, assets, and industry specifics. For some threats, where knowledge of the environment is more important than knowledge of the threat, customer-created or customized detections are the only way to go.

Now, a careful reader of our post would get a sense that we are advocating the idea that the best threat detection in the cloud will have to be built by the cloud provider itself, essentially the cloud platform owner. To some, this would be similar to saying that the OS vendor will build the best detection for a particular operating system because "it is their OS" and "they know it best." Naturally, we all know that this is not really true in real life. So why would it be true in the cloud?

This time it really is different: when people built our OS and hardware stacks (Unix development started in 1969, Windows NT in 1989, our x86 instruction set dates to the 70s), we were a lot less concerned about security than we are today. As a cloud provider, we designed security into our foundations much earlier, armed with the knowledge that we had a generational shift in computing foundations, motivated by decades of security failures, and presented with a clean-sheet opportunity to change security outcomes.

Thus the argument hinges not on "platform ownership," but on the fact that security thinking was there on day 1, perhaps even on day "-100," when the systems were still being conceived and designed. This matters!

So, our conclusion: we do hypothesize that the CSP route will ultimately win as the best place to get cloud detections. Our impression is that being close to a securely designed cloud platform and having superior platform (and threat) knowledge will outweigh the other factors. But we are definitely curious to see how this evolves…

Note: written jointly with Tim Peacock and will eventually appear on a GCP Security Blog.

Related blogs: