My post “Why is Threat Detection Hard?” proved to be one of the most popular in the recent history of my new blog. In this post, I want to explore a seemingly obvious yet surprisingly fascinating aspect of detection: uncertainty.
Uncertainty? Are you sure, Anton? :-)
Let’s start our journey by exploring the classic fallacy: “if you can detect [the threat], why can’t you prevent it?” Back in 2016, I hit this point really hard here; note the first argument I made there. Threat detection, if done well, carries uncertainty, inherently and by design.
OK, you want to argue? Sure! Suppose you are one of those people who only wants to deploy rules / signatures that have exactly ZERO “false positives,” hence removing a big chunk of the uncertainty, if not all of it. In some cases, this may be a hangover from opaque vendor detections, where a team couldn’t actually determine the logic of a signature beyond a cryptic string message.
So, you are going to detect the use of Mimikatz by looking for the string “mimikatz” (yes, really!) in a Windows event ID 4688, for example. What are the chances of this rule triggering on benign activity? Practically none, except during a pentest, and you do want to detect that. Now, is this a good rule? Well, it works, but it leaves the entire universe of related and similar events out. You can, of course, write more and more fragile, narrow rules to cover them all, but it is pretty obvious that you will lose.
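To make the fragility concrete, here is a minimal sketch (the event records and field names are hypothetical, not a real SIEM schema) of such a zero-false-positive string rule, and how trivially an attacker breaks it by renaming the binary:

```python
# Hypothetical process-creation events (field names are illustrative only).
EVENTS = [
    {"event_id": 4688,
     "command_line": "C:\\tools\\mimikatz.exe sekurlsa::logonpasswords"},
    {"event_id": 4688,  # same tool, renamed by the attacker
     "command_line": "C:\\tools\\mk.exe sekurlsa::logonpasswords"},
]

def narrow_rule(event: dict) -> bool:
    """Zero false positives, maximally fragile: exact substring match."""
    return "mimikatz" in event["command_line"].lower()

# Only the un-renamed copy is caught; the renamed one sails past the rule.
hits = [e for e in EVENTS if narrow_rule(e)]
```

The rule will essentially never fire on benign activity, which is exactly why it tells you so little: the detection logic is coupled to an attacker-controlled string.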
This also reminds us that if “low false positives” becomes the central criterion for detection content development, the defender ultimately loses. As an aside, this also reminds me of the point recently made in “security uncanny valley”: if you seek to detect only a few simple things, you may be seen as more successful than those handling big, uncertain issues (e.g., your time to triage an alert will be better if you run only narrow, fragile rules, and so will your “false positive” rates).
Furthermore, if you must have detection certainty, you will have to avoid every detection method invented since the 1980s, and perhaps even some invented IN the 1980s. You can’t have ML, or any anomaly detection for that matter, because these are never 100% certain. You cannot use broad pattern rules, and you cannot even use threat intelligence, because threat intelligence is not signatures. All you will ever have is huge piles of narrow, fragile rules. And a lot of missed attacks. By the way, “fragile” here means that the detection content is very easy to break when something in the attack changes (some would say that fragile detection is worse than none at all, because lots of fragile rules give defenders false hopes of detection measurability and effectiveness).
There are multiple sources of uncertainty in detection. Some come from the approach (such as algorithms or machine learning that are non-deterministic), while others come from the fact that we sometimes detect malicious intent, not just the activity (e.g., is this data upload malicious or not?). Yet another batch comes from incomplete information, such as context data not available at detection time and/or place (e.g., endpoint state may not be available to a network detection tool). As a quick side note: of course we don’t detect intent directly, because capturing intent in raw technical detail is not possible, but we do apply our judgment about intent when we detect threats (this perhaps deserves a separate blog post…)
OK, so now that we have established that good detection carries uncertainty, what do we do? How do we operationally deal with detection uncertainty? In other words, how do we gain a temporary and localized respite of certainty when we really need it?!
There are a few ways:
- IMPROVE ALERT TRIAGE — From that same post, you may recall my emphasis on alert triage. Triage is that “magical” process where you gain certainty about a particular detection signal under your particular circumstances. However, alert triage has been an uphill battle for many organizations for many years, due to shortages of skills, needed context data, investigative tools, etc. And some who outsourced this ended up having to triage signals from the outsourcer…
- USE MULTI-STAGE DETECTION — as my former co-author said, “threat detection is a multi-stage process” where the first stage produces a noisy signal, and later stages gather additional context and details to gain more certainty (so better “detection in depth”?). This is typically accomplished via SOAR playbooks that either lead to the final stage giving you very high certainty (so that you can remediate) or merely high certainty (so that a human will have an easier time deciding what to do, or a less skilled human will have a chance of deciding correctly). Admittedly, you can consider this triage automation, but it is sometimes useful to logically separate the two.
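The multi-stage idea can be sketched as follows; all field names, thresholds, and context lookups here are hypothetical stand-ins for what a SOAR playbook would actually do:

```python
def stage_one(event: dict):
    """Noisy first stage: flag any large outbound upload (many will be benign)."""
    if event.get("bytes_out", 0) > 100_000_000:  # arbitrary illustrative threshold
        return {"signal": "large_upload", "confidence": 0.3, "event": event}
    return None

def stage_two(signal: dict, context: dict) -> dict:
    """Enrichment stage: use context data to raise or lower confidence."""
    host = signal["event"]["host"]
    host_ctx = context.get(host, {})
    if host_ctx.get("recent_malware_alert"):
        signal["confidence"] = 0.9   # very high certainty: candidate for remediation
    elif host_ctx.get("is_backup_server"):
        signal["confidence"] = 0.05  # benign explanation found; suppress or deprioritize
    return signal

# A noisy stage-one hit becomes a high-confidence signal after enrichment.
event = {"host": "ws-042", "bytes_out": 250_000_000}
context = {"ws-042": {"recent_malware_alert": True}}
signal = stage_one(event)
if signal:
    signal = stage_two(signal, context)
```

The point is the shape of the pipeline, not the specific logic: certainty is added in later stages, rather than demanded of the first one.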
- SPLIT BAD FROM INTERESTING — and then handle them differently. One can simply and explicitly run two types of detection content: “bad” (but fragile) content that has near-prevention-grade certainty, and “interesting” content that is used as hunting clues, as the first stage in a multi-stage process, as input into elaborate triage playbooks, etc. Note that the first can be triaged by junior analysts (or automation), while the second needs solid expertise.
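One way to sketch the split, with hypothetical rule metadata and routing targets standing in for your real automation and hunting queues:

```python
# Hypothetical detection content, tagged by type at authoring time.
RULES = [
    {"name": "mimikatz_exact_string", "kind": "bad"},              # near-prevention-grade certainty
    {"name": "rare_parent_child_process", "kind": "interesting"},  # hunting clue, needs expertise
]

def route(alert: dict) -> str:
    """Send high-certainty 'bad' detections to automation or junior triage;
    send low-certainty 'interesting' ones to the expert hunting queue."""
    if alert["kind"] == "bad":
        return "automation_or_junior_triage"
    return "expert_hunting_queue"

routes = [route(r) for r in RULES]
```

The design choice this illustrates: the certainty label travels with the detection content itself, so the downstream process (and staffing level) is decided per signal type, not per alert.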
To conclude: embrace, don’t fear, uncertainty in threat detection. Use some of the approaches discussed here rather than striving for a 100% clear signal from detection tools. Detect bad, but also detect interesting (with different processes attached to those detection signals). If you only detect certain badness, you will miss a lot of badness for which you have no certain signal. Be prepared for humans to be involved in detection signal triage, whether machines (SIEM, SOAR, whatever) help them or not, because today humans are still much better at handling uncertainty.
Thanks again to Brandon Levene for his insightful comments.
Related blog posts: