In Defense of a Target Acceptable Global False Positive Rate for Threat Detections

Steve Roesser
6 min read · May 2, 2023


Every organization that monitors alerts today has a false positive rate. That rate could be 20%, 50%, or 80%, but when I ask “what is considered a good false positive rate?” there never seems to be an answer. There are no industry standards to help guide us. How can we possibly manage our threat detections if we don’t define what good and bad look like? How do we know when it’s time to review a detection rule?

Let’s level set with some controls for this discussion:

  • This guidance applies to the quality of individual threat detections, not the entire detection capability of an organization. That is a different beast.
  • Sample size: to judge a single threat detection you need a reasonable amount of data. I currently use 30 alerts as my minimum benchmark before evaluating a threat detection, because we can’t wait for 1,000 alerts to fire before making assertions (see the sketch after this list).
  • Threat detections have two outcomes: true positive (malicious) or false positive (benign).
  • A true positive is defined as correctly detecting badness, regardless of whether the alert was actionable. In other words, if a security control blocked a real attack, it is still a TP detection.
  • You have a customer (Tier 1, SOC, IR team, etc.), and whoever that customer is will judge your threat detections.
  • The goal of a threat detection team is to maximize true positive detections, minimize false positives, and avoid false negatives.
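
To make the sample-size and TP/FP controls above concrete, here is a minimal sketch. The alert counts and the evaluate_rule helper are my own hypothetical examples; only the 30-alert minimum and the true positive/false positive split come from the list above.

```python
# Minimal sketch of the per-rule evaluation controls described above.
# The counts and helper name are hypothetical illustrations.

MIN_SAMPLE_SIZE = 30  # don't judge a rule on fewer alerts than this


def evaluate_rule(true_positives: int, false_positives: int) -> str:
    """Return a rough verdict for a single threat detection rule."""
    total = true_positives + false_positives
    if total < MIN_SAMPLE_SIZE:
        return f"insufficient data ({total} alerts, need {MIN_SAMPLE_SIZE})"
    tp_rate = true_positives / total
    return f"{tp_rate:.0%} true positive rate over {total} alerts"


print(evaluate_rule(true_positives=4, false_positives=11))   # insufficient data (15 alerts, need 30)
print(evaluate_rule(true_positives=12, false_positives=38))  # 24% true positive rate over 50 alerts
```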

Now let’s dive into some of the reasons why some seem to be against a target acceptable global false positive rate (until they finish this blog 😁):

The “saved their bacon one time” argument

I do agree that you can have good detections with a high false positive rate, but only if the sample size is low. Consider a detection with a 90% false alarm rate whose hits, the other 10% of the time, are highly actionable. Is this good? It depends entirely on volume. If the rule fires fewer than 30 times, it could be perfectly fine to keep enabled in your environment. But what if it fires 100 times a month? What about 1,000 times a month? Eventually you would rethink that rule, and that is the point. So with a low volume for a single threat detection, a 10% TP rate can be totally acceptable. In reality, though, this is not a good argument against using false positive rates to help us make decisions about rules. At the macro level you should still aim to keep your false positive rate within the target acceptable global percentage.
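A rough illustration of that volume point: the same 10% TP rule creates a very different triage burden at 25, 100, or 1,000 alerts per month. The monthly volumes below are hypothetical.

```python
# The same rule with a 10% TP rate (90% false alarms) produces very different
# monthly triage loads depending on how often it fires.

TP_RATE = 0.10

for monthly_volume in (25, 100, 1_000):
    benign_alerts = monthly_volume * (1 - TP_RATE)
    print(f"{monthly_volume:>5} alerts/month -> ~{benign_alerts:.0f} benign alerts to triage")
```

At 25 alerts a month the rule costs roughly 22 benign triages; at 1,000 a month it costs roughly 900, which is when you rethink the rule.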

Team Size/Capability/Strategy/etc. should not matter

If you have 5 SOC analysts versus 50 SOC analysts, does that change the number of benign alerts I should send them? I don’t want to send my SOC more benign alerts just because there are more analysts to look at them; that’s not a good use of budget. Yes, there will always be many variables in cybersecurity that make creating standards difficult. However, we all share the same goal of finding badness, and we all produce work for some customer. Therefore I feel that a standard benchmark or goal would be beneficial regardless of your team, organizational size, or maturity. If we are defining what a good set of threat detections looks like, those other variables don't change that definition.

Let’s not kill our SOCs

There were a couple of comments like “Just below the amount that makes your analyst leave” or “assuming in this context that the most affected team is the SOC or staff responsible for alert triage, it roughly means managing FP rate up to a point to keep them engaged” that seem to imply the answer for a target FP% is to push your SOC analysts right to the edge of sanity. I do not recommend this approach. Your FP percentage should not be driven by how many alerts your customer can tolerate before they burn out. That is not an appropriate measure of the quality of your detection rules; it’s just a measure of the patience of the people you hired to triage these alerts.

You cannot shoot for 0% false positives

The obvious and easiest initial answer is: why not shoot for zero false positives? I don’t think any organization considers this feasible. In chasing it, you are very likely opening yourself up to false negatives and missing things. If the threshold is set too high or the rules are too strict, some threats will not meet the criteria and will be missed, resulting in false negatives. So as you get closer to 100% TP detections, your risk of missing real attacks also climbs. We need to find the right balance.

The length of investigation should not change the target false positive rate

Another comment I received was this:

“The other thing to consider is how long does it take to determine the detection is a FP. If it’s an easy automated triage decision then a higher FP rate is acceptable.”

I do agree, and I advise tracking ticket times on a per-rule basis to understand how each rule is affecting your customer. Both very low and very high triage times can have an effect on your team. But let’s imagine an alert takes 60 seconds to triage and determine whether it is actionable. An analyst working a normal 8-hour day with no breaks could theoretically triage 480 of those alerts. With a false positive rate above 70%, they are triaging at least 336 false positives a day. That analyst will become completely numb to those alerts, and quality will likely suffer. Over time, they will not enjoy doing this work. I think this logic is another way to kill your SOC in the long term.
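Here is that back-of-the-envelope math written out; the 60-second triage time and 70% false positive rate are just the example numbers above.

```python
# Back-of-the-envelope triage math from the paragraph above.

TRIAGE_SECONDS = 60
SHIFT_SECONDS = 8 * 60 * 60        # an 8-hour day with no breaks
FP_RATE = 0.70

alerts_per_day = SHIFT_SECONDS // TRIAGE_SECONDS   # 480 alerts
fp_per_day = int(alerts_per_day * FP_RATE)         # 336 false positives

print(f"{alerts_per_day} alerts triaged per day, {fp_per_day} of them benign")
```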

My Answer:

Anything below a 30% TP rate should be reviewed. That does not mean the only option is to disable the rule: it can be modified or tuned with additional parameters to reduce the noise, or combined with another rule to increase the likelihood of success. However, anything below this threshold will likely start to burn out your customers over time. A 30%-40% true positive rate is an acceptable rule, and 40% to 65% is really the sweet spot. Above 65%, you start to run an uncomfortable level of risk with your detection engineering program.
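For readers who prefer it spelled out, here is a small sketch of those bands. The judge_tp_rate function name and label wording are my own; the percentage cut-offs are the ones proposed above.

```python
# Map a rule's true positive rate (0.0-1.0) onto the proposed bands.

def judge_tp_rate(tp_rate: float) -> str:
    if tp_rate < 0.30:
        return "review: tune, combine with another rule, or disable"
    if tp_rate < 0.40:
        return "acceptable"
    if tp_rate <= 0.65:
        return "sweet spot"
    return "above 65%: growing risk of false negatives"


print(judge_tp_rate(0.24))  # review: tune, combine with another rule, or disable
print(judge_tp_rate(0.55))  # sweet spot
```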

In my experience, this is how we define the success of a rule (given a reasonable sample size), or of the whole set of rules, when defending a Fortune 500 company. I welcome any feedback and criticism on these percentages; they are based purely on my own experience. I believe we as an industry have to make an attempt to draw a line in the sand somewhere and define what is good versus bad. Standards are helpful even if they don't apply to every single situation. Just saying “it depends” doesn’t push our industry forward or help us grow. There will always be edge cases.

You will always need to look at the specific requirements, risk tolerance, and needs of your own organization, but that should not prevent us from setting a standard as an industry to help threat detection teams understand the quality of their work and the impact it has on their customers.
