A key performance indicator for infosec organizations

Using probabilistic risk KPIs to direct complex risk engineering efforts.

Ryan McGeehan
Starting Up Security

--

I’ve been helping a few security engineering organizations in the Bay Area experiment with quantifiable risk modeling approaches that use clear language. We’re doing this to subject security teams to better measurement beyond (or in addition to) compliance, checklists, grades, color coding, or maturity models.

It’s difficult to unify broad security work with disparate disciplines under a single quantitative key performance indicator (KPI) that addresses rarely occurring and high impact cybersecurity risks.

We will discuss the potential of probabilistic, risk aware KPIs, which are seeing experimentation at a few large tech companies. First, some background:

Leadership likes to elevate certain KPIs to guide behavior.

The purpose of KPIs is to point a group in a single direction. They do this in addition to any mission statements and objectives, and may incentivize efforts that produce positive gains in that measurement.

Some examples:

Twitter has Monthly Active Users and Timeline Views. Facebook has similar measures. Others are more dollar based. Cloudflare uses Paying Customers and Dollar-Based Net Retention Rate. eBay has Gross Merchandise Volume. Uber has Trips.

Leadership can choose these as rudimentary proxies for success and for investor awareness. They act as a north star for a massive organization. These sorts of leadership tools create simple and environmental self assessments for employees to test for good ideas or bad ideas.

Problem: The security industry inside tech has not adopted risk based probabilistic and quantitative KPIs that can be stood up by leadership for large and complex engineering organizations.

What is a reliable KPI for a complex security organization?

I want a small number of generally accepted KPIs that represent the goals of large security organizations. They should be quantitative and probabilistic, somewhat similar company to company, and encompass a variety of information security disciplines that contribute to improving them.

The following reasoning comes by way of my general conversations with CEOs and CISOs in consulting engagements. If you ask what the goals of a security team should be, they might respond with water-cooler generalities like the following.

  • “We don’t want to lose any customers.”
  • “We don’t want to be fined or regulated.”
  • “We don’t want to be a headline.”
  • “We don’t want to be pulled in front of the Senate.”
  • “We don’t want to lose customer data or IP.”
  • “We don’t want to harm our customers.”

This is good. These relate closely to simple, probabilistic, risk aware measurements.

Here are some examples of potential KPIs expressed as forecasted (% belief) measurements.

Many of them involve some sort of internal incident classification, a qualitative measure of how bad an incident is. Some companies call the most severe class P0; the examples below use SEV0.

The probability that within 1 month / quarter / year:

  • > N regrettable customer exits resulting from a SEV0.
  • Any party in {set of regulators} formally discusses a SEV0 with us.
  • A SEV0 with >$10M of losses. Choose your own threshold!
  • A {set of bloggers and newspapers} publishes commentary on a SEV0.
  • A SEV0 has confirmed, unauthorized access to customer data.
  • > N% of total users impacted by a SEV0 involving an explicitly defined failure.

These can be measured and tested with a variety of subjective and quantitative risk measurement methods. Understanding how risk based KPIs can be useful in engineering contexts is an important goal for me.
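Moving a forecast between the month, quarter, and year horizons above takes a small amount of arithmetic. Here is a minimal sketch in Python, assuming the chance of the event is constant and independent from month to month. That is a simplifying assumption of mine, not something any of these companies necessarily use, and the function name `rescale_probability` is hypothetical.

```python
def rescale_probability(p: float, from_months: float, to_months: float) -> float:
    """Rescale P(at least one event) from one time horizon to another.

    Assumes a constant, independent chance of the event in every month,
    so the probability of *no* event compounds over time.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be a probability between 0 and 1")
    if from_months <= 0 or to_months <= 0:
        raise ValueError("horizons must be positive")
    return 1.0 - (1.0 - p) ** (to_months / from_months)


# Illustrative number only: a 10% belief that a SEV0 happens within a year.
annual = 0.10
quarterly = rescale_probability(annual, from_months=12, to_months=3)
monthly = rescale_probability(annual, from_months=12, to_months=1)
print(f"year: {annual:.2%}  quarter: {quarterly:.2%}  month: {monthly:.2%}")
```

Under that assumption, a 10% annual forecast works out to a bit under 1% per month. Real incident rates are rarely this tidy, so I'd treat the conversion as a sanity check rather than a measurement.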

Can we trust a probabilistic KPI?

Probability is often an unfamiliar and strange measurement concept for people. It has the surprising property of being both quantitative and subjective, and it has a long (250 years!) history of academic warfare. The question is whether we can align an engineering organization to a KPI that must be estimated and cannot be exactly measured without literally being omniscient.

I think we can. NASA approves and launches missions in a similar way, using a probability of Loss of Crew. I think we can use probabilistic KPIs too, even with the involvement of game theoretic adversaries, as the intelligence community does.

KPIs have limitations and faults. They’re a north star when used honestly. They are elevated only to guide behavior and help with decision making.

An organization will likely define its own non-probabilistic KPIs in support of the probabilistic goals anyway. That’s fine, as it’s normal to see tasks and teams deviate from organization-wide KPIs.

For example: Mean Time to Detection is a potential metric for a security team. Is it a good, single proxy for your entire security program? Probably not. To avoid creating a long list of metrics to act as a proxy for success, organizations usually home in on a short list of KPIs.

We have the opportunity to be less radical about narrow metrics that represent passing efforts or interests as engineering focuses change, and hold steady to more reliable, broad KPIs over time. We should explore how to introduce subjective, probabilistic risk into KPIs.

It’s all models, anyway.

All models eventually fail to represent reality. In this case, quantitative and probabilistic models have a bunch of features we can use that qualitative approaches don’t have. If we can make these approaches easier, then we can orient our organizations around KPIs that model risk quantitatively.

Those features allow us to build expert panels, add, subtract, and prioritize expected values, determine appetite and tolerance, decompose a risk into causes, use frequency data to inform our opinions, measure for error, use Monte Carlo tooling, and more!
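To make a couple of those features concrete, here is a rough sketch using only Python's standard library. It pools a panel's forecasts with a plain average and then runs a small Monte Carlo simulation of losses. Everything here is an assumption for illustration: the function names, the panel's numbers, and the choice of a lognormal loss fitted to a 90% interval are mine, and a simple average is only one of many ways to combine a panel.

```python
import math
import random
import statistics


def pool_forecasts(forecasts):
    """Combine panel forecasts with a simple, unweighted average."""
    return statistics.fmean(forecasts)


def simulate_losses(p_event, loss_low, loss_high, threshold,
                    trials=100_000, seed=0):
    """Monte Carlo estimate of (expected loss, P(loss > threshold)).

    Assumes at most one event per period, and that the loss given an
    event is lognormal with 90% of its mass between loss_low and loss_high.
    """
    mu = (math.log(loss_low) + math.log(loss_high)) / 2
    sigma = (math.log(loss_high) - math.log(loss_low)) / (2 * 1.645)
    rng = random.Random(seed)  # fixed seed so runs are repeatable
    total = 0.0
    exceed = 0
    for _ in range(trials):
        # Either the event happens and we draw a loss, or nothing is lost.
        loss = rng.lognormvariate(mu, sigma) if rng.random() < p_event else 0.0
        total += loss
        if loss > threshold:
            exceed += 1
    return total / trials, exceed / trials


# Made-up panel: three people forecast a SEV0 with losses this year.
panel = [0.05, 0.10, 0.15]
p_sev0 = pool_forecasts(panel)
expected_loss, p_over_10m = simulate_losses(p_sev0, 1e6, 1e7, threshold=10e6)
print(f"pooled forecast: {p_sev0:.0%}")
print(f"expected annual loss: ${expected_loss:,.0f}")
print(f"P(loss > $10M): {p_over_10m:.2%}")
```

The expected loss can be added to or compared against other scenarios, and the exceedance probability maps directly onto a KPI like "a SEV0 with >$10M of losses."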

Avoiding the organizational pitfalls of risk based KPIs.

It’s really important to note that the formation of any KPI is entirely subjective and reflective of leadership. There is no objective process that picks KPIs: humans do it, and the leadership potential of a KPI has limited reach.

This generates problems which we must discuss.

No one fully believes that a KPI can truly capture success as a model. Models can only go so far in capturing a normative concept. This is wonderfully fictionalized in The Wire, and models are often called out as disingenuous.

So while KPIs are selected carefully, they’re not holy by any means, and risk based KPIs inherit well known problems.

Organizations that make decisions in spite of a KPI are not rule breakers or criminals.

Risk based KPIs will suffer the same weaknesses, simply from being quantitative and from the desire to game them. An organization will eventually see efforts that increase risk (like an M&A), and that is OK. Otherwise, this is where toxicity could brew.

Examples: We celebrate functions that discover risks, like red teams, penetration testing, threat intelligence, and hunting. These are good for a company! But they may also increase the risk we measure at times, because they build evidence about risks.

That’s OK! We wouldn’t want to limit those efforts because they discover information that would increase a risk based KPI. We’ll need norms that protect measurement and recognize that measurements should be accurate, not minimized.

If norms around honest risk measurement are enforced, along with forecast accountability, we can get to a place where a red team can technically do a good job when not succeeding. Red teams will have some of the most interesting measurement potential in a security organization in a healthy environment.

Increasingly accurate and calibrated risk forecasting is the direction we’d want to pursue as an industry, regardless of whether it’s finding new or fixing old areas of risk.
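One common way to hold forecasts accountable is the Brier score: the mean squared difference between forecast probabilities and what actually happened (lower is better). I'm naming it as one option, not the method any particular team uses, and the forecasts and outcomes below are made up for illustration.

```python
def brier_score(forecasts, outcomes):
    """Mean squared error between forecast probabilities and 0/1 outcomes.

    0.0 is a perfect score; 1.0 is the worst possible score.
    """
    if not forecasts or len(forecasts) != len(outcomes):
        raise ValueError("need equal-length, non-empty forecasts and outcomes")
    squared_errors = [(f - o) ** 2 for f, o in zip(forecasts, outcomes)]
    return sum(squared_errors) / len(squared_errors)


# Hypothetical quarterly forecasts that a SEV0 would occur,
# paired with whether one actually did (1) or did not (0).
quarterly_forecasts = [0.10, 0.20, 0.05, 0.30]
quarterly_outcomes = [0, 0, 0, 1]
score = brier_score(quarterly_forecasts, quarterly_outcomes)
print(f"Brier score: {score:.3f}")
```

As a reference point, always answering 50% earns a score of 0.25, so a forecaster or panel that consistently scores below that is adding information. A red team's forecasts about its own chances of success can be scored the same way.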

More data is not necessarily useful for leadership.

The CISO dashboard is the go-to trope for us to beat down and humiliate the role of metrics in our industry. There are infinite things to measure. The “faster horse” feature that every security vendor has to produce is a dashboard that can produce custom reports for every conceivable metric while still somehow making sense to everyone using it.

It’s a problematic thing that CSO leadership desires this… while also very indicative of their needs. Leadership at a (non-security) KPI driven organization can count on one hand the numbers they steadily lead with. If KPIs and a strategy haven’t been decided on to represent security, it can be difficult to present that we are secure.

In lieu of this structure, we pursue many metrics instead.

This brings me to the point: Top level KPIs rarely change. For example, Facebook’s MAU has only seen minor tweaks for a decade. The underlying business units shift and change their KPIs as the business evolves, but the leading KPI sticks around a familiar form for the most part. These are not GAAP metrics. They were chosen by some form of consensus or executive decision.

Security teams often run without risk based KPIs, yet are interested in reducing risk. I’d like to continue exploring why!

What’s next?

I’m working with several companies as they experiment with these sorts of measurement methods and hope to write about feedback, similar to my forecasting feedback. Maybe they’ll work, maybe they won’t!

If you’re exploring the area of risk measurement for engineering organizations, I’d love to hear about it.

Ryan McGeehan writes about security on scrty.io
