Can we quantify cybersecurity risk?

Veeral Patel
6 min read · Jul 24, 2020


In March of this year, I saw on Hacker News that Netflix had open sourced a tool called riskquant.

I set out to figure out whether cybersecurity risk was realistically quantifiable, as the project aimed to do.

It seemed too good to be true: if we could quantify risk, security teams could prioritize risks, assess controls, get budget approvals, and communicate posture much more easily. But I wanted to find out.

A thought experiment

I started with the belief that if you could assign a number, from 0 to 100, to describe how secure an organization was, where 0 is “certain to be compromised” and 100 is “impossible to be compromised”, you’d get these benefits:

  • CISOs would have to worry less about getting hacked, provided their number was high enough
  • CISOs could easily get budget for security initiatives because they could demonstrate that these initiatives would increase this magic number
  • CISOs could easily choose what security initiatives to work on, based on which ones increase this magic number the most (relative to the time/money required)
  • If you have a high magic number, you could post it online to show customers how secure you are. Competitors would likely follow, kicking off a competition where everyone becomes more secure

I now realize that formulating such a number is impossible. Different organizations have different sets of risks, and therefore different definitions of the word “compromised”.

And any organization has a combinatorial explosion of possible attack paths, far more than any tool can deduce, not to mention that organizations have many attack paths they themselves are unaware of (eg, those created by zero day vulnerabilities).

Three requirements

Risk quantification, which attempts to express risks using dollar values instead of high/medium/low labels, is a real field, though.

For example, you may express the risk of an employee laptop being infected with ransomware:

  • qualitatively: “the risk is high”
  • quantitatively: “there’s a 20% chance of ≥ $1000 in damages in the next seven days due to this risk”. (This is done with loss exceedance curves)
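A loss exceedance curve gives, for each loss threshold, the probability that losses exceed it over some period. Here's a minimal Monte Carlo sketch of that “20% chance of ≥ $1000” style of statement; the 20% event probability and the lognormal loss distribution are purely illustrative assumptions, not real data:

```python
import random

def simulate_weekly_loss(p_event=0.20, mu=6.9, sigma=0.5):
    """One simulated week: the event either happens or it doesn't.
    Loss amounts are drawn from an assumed lognormal (median ~ $1000)."""
    if random.random() < p_event:
        return random.lognormvariate(mu, sigma)
    return 0.0

def exceedance_probability(threshold, trials=100_000):
    """P(loss >= threshold) estimated over many simulated weeks."""
    hits = sum(1 for _ in range(trials) if simulate_weekly_loss() >= threshold)
    return hits / trials

# With a median loss near $1000, roughly half of the 20% of weeks
# that have an event produce >= $1000 in damages
print(f"P(loss >= $1000 this week) ~= {exceedance_probability(1000):.2f}")
```

Sweeping `threshold` over a range of dollar values and plotting the results would produce the loss exceedance curve itself.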

My goal was to see if I could build a piece of software that helped people quantify their risks.

Such a piece of software has three requirements, in my view:

  • It needs to solve a real pain point
  • It must be easy to use
  • The risk values it outputs must be reasonably accurate

A real pain point

I mentioned a couple of potential benefits of risk quantification above. Potential is the operative word: these were, I realized, the hypotheses of someone who’d never been a CISO and didn’t have much experience in the security industry either.

So I talked to two sets of people:

  • Risk quant enthusiasts, who I met by joining SIRA’s mailing list
  • CISOs, who I met via introductions

I had the privilege of talking to the current/former CISOs of Salesforce, Dropbox, Stanford University, and other organizations. Here’s what I learned:

  • Some CISOs, especially those who reported to non-technical people like the CFO, said they did have trouble justifying large expenses, like buying a SIEM. But I realized that this wasn’t a pain point they encountered very often, and I didn’t want to create software only CISOs would use.
  • Prioritization also wasn’t a problem, as many CISOs had lots of low-hanging fruit and/or could cope fine using a maturity model.

This being said, I believe that if computing a numerical (dollar) value for a risk was as easy as computing a qualitative value, most people would prefer the numerical value. It’s more precise and more convenient when discussing a risk with someone else.

Easy to use and reasonably accurate

Unfortunately, I couldn’t find a way to make expressing risks quantitatively as easy as expressing them qualitatively. Not even close. But let’s take a step back.

Risk formulas

Remember, a risk = its frequency * its impact.

Frequency of an attack = probability of attempt * probability of success

Impact = primary loss (eg, if the data is intellectual property, the value of that IP) + secondary loss (eg, legal fees, reputation damage)
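Putting those formulas together, a single risk's expected annual loss is a one-liner. All the numbers in this sketch are made up for illustration (a hypothetical ransomware-on-a-laptop risk):

```python
def expected_annual_loss(p_attempt, p_success, primary_loss, secondary_loss,
                         attempts_per_year=1):
    """Risk = frequency * impact, per the formulas above."""
    frequency = attempts_per_year * p_attempt * p_success
    impact = primary_loss + secondary_loss
    return frequency * impact

# Hypothetical inputs: 12 phishing waves a year, 50% reach an employee,
# 5% of those succeed; $5k cleanup (primary) + $20k legal/reputation (secondary)
print(expected_annual_loss(0.5, 0.05, 5_000, 20_000, attempts_per_year=12))
```

The hard part, of course, is not the arithmetic but producing defensible values for each input.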

Steps in quantifying risks

To quantify all of an organization’s risks, you must:

  1. identify all the risks
  2. compute the impact of each risk
  3. compute the frequency of each risk

Let’s assume the organization has a list of risks written somewhere for (1), like from past risk assessments. Furthermore, let’s focus our discussion of (2) and (3) on the data breach risk, which seems to be the top risk companies want to quantify.

Identifying data repositories

First of all, you’re going to need to identify all the repositories of data in an organization. This is non-trivial. Think about how many types of “data” a company has. There’s customer data, there’s employee data, there’s financial data, there’s intellectual property. Data’s stored in S3 buckets, in databases, in email inboxes.

An organization might not even know where all of its data is stored, let alone be able to quantify the risk of it being breached.

Computing impact

But just for the sake of discussion, we could in theory write a tool which crawls a company’s databases, storage buckets, and other cloud-based data repositories using public cloud APIs. (Or use something like Open Raven). This tool could then try to categorize each data repository as PII, financial, IP, etc., with the ability for a human to override the categorizations.

Based on each categorization, you could try to estimate the impact if a particular data repository was compromised. If a database storing 1000 rows of PII data is compromised, the tool might infer the impact of the database being compromised is 1000 * $10 = $10,000.

Here, $10 is computed by finding the average historical cost of a PII breach. Of course, a human should be able to override the number of records and cost per record that the tool came up with.
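That estimation step might look like the sketch below. The per-record costs and the category names are assumptions standing in for real historical averages:

```python
# Illustrative average cost per breached record, by data category.
# In practice these would come from historical breach data, and a human
# should be able to override both the counts and the per-record costs.
COST_PER_RECORD = {"pii": 10.0, "financial": 25.0, "ip": 100.0}

def estimate_breach_impact(category, record_count, cost_override=None):
    """Impact estimate = number of records * cost per record."""
    cost = COST_PER_RECORD[category] if cost_override is None else cost_override
    return record_count * cost

print(estimate_breach_impact("pii", 1000))  # -> 10000.0
```

The `cost_override` parameter is the human-in-the-loop escape hatch: the tool's guess is a starting point, not an answer.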

An incident database

To generalize this across all types of risks, I suppose you could have a database of incidents, everything from an account takeover to a successful extortion, with the dollar impact of the incident.

Then, you could query for all the incidents of a particular type that occurred to a company in the same industry as you, with a similar number of employees, within the past three months — and then average the dollar impact from all those incidents.
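If such a database existed, the query would be straightforward. The records, fields, and similarity rule below are entirely hypothetical, just to make the idea concrete:

```python
from datetime import date, timedelta
from statistics import mean

# Hypothetical incident records -- no shared database like this exists today
incidents = [
    {"type": "account_takeover", "industry": "saas", "employees": 900,
     "date": date(2020, 6, 1), "impact_usd": 40_000},
    {"type": "account_takeover", "industry": "saas", "employees": 1_200,
     "date": date(2020, 7, 1), "impact_usd": 60_000},
    {"type": "extortion", "industry": "retail", "employees": 5_000,
     "date": date(2020, 5, 15), "impact_usd": 250_000},
]

def average_impact(incident_type, industry, employees, as_of, window_days=90):
    """Average dollar impact of similar recent incidents at similar companies."""
    cutoff = as_of - timedelta(days=window_days)
    similar = [
        i["impact_usd"] for i in incidents
        if i["type"] == incident_type
        and i["industry"] == industry
        and 0.5 * employees <= i["employees"] <= 2 * employees  # "similar size"
        and i["date"] >= cutoff
    ]
    return mean(similar) if similar else None

print(average_impact("account_takeover", "saas", 1_000, date(2020, 7, 24)))
```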

Unfortunately, this doesn’t exist and I don’t see this existing in the next few years. It’s a cultural issue; companies don’t want to share this data publicly. That’s not to say the culture is impossible to change :)

Limitations

Keep in mind our limitations so far:

  • it only works for the risk of a data breach
  • it only covers data stored in the cloud (users would need to manually log the impact of a breach of data stored elsewhere)
  • the tool has no understanding of how much a data repository is worth to an organization; it’s just guessing

Computing frequency

Estimating the impact of a data breach programmatically seems feasible. The tool described above would be difficult to build and has obvious limitations, but it could be built.

Unfortunately, I couldn’t figure out a way to compute the frequency of an attacker compromising a particular data repository, like a database or S3 bucket, in the cloud.

The easiest thing I came up with was a tool that analyzed a cloud environment and built an attack graph. Then, someone could remove edges from the graph that aren’t feasible, and then assign a probability to each edge. Finally, you could run an agent based simulation to find the probability of someone getting from an entrypoint to the target node.
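That simulation step could be sketched as a Monte Carlo walk over the graph. The graph, node names, and edge probabilities below are all invented; in the real tool they'd come from the cloud environment analysis and the human pruning step:

```python
import random

# Hypothetical attack graph: node -> list of (traversal probability, next node)
GRAPH = {
    "internet":   [(0.3, "web_server")],
    "web_server": [(0.1, "app_server")],
    "app_server": [(0.2, "database")],
    "database":   [],
}

def one_trial(start, target):
    """One simulated attack: each edge is traversed with its assigned probability."""
    frontier, reached = [start], {start}
    while frontier:
        node = frontier.pop()
        for p, nxt in GRAPH[node]:
            if nxt not in reached and random.random() < p:
                reached.add(nxt)
                frontier.append(nxt)
    return target in reached

def breach_probability(start, target, trials=100_000):
    return sum(one_trial(start, target) for _ in range(trials)) / trials

# With independent edges on a single path, this should land near 0.3 * 0.1 * 0.2
print(f"{breach_probability('internet', 'database'):.4f}")
```

Even this toy version exposes the real problem: someone still has to assign those edge probabilities, and the output is only as good as those guesses.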

I said the “easiest thing”, but this doesn’t seem easy — not for the people that need to develop it nor for the people that need to use it.

Parting thoughts

In summary, I couldn’t find a way to make risk quantification easy or reasonably accurate. I also couldn’t find a pain point that would compel someone to put in the effort to use the tools I described above.

And keep in mind the tools I mentioned only work for the risk of a data breach. This is a significant limitation.

For someone looking to contribute to risk quantification, here’s what I’d suggest:

  • Learn what RiskLens’s value prop is. Several people I talked to suggested I try to create a competitor to RiskLens, and RiskLens seems to have signed many great logos.
  • Look into other fields where risk is quantified. Based on a cursory glance, risk is quantified in both finance and insurance, but those two fields have lots of relevant data, unlike cybersecurity.
  • See if quantifying the risk of a data breach alone is valuable. It may be sufficient.
  • Try to find an easy way to compute frequency programmatically. I think you can estimate impact pretty well (based on my approach and also just looking at the financial impact of data breaches from historical data).
  • Identify a strong value proposition that you need risk quantification to solve. Maybe one of the benefits I listed is useful for a segment of customers. Or there’s another benefit that I haven’t thought of. Or maybe risk quantification is well-suited for a narrow use case (like Kenna is doing with vulnerability prioritization).


Veeral Patel

software engineer at OpsRamp, ex-incident response consultant at Mandiant