How to measure risk with a better OKR.

Ryan McGeehan
Starting Up Security
May 21, 2018 · 11 min read


I’ve become a big fan of the Objective and Key Result (OKR) at companies that take them seriously. I’ll describe an opinionated method that fits within an OKR and measures the reduction (or increase) of a chosen risk. This will inform a team’s decision to reduce or increase engineering efforts to mitigate that risk going forward.

This method is similar to how a meteorologist forecasts the weather.

For deep dives into OKRs, you can read this, watch this, or read this.

OKRs are a simple way to express a motivational goal and commit to a short list of measurable outcomes that push a group towards that goal. They sometimes cascade from executive management out to all employees. OKRs are a common practice among tech companies and many security teams I work with.

Take, for instance:

Objective: Improve authentication from developer laptops into production.

This objective isn’t bad, but many risk measurement opportunities are missed.

We will reduce a rare, impactful risk with a quantifiable method.

These types of risks are normally difficult to measure.

Historical data (never happened) poorly informs our future (could it happen?).

With forecasting and estimation methods, we can measure how probable a future scenario could occur, even if we lack historical data for that scenario in the past. We use the “uncertainty” of a group as a proxy for risk, and we’ll measure it. We’ll manage the cognitive biases associated with forecasts.

OBJECTIVE: Write an objective with a “risk scenario”.

Your objective is to reduce a risk that is expressed in a scenario.

Below is the previously mentioned objective that was written to reduce a risk. It is written with some room for improvement:

Objective: Improve authentication from developer laptops into production.

This isn’t necessarily a bad objective, though it can be improved by rewriting it as a scenario.

Objective: Reduce the risk of “An adversary has accessed production from a developer laptop in Q3.”

They seem similar, right?

  • The major difference is that a scenario is probabilistic. Probabilistic phrases can be forecasted against. Forecasting is well researched, commonly understood (ex: the weather), quantitative, and measures your uncertainty.

Uncertainty is that thing in your brain that makes you shrug at a set of options, or feel strongly about one of them. As it turns out, the uncertainty of a group can be measured in a straightforward way. We’re going to make the uncertainty of experts a proxy for our target of measurement.

  • The minor difference is that the scenario rewards an engineer’s creativity.

For instance, does reducing the number of developers who require production credentials improve authentication? No, that’s reaching a bit. But it would reduce risk, and that key result is more compatible with the modified objective. The scenario was the better goal, so maybe our key results will be better as a result.

A “risk scenario” objective doesn’t prescribe a solution. It merely sets up a clean forecast. A scenario may do a better job defining a risk as a future event to be avoided.

A good forecastable scenario involves a thoughtful mix of a threat, a vector, an asset, or an impact. You can creatively decide on a specific scope of risk by narrowing or widening its specificity. A forecast must also settle on a concrete timeframe.
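To make that concrete, here is a minimal sketch of a scenario captured as structured data. The field names and example values are my own illustration, not a required format:

```python
from dataclasses import dataclass

@dataclass
class RiskScenario:
    """A forecastable risk scenario: concrete enough to be judged true or false later."""
    threat: str      # who or what acts against us
    vector: str      # how they get in
    asset: str       # what they reach
    impact: str      # what "bad" looks like
    timeframe: str   # a concrete window the forecast resolves against

scenario = RiskScenario(
    threat="an external adversary",
    vector="a developer laptop",
    asset="production",
    impact="unauthorized access",
    timeframe="Q3",
)

# Rendered as a sentence, this is the objective's scenario:
print(f"{scenario.impact} to {scenario.asset} via {scenario.vector} "
      f"by {scenario.threat} in {scenario.timeframe}")
```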

KEY RESULTS: Choose milestones or metrics, and commit to a forecast.

First, the easy stuff. Key Results need to be measurable. In Google’s early days, Marissa Mayer said:

“It’s not a key result unless it has a number.”

One simple form of measurement is the binary accomplishment: 1 for done, 0 if not done. For instance: “We added the XYZ business application to our Single Sign On platform”. If you did it, you get a “1”!

Another is to pick quantitative metrics like “fix X bugs”, “reduce X incidents”, or “hire N engineers”. These are necessary, common, and represent project goals and operational metrics. You’re probably used to these. They can make nice key results, too.

However, they don’t really measure a reduction of risk associated with our scenario. Rather, they are a lagging indicator of work performed. This work has created value in mitigating a risk, but you haven’t actually measured a reduction in a risk yet. You simply assume risk is decreasing, due to your efforts.

But by how much? What if it actually increased?

Comparing a security metric versus a measurement of certainty

Traditional security metrics are very useful for their informative value. They inform our uncertainty towards a risk, but do not represent the probabilistic nature of risk, and often do not express the massive uncertainties we can have about a specific scenario.

For instance, I believe that a historical count of vulnerabilities or frequency of regressions does not directly express a risk, but it certainly helps inform my uncertainty towards whether an associated scenario would occur or not as a result of that data.

This is because the value we assign to an individual metric is in constant flux.

Any specific metric may be my most informative data point… up until something replaces it. My judgement would deprecate the former data immediately after hearing new information that screams “oh crap” in the face of the old data, or any fragile model we tried to create for that matter.

Now let’s get to the “hard part”. Let’s make this OKR.

This is actually really easy when you get the hang of it.

An example OKR that is designed to be measured:

As mentioned, we’re going to build this OKR so it is compatible for risk measurement with forecasting and estimation techniques.

Here’s an example OKR for a small AWS security team:

Objective:

Reduce the likelihood of “A production AWS credential was exposed to the public in Q3”.

Key Results:

  1. Commits mentioning AWS_SECRET_KEY show up in the #security Slack channel.
  2. The photobackup pipeline will be moved to an AWS role.
  3. Complete the Security Monkey alerting pipeline to our detection on-call.
  4. Complete a before & after forecast, and CloudTrail hunt.

The first three Key Results (1–3) don’t require discussion. Those are just run-of-the-mill engineering work, and you can pick whatever you want. The last Key Result (#4) is what we’ll focus on going forward.
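For flavor, even the run-of-the-mill Key Results can start small. Key Result #1 might begin life as a sketch like the one below; the webhook URL is a placeholder, and a CI-integrated secret scanner is the more realistic long-term option:

```python
import subprocess
import requests  # pip install requests

SLACK_WEBHOOK = "https://hooks.slack.com/services/XXX/YYY/ZZZ"  # placeholder URL

def recent_commits_mentioning(pattern: str, since: str = "1 day ago") -> list[str]:
    """Return hashes of recent commits whose diff mentions the given pattern."""
    out = subprocess.run(
        ["git", "log", f"--since={since}", f"-G{pattern}", "--format=%H"],
        capture_output=True, text=True, check=True,
    )
    return out.stdout.split()

for sha in recent_commits_mentioning("AWS_SECRET_KEY"):
    # Post a simple notification to the #security channel via an incoming webhook.
    requests.post(SLACK_WEBHOOK, json={
        "text": f"Commit {sha} mentions AWS_SECRET_KEY, please review."
    })
```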

To measure this risk scenario, we will use a forecast panel. This will bolster our ability to measure the OKR’s underlying risk scenario in a probabilistic way.

1. Before you start work: A “baseline” forecast.

Let’s assume this is an OKR for the third quarter of the year. Early in June, a few diverse and trained individuals familiar with the OKR will forecast the probability of the scenario taking place, expressed as a percentage belief.

Our participants are Monkey (🐵), Unicorn (🦄), Cow (🐮), and Penguin (🐧). We briefly calibrate them to think in probabilistic terms (online training). They have access to whatever metrics, models, post-mortems, consultant audits, or infrastructure diagrams are available. It’s all useful and informs their forecast.

The panel’s baseline forecast has a 78% certainty that the CloudTrail hunt will reveal no incident. There’s a 14% certainty that an incident could be discovered, and a 6% certainty that we’ll be in real big trouble.

Now, consider that an answer of 33% from the panel for each category would have indicated total uncertainty, as if they had literally no information or opinion. The scenario could have been written in another language, for instance. That’s not the case here; the participants don’t believe each option is as likely as the others. They think it’s very likely that no incident would take place, given their knowledge of the environment and possible threats.

Thus, this panel is expressing an opinion in probabilistic terms that it’s most likely there won’t be an incident in that timeframe. But, an incident being discovered is not totally out of the question. It happens at a lot of other companies. They must believe there is some small likelihood it could happen.

In fact, a panelist (Monkey 🐵) seems more certain something will be found.

It’s ok that Monkey 🐵 has a differing opinion from the group. We’ll discuss this later — there is no need for the panel to agree!
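Here is a sketch of what the bookkeeping can look like. The per-panelist numbers are invented for illustration (chosen only to roughly resemble the consensus described above), and a plain average is just one simple way to combine a panel:

```python
# Each panelist assigns a probability to each outcome of the scenario.
# These numbers are illustrative, not the actual panel's answers.
# Note Monkey leans harder toward something being found; disagreement is fine.
baseline = {
    "🐵 Monkey":  {"no incident": 0.60, "incident found": 0.30, "major incident": 0.10},
    "🦄 Unicorn": {"no incident": 0.85, "incident found": 0.10, "major incident": 0.05},
    "🐮 Cow":     {"no incident": 0.80, "incident found": 0.12, "major incident": 0.08},
    "🐧 Penguin": {"no incident": 0.85, "incident found": 0.10, "major incident": 0.05},
}

outcomes = ["no incident", "incident found", "major incident"]

def consensus(panel: dict) -> dict:
    """Average each outcome's probability across panelists (a simple linear opinion pool)."""
    return {o: sum(p[o] for p in panel.values()) / len(panel) for o in outcomes}

print(consensus(baseline))
# A uniform answer of ~33% per outcome would mean total uncertainty;
# distance from that uniform spread is the signal we care about.
```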

2. Now do your work, make progress as usual.

The middle of the quarter focuses on meeting your objectives as usual. Just do work.

As our key results stated, the team builds out alerting, refactors an app to use AWS roles, and deploys Security Monkey. Hopefully they do well and finish them all!

This method has no influence on the day to day work you do. It simply guides work towards a measurable outcome. Attack the risk however you normally would.

3. EOQ. We made progress! Now we compare with the baseline.

We’ve committed to doing two things at the end of the quarter.

First, we hunt through CloudTrail logs with scrutiny, and see if we can shake out any P0 incidents from our investigative efforts.

Second, the panel forecasts again, this time measuring our uncertainty for next quarter (Q4).

Our panel is armed with new knowledge. This quarter’s progress and the result of the CloudTrail hunt will have changed our opinions on this scenario greatly.

Let’s assume the team succeeded in their other Key Results and the breach assessment came back clean.

We forecast again and compare the results to our baseline.

Now we can observe how much certainty the panel has gained, or lost, based on their efforts. In this example, our beliefs trended favorably even further towards certainty (away from 33%). Did our work influence our panel’s certainty? This panel believes so.

In this case, we improved our certainty surrounding this risk. We have a quantitative improvement of 5% in the right direction.
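Continuing the earlier sketch (again with invented numbers), the comparison itself is just a per-outcome difference between the two consensus forecasts:

```python
# Consensus forecasts before and after the quarter's work (illustrative numbers).
baseline_consensus = {"no incident": 0.78, "incident found": 0.14, "major incident": 0.06}
eoq_consensus      = {"no incident": 0.83, "incident found": 0.11, "major incident": 0.04}

for outcome in baseline_consensus:
    delta = eoq_consensus[outcome] - baseline_consensus[outcome]
    print(f"{outcome}: {baseline_consensus[outcome]:.0%} -> {eoq_consensus[outcome]:.0%} "
          f"({delta:+.0%})")

# "no incident" moved about five points further away from the 33% uniform baseline,
# which is the quantitative improvement the Key Result reports.
```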

4. Make a leadership decision guided by data.

Now you are armed for effective decision making.

This appears to forecast a breach in one out of every ten quarters.

  • Is that good enough?
  • Do we want to improve that further, or do we staff other risks?
  • What is our acceptable threshold?
  • What amount of effort and resources do we need to surpass it?
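To make the threshold question concrete, here is a rough back-of-the-envelope sketch. Both numbers below are assumptions for illustration, not outputs of the panel above:

```python
# If the incident-related outcomes sum to roughly 10% per quarter, that is
# about one expected breach every ten quarters. Compare against a threshold.
p_incident_per_quarter = 0.10   # assumed forecast for any incident occurring
acceptable_per_quarter = 0.05   # assumed risk appetite, set by leadership

expected_quarters_between_breaches = 1 / p_incident_per_quarter
print(f"~1 breach every {expected_quarters_between_breaches:.0f} quarters")

if p_incident_per_quarter > acceptable_per_quarter:
    print("Above our threshold: keep staffing this risk.")
else:
    print("Within our threshold: consider staffing other risks.")
```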

Why this approach?

Humans are built to process disparate sources of information, and quickly absorb new information to make decisions.

Throughout the quarter, we’ll undoubtedly gain information that changes our level of certainty about the risks we chose.

This information comes from a lot of places: The hands-on work itself, industry trends, breaches, maybe vulnerability reports in other areas of infrastructure, our own exploit research, a bombshell disclosure tweet, etc.

However, our trust in these sources of information are dynamic. We can’t depend on individual, static metrics to represent our risk, because their decision making value changes quickly. We could use our own certainty as a proxy for these risks, which is known to be measurable, heavily researched, with increasing guidance on improving forecast methods as a measurement instrument.

In fact, expert elicitation is an important factor in Probabilistic Risk Assessments in other industries, like Nuclear, Aerospace, and Environmental.

It’s not new, just new to us.

Insulating against the risks of bias.

Forecasting is dangerous when it is not approached with rigor. Cognitive bias is well researched, and those findings need to be repeated often. There are varying mitigations for the risks of bad forecasting.

Research shows that forecasting improves when:

  1. Panelists are trained to think probabilistically and about bias.
  2. Panelists are teamed up to combine and smooth out the impact of bias. Diversity in perspective is key!
  3. Panelists are repeatedly confronted with the outcomes of their forecasts (calibration). (Online Training, Good Judgement Open, Confidence Calibration)
  4. Panelists are encouraged to decompose a scenario into more granular parts, and are given transparent access to the data they need to understand them.
  5. Panelists have a firm understanding of true “Black Swan” events, which deceive forecasters.
  6. Teams don’t try to forecast and mitigate every risk, and stay ready for an inevitable failure.
  7. Promotion and salary are decoupled from OKR and forecast results to avoid sandbagging, which is already a problem in employee performance management.

Simply asking panelists for “think fast!” forecasts will surely give you bad results. A rigorous approach has a higher cost of measurement (meetings), but is far easier than methods with ugly risk matrix spreadsheets.

But… I always “assume breach”, so this doesn’t work!

It’s perfectly valid to assume that you’re breached. I would give any organization a very high probability (99%) that somewhere, at some severity, there is some sort of adversarial activity on a system they own. That is what “assume breach” means to me.

However, it’s unhealthy to believe that every component of every system is compromised by every adversary at every given time. Rational people, even the FUD slingers, don’t go this far into the deep end.

A deeply pessimistic mind that is rational still leaves room for doubt, just more or less than others. If you have faith that the efforts of individuals will improve risks, then you can measure that reduction of uncertainty in probabilistic terms. A pessimist certainly doesn’t believe their work makes things worse, for instance.

In short, even a pessimistic baseline can be improved upon, and having a couple of pessimists in a panel is actually a very, very good thing.

The future of risk estimation and forecasting

Over the course of many quarters, we can bolster a probabilistic method even further. We can introduce Red Teams, Brier Scores, and industry sampling to guide our forecasts. We can agree on the value of data and watch it fluctuate. We can “Chatham House” or anonymize forecasts to share with peer security teams.
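Brier scores, for example, are cheap to compute once forecasts and outcomes are recorded. This is a minimal sketch, assuming the example scenario resolved to “no incident”:

```python
def brier_score(forecast: dict, actual_outcome: str) -> float:
    """Multi-category Brier score: lower is better (0 is a perfect forecast)."""
    return sum(
        (p - (1.0 if outcome == actual_outcome else 0.0)) ** 2
        for outcome, p in forecast.items()
    )

forecast = {"no incident": 0.78, "incident found": 0.14, "major incident": 0.06}
print(brier_score(forecast, "no incident"))     # ~0.07: a confident call that held up
print(brier_score(forecast, "major incident"))  # ~1.51: the forecast was badly wrong
```

Averaging those scores per panelist over many quarters is one simple way to track calibration.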

We can feed forecast results into Monte Carlo simulations, allowing us to pull lessons and expertise from NASA, Nuclear Licensing, and other fields that are farther along than cybersecurity in understanding extreme risks.
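A toy version of that Monte Carlo step might look like the sketch below: it simulates a year of quarters from a per-quarter incident probability and an assumed loss range. Every number here is an assumption chosen for illustration:

```python
import random

def simulate_annual_loss(p_incident_per_quarter=0.10,
                         loss_low=50_000, loss_high=500_000,
                         quarters=4, trials=100_000):
    """Monte Carlo estimate of annual loss from a per-quarter incident probability."""
    losses = []
    for _ in range(trials):
        total = 0.0
        for _ in range(quarters):
            if random.random() < p_incident_per_quarter:
                total += random.uniform(loss_low, loss_high)
        losses.append(total)
    losses.sort()
    return {
        "expected_annual_loss": sum(losses) / trials,
        "p95_annual_loss": losses[int(0.95 * trials)],
    }

print(simulate_annual_loss())
```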

There are lots of opportunities for organizations to adopt a risk forecasting practice. Tremendous energy is not necessary to yield good results. Starting small, such as with risk-based OKRs, can demonstrably reduce risk for your organization, and put your organization on a path towards quantitative risk.

Conclusion

OKRs are a common way to guide an engineering team. Creating OKRs that are compatible with estimation and forecasting techniques can allow us to better measure progress in risk reduction.

These methods do not interfere with how a team does its work; they simply measure how much the risk may be changing as a result. If you currently have no method to measure risk, then any quantitative method should fare better than what you have. This strategy has minimal impact on engineering practices while aligning the team towards a measurable rate of risk reduction.

Further Reading

Risk Forecasting: A high level presentation on this method.

Simple Risk Analysis: A deep dive on forecasting risk.

Killing Chicken Little: Exploring the limitations and opportunities of risk forecasting.

Decomposing Security Risk Into Scenarios: Breaking down risks into a hierarchy of scenarios, from broad into more granular scenarios.

Thinking Fast and Slow: Nobel prize winning research into human errors of cognition, mostly in the forms of bias.

Superforecasting: Research into how errors of cognition can be mitigated and weaponized into effective forecasting teams.

How to Measure Anything in Cybersecurity Risk: A great source in defense of forecasting as a measurement method. Strong debate that promotes the role of measurement in decision making.

Ryan McGeehan writes about security on Medium.
