Talking about risk with thresholds 🔥

Ryan McGeehan
Starting Up Security
Mar 20, 2023

Imagine you encounter a fire in the woods. You’d instinctively decide to do one of two things:

  1. Kick dirt on the fire, or…
  2. Call for help!

Of course, this depends on the size of the fire. At what size does your response switch from one to the other?

This essay is about openly acknowledging that these thresholds exist in security risk conversations. It’s a communication tactic that helps manage the work you aren’t doing, and it gives others a chance to tweak your plans without trashing them entirely.

Let’s talk about thresholds with endpoint security examples.

Employee endpoints are often managed with some kind of software. While endpoint management software can help maintain compliance, backups, and incident readiness… sometimes a system becomes unmanaged for a variety of reasons. Sometimes, many systems are unmanaged!

  • Maybe an OS bug is messing with the management agents.
  • Maybe an employee is using their own device without the agent.
  • Maybe the agent was forcibly removed by an engineer.

A 100% managed device rate is possible at smaller scales, but achieving it at larger scales is difficult. Getting to 100% might not be worth your time, but if the problem is ignored, atrophy will eventually take over. As unmanaged hosts pile up, the risks will too.

So what’s the plan? That’s up to you, but you can follow a structured approach.

The following is a playbook with thresholds in mind, and the benefits are twofold:

  1. You can set these thresholds before they’re exceeded, which makes conversations easier later on.
  2. If a problem is already happening, you’ll have consensus on when to disband the response effort. With a clear exit plan, other stakeholders won’t feel drawn into endless work.

Here’s an example in terms of percent of unmanaged laptops / total laptops:

  • Less than 2%: We address other risks.
  • More than 2%: Send friendly emails to employees with unmanaged hosts, either asking for feedback on any workflow problems the agent may have caused or asking them to please begin working from a managed host again.
  • More than (?)%: We treat the problem like an incident. A predetermined escalation path brings in stakeholders from IT, Eng, and Security. Commitments are made to either fix the development experience, modify the development process, or otherwise just figure it out.
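Writing the tiers down as data makes the knobs explicit: each threshold and each step can be changed on its own. Below is a minimal Python sketch, assuming you can count unmanaged and total laptops from some inventory. The 2% cutoff comes from the example above; the incident threshold is left as "(?)%" here, so the 10% in the code is only a hypothetical placeholder, and the names `BASELINE`, `ESCALATIONS`, `unmanaged_rate`, and `choose_action` are illustrative.

```python
# Hypothetical thresholds: 2% comes from the example; the incident threshold
# is unspecified in the essay, so 10% is only a placeholder.
BASELINE = "Address other risks."
ESCALATIONS: list[tuple[float, str]] = [
    (0.02, "Send friendly emails to employees with unmanaged hosts."),
    (0.10, "Treat it like an incident and escalate to IT, Eng, and Security."),
]


def unmanaged_rate(unmanaged: int, total: int) -> float:
    """Fraction of laptops that are unmanaged (unmanaged / total)."""
    if total <= 0:
        raise ValueError("total must be positive")
    return unmanaged / total


def choose_action(rate: float,
                  escalations: list[tuple[float, str]] = ESCALATIONS,
                  baseline: str = BASELINE) -> str:
    """Return the step for the highest threshold that the rate exceeds."""
    action = baseline
    # Sorting by threshold lets stakeholders append steps in any order.
    for threshold, step in sorted(escalations):
        if rate > threshold:
            action = step
    return action


if __name__ == "__main__":
    rate = unmanaged_rate(unmanaged=45, total=1500)  # hypothetical counts
    print(f"{rate:.1%}: {choose_action(rate)}")

    # A stakeholder adds a hypothetical in-between step instead of
    # rejecting the whole plan.
    negotiated = ESCALATIONS + [(0.05, "Review unmanaged hosts with each team lead.")]
    print(choose_action(0.06, negotiated))
```

Because the steps live in a plain list, negotiating means editing a number or inserting a row, not rewriting the plan.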

The specific plans and numbers are not the point; you can be creative with either the numeric thresholds or the actions you take. How you mitigate the issue and when you’ll get started is entirely up to you. The structure of the plan is what’s important to follow, because it has several knobs that can be twisted or negotiated with.

A group can collaborate on what the thresholds will be, or on the steps they are matched with. This makes it easier to compromise on more or less aggressive steps by introducing them earlier or later. Instead of another team scrapping your plan entirely, a stakeholder can negotiate when the steps will be enacted, or add in-between steps at more moderate thresholds.

This is applicable outside of endpoint management, of course! Here are some examples to widen your thinking a bit:

  • Hey CTO: When more than 20% of servers don’t meet patch expectations, we want to freeze deployments to those servers.
  • Hey Product: When we hire our 300th engineer, we’d like peer review to be enforced instead of optional for deployment.
  • Hey IT: When we start hiring more than one person per week, we’d like to buy an SSO product that supports more automation.
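The same pattern works when each rule watches a different metric and is agreed with a different audience. A rough Python sketch, assuming hypothetical metric names (`pct_servers_unpatched`, `engineer_headcount`, `hires_per_week`) fed from whatever reporting you already have; the thresholds mirror the three examples above, and `ThresholdRule` and `triggered` are illustrative names.

```python
import operator
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class ThresholdRule:
    audience: str                            # who agreed to the rule
    metric: str                              # hypothetical metric name
    compare: Callable[[float, float], bool]  # e.g. operator.gt or operator.ge
    threshold: float
    ask: str                                 # what happens once it is crossed


RULES = [
    ThresholdRule("CTO", "pct_servers_unpatched", operator.gt, 20,
                  "Freeze deployments to servers that miss patch expectations."),
    ThresholdRule("Product", "engineer_headcount", operator.ge, 300,
                  "Enforce peer review for deployment."),
    ThresholdRule("IT", "hires_per_week", operator.gt, 1,
                  "Buy an SSO product that supports more automation."),
]


def triggered(rules: list[ThresholdRule],
              metrics: dict[str, float]) -> list[tuple[str, str]]:
    """Return (audience, ask) for every rule whose metric crossed its threshold.

    Rules whose metric is missing from `metrics` are skipped.
    """
    return [
        (rule.audience, rule.ask)
        for rule in rules
        if rule.metric in metrics
        and rule.compare(metrics[rule.metric], rule.threshold)
    ]


if __name__ == "__main__":
    current = {"pct_servers_unpatched": 25, "engineer_headcount": 120,
               "hires_per_week": 2}  # hypothetical values
    for audience, ask in triggered(RULES, current):
        print(f"Hey {audience}: {ask}")
```

Using `operator.ge` for headcount reflects "when we hire our 300th engineer", while the other two rules fire only when the value is strictly above the threshold.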

You may find similar inspiration in Google’s opinions on error budgets, or in this take on decision making that covers the topic more generally.

Openly share your internal risk thresholds and others will appreciate it. You’ll be more predictable when those fires come along. Others will already know what you’ll be asking of them, and they’ll have more knobs to twist if they need to work with you on the solution.

Magoo writes about security on scrty.io