Decomposing security risk into scenarios

How to express risks with well-understood tabletop phrasing.

Ryan McGeehan
Starting Up Security


If you ask a CEO or board member about information security risk, they’ll succinctly respond with a broad expression of risk.

We don’t want to lose our data.

Their phrasing will almost always stay at this broad level. Other examples include “We don’t want downtime” or “We can’t lose customer trust.”

How can we use this same language to manage and organize risk?

Here’s an interesting modification: put this phrase in the past tense. Now you have a very bland tabletop scenario:

We lost our data.

This tiny change should make you shudder a bit. It’s a very broad tabletop, and it begs for improvement. The writer in me wants to add color to it and turn it into a more specific tabletop that results in data loss.

SQL injection is allowing an adversary to remotely leak 100 rows of user data every two minutes.

Here’s an observation about this change: perhaps only a single person, or a small handful of tasks, within an application security team would specifically defend against SQLi.

Let’s change it further to be more inclusive of the team. Let’s represent the work of more individuals in a broader application security program, while still inheriting the risks expressed by our previous scenario:

An adversary inputs malicious data into our applications with intent to exploit a vulnerability known to them.

This covers XSS, SQLi, Insecure Direct Object Reference issues, and more. It does a great job of identifying a core work area that an entire application security team can relate to. A small handful of well-carved-out tabletop scenarios can define what a team is responsible for.

You could even list these alongside a mission statement to bolster a team’s purpose across all of the areas it cares about.

How do we modify this the wrong way? How about:

We don’t prevent SQL injection.

This doesn’t communicate risk as well. It’s a weaker expression of risk because it expresses the lack of an individual control. It loses the context of the larger scenario and why we wanted to prevent it to begin with.

Tabletop scenarios fall into a hierarchy of risk.

An engineer responsible for database infrastructure may look at the previous scenario differently: the shortest path to a data leak is through their own credentials. Let’s tabletop this:

An adversary can execute database queries remotely from an SRE’s laptop.

This is not an application security scenario; it relates more to endpoints, authentication, and production infrastructure access. Still, both scenarios tie back to the core “we lose data” scenario very well. It calibrates both teams, working on completely separate subject matter, on a specific risk.

Now we have a common ground of risk between disparate teams.

Could adding multifactor to SSH reduce the risk of a data breach more than eliminating SQLi potential would? You can begin to ask questions like this from a scenario-based vantage point.

You may also be thinking, “I wouldn’t apply the same restrictions to every laptop, though!” That’s true, and we could probably create a small set of scenarios that represent our overall endpoint fears, like:

  • An adversary has stolen a laptop.
  • An adversary is remotely administering a laptop.
  • An adversary is delivering exploit code to a laptop.

I think a reasonable number of well-crafted scenarios can form clear relationships with their broader “parent scenarios.” We can still have specific scenarios for outlier risks, like a group of employees with elevated access and the larger set of controls that would need to apply to them.

While fanciful, fictional scenarios can make for fruitful tabletop discussion, a reasonable number of well-articulated scenarios can also act as a directional reference for an entire risk-focused security program.
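One way to keep track of these parent relationships is a simple tree of scenarios. The sketch below is a hypothetical Python structure: the `Scenario` class and its `refine` and `lineage` methods are illustrative names, not an existing tool, and the placement of each scenario under its parent is an assumption chosen for illustration rather than something the scenarios above prescribe.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from typing import Optional


# Hypothetical structure: each scenario may refine a broader "parent scenario".
@dataclass(eq=False)  # identity equality avoids recursing through parent links
class Scenario:
    statement: str
    parent: Optional[Scenario] = field(default=None, repr=False)
    children: list[Scenario] = field(default_factory=list)

    def refine(self, statement: str) -> Scenario:
        """Add a more specific child scenario beneath this one and return it."""
        child = Scenario(statement, parent=self)
        self.children.append(child)
        return child

    def lineage(self) -> list[str]:
        """Return statements from the broadest ancestor down to this scenario."""
        chain: list[str] = []
        node: Optional[Scenario] = self
        while node is not None:
            chain.append(node.statement)
            node = node.parent
        return chain[::-1]


# Illustrative arrangement only; the parent links are an assumption.
root = Scenario("We lost our data.")
appsec = root.refine(
    "An adversary inputs malicious data into our applications "
    "with intent to exploit a vulnerability known to them."
)
sqli = appsec.refine(
    "SQL injection is allowing an adversary to remotely leak "
    "100 rows of user data every two minutes."
)
sre = root.refine("An adversary can execute database queries remotely from an SRE's laptop.")

for line in sqli.lineage():
    print(line)
```

Walking `lineage()` from any specific scenario back to the root shows which broad, board-level risk it ultimately serves, and two teams whose scenarios share an ancestor share common ground on that risk.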

Tabletop scenarios help identify areas of unknown risk.

Let’s dive into the third example:

An adversary is delivering exploit code to a laptop.

This scenario intentionally leaves out the “how.” It decouples the exploitation method from the scenario and directs a team to insulate the machine against any delivery of exploit code.

Take a really uncommon exploitation method, for example: what if some obscure WiFi exploit detonates malicious code on the victim’s host? This still fits the scenario nicely, because we did not specify the method of exploitation.

Had the scenario mentioned email delivery of malware (and had it been the only scenario), it would have drawn focus away from other risks. With a well-crafted scenario that folds unknown risks into our efforts, generalized endpoint hardening becomes much more attractive than a dictated “solve the email attachment problem.”

Additionally, if you do have a loud spear phishing problem (for instance, you’re a frequent target of APT-28), you can place the known risk (malware via spear phish) in a scenario right next to a broad scenario that assumes unknown risks. This identifies an above-and-beyond effort to address a specific risk.

Tabletop scenarios can clearly define how multiple teams address similar risks.

At Facebook, we had a “Site Integrity” team that was focused on broad mitigation of spam. For instance:

An attacker with millions of passwords rapidly authenticates to immediately deliver spam.

However, change this scenario in a subtle way:

An attacker bypasses authentication by exploiting a weakness in OAuth.

An entirely different team called “Security Engineering” would get involved. That team was (and is) focused on code behaving predictably and as intended. The previous example deals with the result of successful authentication and known risks around password weakness. These teams work closely with one another, but experienced members know how triage differs between the two.

This becomes a problem, though, when you’re explaining your team’s function to others.

Short brand names like “AppSec,” “Product Security,” or “SRE” don’t convey the risks they tackle very well. Well-curated tabletop scenarios do, however.

Most mission statements for security teams are about “keeping our business and users safe.” There’s a lot of overlap there, and taking responsibility for specific tabletops can quickly communicate what a team does and does not do. “How we do it” can be described with well-phrased tabletops while maintaining a holistic (shudder), business-centric mission.

Documenting a small list of example scenarios will help cleanly differentiate teams in a larger program.

You can measure risk quantitatively with tabletop scenarios.

The CISO at Twilio is the co-author of a book relating decision science to the measurement of risk in information security.

The book’s claim is as follows: an informed panel of well-calibrated experts, trained to subjectively assign probabilities to very specific risk-related questions, can help you quantify risk when sparse historical data (or none at all) is available.

Suppose you propose a scenario like this to a calibrated panel of experts:

Our customer databases will be accessed via SQL injection within the next 30 days.

These calibrated experts (calibration is an important piece of this) can assign a probability based on their expertise and their access to information about this risk. That probability is influenced by their trust in the entire constellation of surrounding controls that would prevent this scenario from occurring.

Expert panel: 2–5% likely this scenario will occur.

Probabilities will drop lower as the panel trusts stronger, reasonable controls, but any rational panel would still leave room for doubt. The estimator’s goal is to produce a probability range they are 90% confident in, and that range improves with more auditing, stronger controls, and more industry or historical data.

An aggressively pessimistic panel would state something like “50–80% chance SQL injection will occur in 30 days.”

This level of pessimism might be warranted by an active threat actor, known, gaping, unfixed vulnerabilities discovered in audits, and perhaps recent incidents. Or it could reflect an area of infrastructure that has never been maintained, ever, with calibrated experts who are well informed of these failures.

Calibration training ensures that these experts understand what 90% confidence truly means, since most people are wildly overconfident.
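A common way to check calibration is to have an estimator give 90% intervals for questions whose answers are already known, then count how often the true answer actually lands inside. A well-calibrated estimator should hit roughly 90% of the time; an overconfident one hits far less. This is a minimal Python sketch of that check; the class and function names, the 0.8 cutoff, and the quiz numbers are all hypothetical and made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IntervalAnswer:
    """One calibration-quiz answer: a stated 90% interval plus the known true value."""
    low: float
    high: float
    truth: float

    def contains_truth(self) -> bool:
        return self.low <= self.truth <= self.high


def hit_rate(answers: list[IntervalAnswer]) -> float:
    """Fraction of stated 90% intervals that actually contained the true value."""
    if not answers:
        raise ValueError("need at least one answer")
    return sum(a.contains_truth() for a in answers) / len(answers)


def is_overconfident(answers: list[IntervalAnswer], threshold: float = 0.8) -> bool:
    """Flag an estimator whose hit rate falls well short of 90% (cutoff is arbitrary)."""
    return hit_rate(answers) < threshold


# Hypothetical quiz results for one estimator (made-up numbers).
answers = [
    IntervalAnswer(low=10, high=20, truth=15),        # hit
    IntervalAnswer(low=1900, high=1950, truth=1969),  # miss
    IntervalAnswer(low=0.5, high=2.0, truth=1.2),     # hit
    IntervalAnswer(low=100, high=150, truth=90),      # miss
    IntervalAnswer(low=3, high=8, truth=5),           # hit
]

print(f"{hit_rate(answers):.0%} of 90% intervals contained the truth")  # 60% of 90% intervals contained the truth
print(is_overconfident(answers))  # True
```

A real calibration exercise uses many more questions than five; with so few answers the hit rate is very noisy, so treat the result as a prompt for more practice rather than a verdict.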

A couple of sections above, I mentioned that scenarios have natural relationships to their broader risks, like the CEO’s favorite go-to risk of “we don’t want to lose our data.” The most ambitious part of this method, as proposed, is that it can reveal clear relationships in mitigation value across many scenarios.

For instance, if you spend a lot of time on endpoint scenarios, will you suddenly observe a greater confidence in your database scenarios?

You probably would, even without explicitly mapping that relationship between risk and controls anywhere.

Combining this way of articulating risk with this method of measurement could reveal surprising facts about the efforts we apply and their system-wide impact within a company.

I most certainly won’t do this book justice with my TL;DR. I highly recommend reading it.

Scenario-based risks may help define requirements, instead of checkbox compliance implementations.

A common teaching in requirements engineering is to keep implementation out of requirements. For instance, “The Microwave must beep” is a bad requirement, while “The Microwave must notify the operator” does not impose an implementation or limit solutions, and it allows the designer to innovate beyond a required “beep.”

We should consider whether imposing specific mitigations in regulatory or compliance requirements addresses risk poorly. One example is the antivirus requirement in the PCI standard.

Which scenarios was antivirus expected to mitigate? Shouldn’t antivirus be chosen to satisfy a risk requirement, instead of being the requirement itself?

In this case, the risk-centric “requirement” could be a tabletop scenario, instead of a specific mitigation.

Malicious code has executed on an employee’s workstation.

Could we build compliance frameworks that require solutions for specific risks, instead of requiring specific solutions? Could we identify tabletop scenarios that are applicable to every payment processor?

Maybe we can express compliance in terms of risk instead of solutions.

Tabletop scenarios communicate results from multiple approaches to risk assessment, and the sequencing of a kill chain.

The NIST guide to conducting risk assessments (SP 800-30) offers three “analysis approaches.”

Threat Oriented: you enumerate threats like the “Evil Maid,” the “Insider Threat,” or the “Spammer with XSS,” and consider the bad things they could do given their evil character, along with the impact they’d cause.

Impact Oriented: you enumerate situations or highest-value assets whose loss would be significant, like a fire in the warehouse or a database breach.

Vulnerability Oriented: you identify what is at greater risk of damage or loss because of vulnerabilities that create or widen windows of risk, and then discover the impact behind those vulnerabilities.

These are great vantage points for exploring risk, but each has a narrow quality once you’ve sorted the resulting findings into large groups.

I feel that the resulting mitigation work will be organized around the “kill chain.” This perspective helps surface opportunities to disrupt an attacker upstream or downstream of the targeted risk itself, and we can combine it with our previous findings to create more scenario-driven risks.

Take spear phishing with malware payloads, for example:

A remote adversary delivers malware to our employees with intent of accessing our data warehouse.

This expresses threat (any remote individual using methods like a watering hole, a spear phish, or local WiFi attacks), impact (they want to steal from our database), and vulnerability (Does the employee need that access? Do they use MFA? Would the malware be contained automatically?). It also touches on the sequence of events necessary to compromise the objective, which respects the kill chain as a way to model the risk.

Let’s look at this another way. Let’s be the guy or gal who is screaming: “Hey everyone, we need to have multifactor on everything!”

It goes without saying that this approach simply articulates a best practice. It feels like an approach to compliance, not risk, and it results in poor communication and a disconnect from what we are trying to avoid.

Instead, an argument built on top of the tabletop scenario is a powerful communication tool that encapsulates many perspectives on risk simultaneously.

Articulating your needs into a tabletop scenario can be much more powerful than rallying for a best practice.

Take a chance at tracking some scenario driven risks.

I am excited about this approach to risk because it allows untrained participants to contribute. A startup could start tracking risks with this method pretty efficiently.

First, buy some pizza and reserve a conference room.

Host a brainstorming session where a group of people propose their nastiest tabletop scenarios. Don’t hold back or debate them one by one; just make a list and include all of the ideas.

Be sure to encourage the high-level ones that a board member or a lawyer would consider, like “we lose data!”, along with the specific ones. Include these leadership roles in the process if you’d like.

If brainstorming stalls, you can iterate downward from these high-level scenarios into scenarios that would likely result in the higher-level one. For instance, start with “We lose data,” ask “How?”, and enumerate the scenarios that would cause it.

Once the brainstorming is exhausted, find the commonalities and have a smaller group merge them into meaningful scenarios that represent these risks broadly.

Have the team hold a blind vote on these scenarios. Which ones would most directly influence risk?
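The voting format is left open here. As one possibility, assuming an approval-style ballot where each participant privately picks up to three scenarios, the tally is just a count per scenario. This is a hypothetical Python sketch; `tally_blind_votes`, the three-pick limit, and the short scenario labels are illustrative assumptions.

```python
from collections import Counter


def tally_blind_votes(ballots: list[list[str]], max_picks: int = 3) -> list[tuple[str, int]]:
    """Rank scenarios by how many anonymous ballots selected them.

    Ballots carry no voter names (the vote is blind). Duplicate picks within
    one ballot count once, and a ballot over the pick limit is rejected.
    """
    counts = Counter()
    for ballot in ballots:
        picks = set(ballot)
        if len(picks) > max_picks:
            raise ValueError(f"ballot has {len(picks)} picks; limit is {max_picks}")
        counts.update(picks)
    # Most votes first; ties broken alphabetically so the ranking is stable.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical ballots using short labels for merged scenarios.
ballots = [
    ["lost-laptop", "sqli"],
    ["sqli", "sre-laptop-db"],
    ["sqli", "lost-laptop", "sre-laptop-db"],
]
ranking = tally_blind_votes(ballots)
print(ranking)  # [('sqli', 3), ('lost-laptop', 2), ('sre-laptop-db', 2)]
```

Comparing the top of this ranking against the team’s current project list is one quick way to spot the disparity between effort and risk.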

You should see some patterns emerge, and you may see a disparity between the efforts you’re currently undertaking and the risks you’re trying to mitigate. The big question:

Are you working on anything that impacts these risks?

On an ongoing basis, as new responsibilities or risks are discovered, you can create a small set of core tabletop scenarios that express the new loss you are trying to prevent, and add them to the sorting mechanism you’ve created with the team.

Conclusion

Tabletop scenarios are an efficient means of communicating risk. They can help clarify requirements, unify disparate teams on specific risks, and provide a citable inspiration for the work we do and the controls we build. They also have potential as a tool for managing a large number of teams mitigating similar risks.

Most of my articles have pretty strong opinions on certain ways to approach specific risks, but this one is a bit speculative, and I’m hoping for opinions or for others to build on top of it. I want “risk” to become the mainstream language of security, not lists of best practices.

In writing Bad Things Daily scenarios, I’ve discovered that good scenario writing takes some skill and effort, and that it has more purposes and value than just the resulting tabletop discussion. Each scenario feels like it has a very tangible, primitive declaration of risk underneath it. Communication about mitigations can be really powerful and effective when you start discussing risk from the primitive scenario.

This has recently made me very interested in this approach to articulating risk, and I want to develop it further. I think there is an ambitious framework lying underneath it that would allow many security teams to approach risk in a similar, portable way.

@magoo

I write about security stuff on Medium.
