Risky Business

Chris Marshall
Dec 4, 2019
Image: Pearson Scott Foresman/Wikimedia Commons/Tanyadzu/Robert Adrian Hillman/Shutterstock

“A clever person solves a problem. A wise person avoids it.”
-Albert Einstein

META: This is a reprint from my personal Web Site.

Life Is Risky. Learn to Live With It.

I can’t get anywhere if I don’t take risks, but I won’t get anywhere if I’m reckless.

I have found that it’s…risky for me to evaluate risk without some kind of mental framework.

In this essay, I’ll give a brief overview of my approach to managing risk.

I’m Terrible at Evaluating Risk

When it comes to evaluating risk, the human brain is a remarkably flawed instrument.

Many studies have examined how we perceive risk, and how that perception affects decision-making ([01]–[12], below).

Here’s a rather entertaining take on it. [13]

TL;DR. We suck at risk. We need help.

Below is a pithy, off-the-cuff, unscientific mental exercise that I use to reduce bias when thinking about risk and evaluating ways to deal with it.

This is NOT a full-scale risk analysis exercise! In fact, it’s likely to make actuaries fall to the floor, foaming at the mouth.

The Three Factors of Risk Management

When I look at risk management, I analyze the risk, then plan how to address it by developing approaches for Prevention, Mitigation, and Remedy.

Prevention

Prevention is taking steps to reduce the possibility of the risk manifesting.

Image: Design Tech Art/Shutterstock

For example, if I’m going to be using a bucket to hold some liquid, and I have some concerns about that liquid spilling and causing damage, a possible prevention strategy could be to use a large bucket and not fill it too much. That reduces the likelihood of the liquid splashing over the sides, and it also gives the bucket a lower center of gravity.

I might also place the bucket somewhere where it won’t get kicked over.

Mitigation

Mitigation is taking steps to reduce the consequences of the risk manifesting.

In the above bucket of liquid example, a possible mitigation strategy would be to place the bucket on a surface unlikely to be damaged by the liquid, if it should spill.

Image: Design Tech Art/Skalapendra/Shutterstock

Remedy

Remedy is taking steps to deal with the fallout, if a risk should manifest.

In the above example, having a mop and bucket nearby could be a remedy.

Image: FARBAI/Shutterstock
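
For the software-minded, here’s a minimal sketch in Swift of how the three approaches might be jotted down for a single risk, using the bucket example. The type and its names are purely illustrative, not part of any real framework:

```swift
/// Purely illustrative sketch: one way to record the three approaches for a single risk.
struct RiskPlan {
    let risk: String
    var prevention: [String]   // steps that reduce the chance of the risk manifesting
    var mitigation: [String]   // steps that reduce the consequences if it does manifest
    var remedy: [String]       // steps for dealing with the fallout after it manifests
}

let spilledBucket = RiskPlan(
    risk: "The liquid spills out of the bucket",
    prevention: ["Use a large bucket", "Don't fill it too much", "Keep it out of high-traffic areas"],
    mitigation: ["Set it on a surface that won't be damaged by the liquid"],
    remedy: ["Keep a mop and a spare bucket nearby"]
)
```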

Scoring the Risk

Before I get to managing the risk, however, I need to measure it. Some risks aren’t worth worrying about, while others could be existential threats.

The Two Axes of Risk Measurement

When I measure a risk, I have two axes that I use to “score” the risk: Probability and Severity.

Probability

Probability is the likelihood of a risk manifesting.

In the above example, having the bucket out in a high-traffic area means that the likelihood of a spill is higher than if it were placed in an out-of-the-way corner.

I address probability by working on prevention.

Severity

Severity is a measure of the consequences of a risk manifesting.

In the above example, if the bucket is filled with distilled water, and it spills, it’s likely to do a lot less damage than if it were filled with ClF3 [14].

Image: Unknown/Public Domain Vectors/Pearson Scott Foresman/Wikimedia Commons/Design Tech Art/Shutterstock/OpenIcons/Pixabay

I address severity by working on mitigation.
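
In code terms, the two axes are just ordered scales. Here’s a rough Swift sketch that uses only the labels I mention in this essay; treat them as placeholders rather than the actual labels on my chart:

```swift
/// Illustrative only: the cases echo the labels used in this essay, in increasing order.
enum Probability: Int, Comparable {
    case extremelyUnlikely, quiteUnlikely, veryLikely, extremelyLikely

    static func < (lhs: Probability, rhs: Probability) -> Bool { lhs.rawValue < rhs.rawValue }
}

enum Severity: Int, Comparable {
    case veryLittleImpact, ouch, thisIsGonnaReallyHurt, extinctionLevelEvent

    static func < (lhs: Severity, rhs: Severity) -> Bool { lhs.rawValue < rhs.rawValue }
}
```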

A Quick “Back of the Envelope” Method

That’s all fine and good, but what do buckets have to do with software? Where are the numbers? If we geeks can’t express ourselves in numbers, we just go all to pieces…

Prepare to go to pieces, then. I don’t use numbers.

This isn’t actuarial science. A lot of my programming methodology comes from experience and “gut instinct.” If I can avoid going down a rabbit-hole, I will.

This is a quick way to “score” a risk, without getting too “science-y.”

Chart 1

In the above chart, I look at each risk with very broad “gut” assumptions, examining the risk empirically.

I’ve found that if I split the risk into its two components and evaluate each one separately, it helps me to be a great deal more objective. The important part of this exercise is divorcing probability from severity. If you look at the list of cognitive fallacies below ([01]–[12]), you will see that, when we perceive risk, we tend to let one axis crowd out the other: either assuming a very severe risk will manifest more often than it actually does, or assuming a milder risk will manifest less often than it actually does.

I evaluate for probability first. How likely is it that the risk will manifest? For example, if I’m designing a device driver that uses wireless connectivity, and the risk I’m examining is that commands will fail to reach the device, then how likely is it that the commands will fail to reach their destination?

In a lab, this may be “Extremely Unlikely.” However, in the field, the probability may rise to “Very Likely,” or even “Extremely Likely.” This may depend on things like the device’s distance from the controller, or the presence of a lot of metal.

Next, I evaluate for severity. What are the consequences of the risk manifesting? If the device is a coffeemaker, then failed commands may result in nothing more than cold coffee. However, if it’s a robot assembly controller, the consequences could be an industrial accident; with all the fun stuff therein.

In Chart 1, the X-axis is severity, and the Y-axis is probability.

“E.L.E.” stands for “Extinction-Level Event.”

When scoring a risk, I use the two axes to determine where the risk lands.

If it is red, then I shouldn’t take the risk, unless I can come up with a very good remedy.

If it is yellow or orange, then I should definitely consider prevention, mitigation, and remedy. I should not ignore the risk.

If green, then, if it’s not too much trouble, I can do some work, but the world won’t end if I don’t do it.

White means don’t bother. Black means don’t take the risk.

If I use prevention and mitigation (remedy is for after a risk manifests), I may be able to move the risk into “greener pastures,” so to speak.
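
Purely as an illustration (yes, I said I don’t use numbers; the integers below just encode the ordering of the chart labels), here’s how the chart lookup might read in Swift, building on the Probability and Severity sketches above. The zone boundaries are a rough approximation of Chart 1, not a transcription of it:

```swift
/// The zones on the chart, from "don't bother" up to "don't take the risk."
enum Zone {
    case white, green, yellow, orange, red, black
}

/// A rough approximation of Chart 1: the score rises with both probability and severity.
func zone(for probability: Probability, _ severity: Severity) -> Zone {
    // An E.L.E. dominates everything: red at best, black once it's at all likely.
    if severity == .extinctionLevelEvent {
        return probability >= .veryLikely ? .black : .red
    }

    switch probability.rawValue + severity.rawValue {
    case 0:     return .white
    case 1...2: return .green
    case 3:     return .yellow
    case 4:     return .orange
    default:    return .red
    }
}
```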

An Example

In Chart 2, let’s say that I have my device driver, and it controls a robot arm in a factory.

Probability

Because of the presence of metal, and the fact that the person who controls the robot has a duty station that requires them to walk around the floor, the probability of interference with commands is “Extremely Likely.” Additionally, the robot’s programming was designed for constant human monitoring (for safety), so the user needs to interact with the robot frequently. This further adds to the likelihood of communication failure.

Severity

Because this robot is one that moves large pieces of fairly raw steel in an environment of mixed robots and humans, the severity of a communication failure could be fairly high. I call the severity “This Is Gonna Really Hurt.”

Chart 2

Ugh

That lands us in the upper red zone. Not good.

What to do?

Reduce the Probability

The first thing that I’d do is find a way to reduce the probability of a problem. This could be done in a few ways:

  1. I could implement a policy that requires the user to be within 10 feet of the machine every half hour (or however long the requirement is between checks).
  2. I could replace a lot of the metal around the machine with composites or wood.
  3. I could design a subsystem of the robot program that allows more autonomy, lessening the requirement for human intervention.

Let’s say that if I implemented each of these suggestions, I could reduce the probability all the way down to “Quite Unlikely.”

However, the last two are extremely expensive, so I might decide to try to get away with just #1 (user gets up close and personal with the machine). Let’s say that reduces the likelihood to “Very Likely.” Not great, but better than “Extremely Likely.”

Chart 3

Reduce the Severity

The next thing that I examine is severity. Right now, it’s pretty bad. Let’s see if I can come up with some ideas to reduce the severity:

  1. We could put up shielding around the device, so if anything flies off, it won’t get far. This would need to be high-impact plastic, to avoid radio interference.
  2. We could remove the human workers from the area, relying only on robots.
  3. We could design a subsystem of the robot program that immediately stops everything if there is even the slightest deviation from the plan.

If we implemented all of these, I could probably reduce the severity to “Very Little Impact,” but, as above, they would come with costs. Let’s see if I can get down to “Ouch.”

#1 sounds like the best idea, but it has a couple of issues:

Even plastic will cause some interference. Since I’m not doing anything about making the radio stronger (like replacing all the metal with other materials), that could cause issues.

Also, even if nothing else gets damaged, the damage to the robot itself would be severe, as it would now be enclosed in a shell that could reflect shrapnel back into the device.

#2 would be extremely expensive, and could cost people their jobs.

Looks like I have to bite the bullet, and work on an upgrade to the software. This won’t be a simple project, but it is the only one that fits within my budget. If I do this, no one needs to lose their job, and the machine is less likely to suffer damage.

Chart 4

If I do both of these, we could move the risk into yellow territory. Still not wonderful, but I’ll take what I can.

Chart 5
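
In terms of the earlier sketches, the whole walkthrough boils down to a couple of lookups (again, with made-up labels and approximate boundaries):

```swift
// Before any work (Chart 2): interference is extremely likely, and a failure really hurts.
let before = zone(for: .extremelyLikely, .thisIsGonnaReallyHurt)   // lands in red

// After the proximity policy alone (Chart 3): better, but still uncomfortable.
let afterPolicy = zone(for: .veryLikely, .thisIsGonnaReallyHurt)

// After the proximity policy plus the safety-stop subsystem (Chart 5).
let afterBoth = zone(for: .veryLikely, .ouch)                      // lands in yellow
```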

The above shows a fairly concrete example of how I use this rather “naive” methodology to evaluate and reduce risk.
