On Attribution: Avoiding “We Got ‘Em”

Consensus and quantitative rigor for complex intelligence efforts.

Ryan McGeehan
Starting Up Security
Mar 7, 2018


Most security incidents lack an adversary, or the adversary is not worth the cost and effort to pursue. In the remaining cases, attribution and enforcement become complicated, and these complications create a minefield of cognitive challenges.

How can you be sure that your investigation was thorough? Does it reflect the biases, errors, or general misunderstandings of an individual leader or investigator? Will your attribution fail publicly? Have you attributed an innocent person?

To add rigor to our attribution effort, we’ll use a technique based on probabilistic estimates from a panel of forecasters. The simplified process described here is inspired by the National Intelligence Estimate (NIE) process, with the addition of a quantitative forecast.

Our goal is to reduce bias, error, and uncertainty in an attribution effort.

This is not an attribution tutorial. This will not prescribe your unmasking techniques. Instead, these are suggestions that may protect a leadership decision and hopefully reduce the risks from an overconfident attribution.

What does group estimation look like?

A dramatized example of this process is in the film “Zero Dark Thirty”:

The CIA discussing their uncertainty levels based on known evidence.

According to FOIA’d documents, this reflects actual events. After a CIA analyst expressed 95% confidence that UBL (Usama bin Laden) was in a building in Abbottabad (a warning sign of overconfidence), multiple intelligence agencies offered “red teams” to review the evidence and provide independent analysis. Their estimates were 40 percent, 60 percent, and 80 percent. This was done to eliminate errors that could result in a massive intelligence failure.

This decision-making material was then brought to President Obama, who found a rigorously developed probability of about 60% (the average of the red team estimates) to be an acceptable risk. The operation succeeded in finding UBL.

This measure of confidence may also have guided investment in the effort (a small, surgical team instead of a missile).

Forecast panels are a useful tool for many types of risk that involve problematic data, and they are especially valuable for attribution. Attribution involves an adversary who actively tries to subvert the investigation. That demands intense scrutiny of any complicating tradecraft that could make us miss the mark.

How do you do it?

A good estimate begins with a clear and measurable scenario. I write more about scenario building in another Medium post. A book by Robert M. Clark spends more time than most works on developing a robust scenario for intelligence analysis. A scenario should represent the risk or intelligence question that needs measurement. Refine the scenario as needed, the same way you would have a group agree on the definition of a problem. For example:


“Mallory from Smalltown, USA, is guilty for incident X.”

Next, participants with diverse perspectives and skill sets related to the scenario are gathered. They are briefly trained to think probabilistically, a small effort that engages deliberate System II thinking to identify and reduce bias. Panelists are ideally given access to all available information related to the attribution effort and asked to express their uncertainty about the scenario as a probability.

For instance, a 10% forecast means that, among scenarios with similar evidence, the outcome would occur only about one time in ten.

A 50% forecast is total uncertainty: we could replace your brain with a coin flip.

100% certainty means you’d be comfortable betting your own life and the lives of everyone around you, for no reward. Extreme certainty is often cause for suspicion and a sign of poor probabilistic thinking. There’s usually some room for uncertainty.
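The frequency reading of a forecast can be checked once forecasts resolve. The sketch below is an illustrative calibration table, not part of the panel process itself: it assumes a hypothetical history of past forecasts (in percent) paired with whether each scenario turned out to be true, and the function name and 10-point buckets are illustrative choices. For a well-calibrated forecaster, the 10% bucket should come true about one time in ten.

```python
from collections import defaultdict


def calibration_table(forecasts, outcomes, bucket_width=10):
    """Compare stated probabilities with how often scenarios actually came true.

    forecasts: past forecasts in percent (0-100), a hypothetical history.
    outcomes: matching booleans, True if the scenario turned out to be true.
    Returns {bucket_start: (mean_forecast, observed_percent, count)}.
    """
    buckets = defaultdict(list)
    for forecast, happened in zip(forecasts, outcomes):
        # A 12% forecast lands in the 10 bucket; 100% is folded into the top bucket.
        start = min(int(forecast // bucket_width) * bucket_width, 100 - bucket_width)
        buckets[start].append((forecast, happened))

    table = {}
    for start, rows in sorted(buckets.items()):
        mean_forecast = sum(f for f, _ in rows) / len(rows)
        observed = 100 * sum(1 for _, h in rows if h) / len(rows)
        table[start] = (round(mean_forecast, 1), round(observed, 1), len(rows))
    return table


# Hypothetical history: ten 10% forecasts (one came true), ten 90% forecasts (nine did).
history = [10] * 10 + [90] * 10
results = [True] + [False] * 9 + [True] * 9 + [False]
print(calibration_table(history, results))  # {10: (10.0, 10.0, 10), 90: (90.0, 90.0, 10)}
```

With only a handful of resolved attributions the buckets will be noisy, so a table like this is more informative for panelists who forecast many scenarios over time.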

Here’s an example of a panel’s results after reviewing an investigation aimed at attributing a computer intrusion:

General Counsel: “The subscriber information found during discovery is related to the exfiltration domain that matches our other subpoenas. The legal panel has about 65% certainty this is the suspect.”

Technical Analyst: “The subscriber information is not that valuable to me since identities are vulnerable to theft. The fact that we found a reused certificate on Mallory’s public infrastructure that was used to exfiltrate data is what does it for us, in addition to the other information stated. The analysts arrived at 70% confidence.”

Outside Counsel: “When we delivered the cease and desist, Mallory said nothing, slammed the door, shouted a profanity, and her lawyer reached out to discuss the C&D the same day. That, alone, is almost always a good sign that we got the right suspect. The partners at the firm are at 80%, even without seeing your case.”

OSINT Investigator: “Mallory has a lot of friends, so I’m not certain she is the primary actor here, but the evidence has some weight to it. I’m 50/50: 50%.”

Example Average: 66.25%.
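The example average is the unweighted mean of the four estimates. A minimal sketch in Python, using the panel roles from the example (the dictionary and function names are illustrative):

```python
from statistics import mean

# Estimates (percent) from the example panel above.
panel = {
    "General Counsel": 65,
    "Technical Analyst": 70,
    "Outside Counsel": 80,
    "OSINT Investigator": 50,
}


def panel_average(estimates):
    """Unweighted mean of the panelists' estimates, in percent."""
    return mean(estimates.values())


print(f"Example Average: {panel_average(panel):.2f}%")  # Example Average: 66.25%

# The same arithmetic gives the ~60% from the three red team estimates.
print(f"{panel_average({'Red team A': 40, 'Red team B': 60, 'Red team C': 80}):.0f}%")  # 60%
```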

With this estimate, a chief decision maker can then decide to confront the suspect. Now imagine the opposite: estimates that are far more widely distributed, or a lower average. Faced with that much disagreement, a decision maker will avoid any form of confrontation and instead work to resolve the conflicting information, or end the investigation altogether.
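Looking at disagreement as well as the average can be sketched as a simple check. The function names and the cutoffs (`min_mean=60`, `max_range=35`) are hypothetical; a decision maker would choose their own.

```python
from statistics import mean, pstdev


def summarize_panel(estimates):
    """Mean plus two simple measures of disagreement for percent estimates."""
    return {
        "mean": mean(estimates),
        "range": max(estimates) - min(estimates),
        "stdev": pstdev(estimates),  # population standard deviation
    }


def ready_to_confront(estimates, min_mean=60, max_range=35):
    """Hypothetical rule: confront only if the panel is fairly confident
    on average AND its members do not disagree too widely."""
    summary = summarize_panel(estimates)
    return summary["mean"] >= min_mean and summary["range"] <= max_range


print(ready_to_confront([65, 70, 80, 50]))  # True: mean 66.25, range 30
print(ready_to_confront([10, 90, 20, 95]))  # False: mean 53.75, range 85
```

Range is used as the gate because it is easy to explain in a meeting; the standard deviation is reported alongside it for anyone who prefers a less outlier-sensitive measure.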

“Probable” is dead. Long live probability!

The quantitative expression of probability is really important, which is why I strongly suggest adding it to the NIE-inspired process. Research suggests that when you ask participants to express their certainty qualitatively, you get vague, inconsistent answers in return. Answers like “Kinda,” “Maybe,” and “Probably” introduce significant noise into analysis findings.

Sherman Kent and others found that people’s probabilistic definitions of the word “probable” varied by as much as 70 percentage points.

Some people believed a 20% likelihood was enough to be “probable,” whereas others set the bar at 90%.

A study on qualitative expression of uncertainty. Words like “Probable” mean a wide array of actual probabilities.

This variance exists for several words heavily used in expressing probability.

How often do you say “probably”? What do you think it means?

These findings are reproducible; they were recently recreated on Reddit.

For these reasons, it seems reasonable to discuss probability in numbers instead of words. While the Intelligence Community hasn’t adopted a quantitative approach to National Intelligence Estimates, it openly respects these findings.

Policies, budgets, and other decision making.

Leadership can set a target certainty by demanding a panel-based estimate of X% or higher confidence before even discussing civil action or making a proactive referral to law enforcement. Unless all participants hit that mark, an executive can decline to be bothered by excited, overconfident individuals who risk taking them down a biased road.
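A target-certainty policy reduces to checking every panelist’s estimate against leadership’s bar. A minimal sketch, where `check_target` is a hypothetical helper and the 60% target stands in for the unspecified X%:

```python
def check_target(panel, target):
    """Return (met, holdouts): met is True only if every panelist's estimate
    is at or above the target certainty; holdouts lists those below it."""
    holdouts = sorted(name for name, estimate in panel.items() if estimate < target)
    met = bool(panel) and not holdouts  # an empty panel never meets the bar
    return met, holdouts


panel = {
    "General Counsel": 65,
    "Technical Analyst": 70,
    "Outside Counsel": 80,
    "OSINT Investigator": 50,
}
print(check_target(panel, target=60))  # (False, ['OSINT Investigator'])
```

Requiring every panelist to clear the bar is stricter than requiring the average to clear it, so a single enthusiastic panelist cannot carry the decision on their own.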

An estimate can change with repeated forecasts over time, and can be measured incrementally as evidence unfolds. The value of evidence, or value of specific investigative steps, can be tracked quantitatively as it sways a group’s level of uncertainty.

This hypothetical analysis shows large value in the Forensic Report (which increased group confidence by 45%) and in the information gathered from subpoenas (15.5%).
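Tracking the value of evidence can be as simple as re-forecasting after each new item and attributing the change in the panel average to that item. The sketch below assumes exactly one new piece of evidence per round; the round labels and estimates are hypothetical and are not the figures behind the analysis above.

```python
from statistics import mean


def evidence_deltas(rounds):
    """Given (label, estimates) forecast rounds in chronological order, return
    the change in the panel average (percentage points) after each new item."""
    deltas = []
    previous = mean(rounds[0][1])
    for label, estimates in rounds[1:]:
        current = mean(estimates)
        deltas.append((label, round(current - previous, 2)))
        previous = current
    return deltas


# Hypothetical rounds: the panel re-forecasts after each item of evidence.
rounds = [
    ("Baseline", [20, 25, 30, 25]),
    ("Forensic Report", [60, 70, 75, 65]),
    ("Subpoena Results", [70, 80, 85, 75]),
]
print(evidence_deltas(rounds))  # [('Forensic Report', 42.5), ('Subpoena Results', 10.0)]
```

If several items arrive between forecasts, the change can only be attributed to the batch, which is one reason to forecast often while an investigation unfolds.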

Finally, panelists can suggest actions that would move their estimate, stating how much it would change depending on each outcome of those actions.

For instance:

“If we can serve this ISP with a subpoena, and if their records match our current suspect’s name… I would probably increase my forecast by 20%. If it didn’t match, I’d fall by 5%”.

Now you have a quantitative foundation for decision making, since you can weigh the legal costs of pursuing a subpoena against the value of the certainty it would provide. You have an estimation tool that can help decide whether further investigation is still worth its cost. For instance, a John Doe subpoena may be expected to cost around $20k. If its outcomes are expected to increase your forecast by at most 5%, it may or may not be worth $20k to your organization.
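The conditional forecast and the $20k example reduce to simple arithmetic. This is a sketch under stated assumptions: the stated changes are treated as percentage points, the starting forecast of 66.25% is borrowed from the panel example, and “cost per point” is one hypothetical way to price certainty. The organization still has to decide what a point of certainty is worth to it.

```python
def _clamp(p):
    """Keep a forecast within 0-100 percent."""
    return max(0.0, min(100.0, p))


def conditional_forecasts(current, up_if_match, down_if_no_match):
    """Forecast after each outcome of an investigative step (percentage points)."""
    return _clamp(current + up_if_match), _clamp(current - down_if_no_match)


def cost_per_point(cost, max_increase_points):
    """Dollars spent per percentage point of the best-case forecast increase."""
    if max_increase_points <= 0:
        return float("inf")  # the step cannot raise certainty at all
    return cost / max_increase_points


# "+20 if the ISP records match, -5 if they don't", starting from 66.25%.
print(conditional_forecasts(66.25, 20, 5))  # (86.25, 61.25)

# A ~$20k John Doe subpoena that can raise the forecast by at most 5 points.
print(cost_per_point(20_000, 5))  # 4000.0
```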

Estimates may enable privacy of an investigation and reduce data sharing.

Let’s say that multiple teams are investigating intrusions that look similar.

They suspect that the same actor, or actors, are involved. All parties have an interest in stopping the behavior, but may face boundaries that prevent sharing data with one another despite their shared victimization.

A couple of forecasting approaches might help teams work within those boundaries.

Craft a scenario that enables a valuable discussion to take place. Consider a scenario without naming a specific actor:

We have identified suspects responsible for (incident x).

If multiple teams have a substantive amount of certainty, this may be a great icebreaker for sharing information to close gaps and identify a bad actor. From there, effort can be made to craft NDAs or contracts that protect the shared information. If the initial estimates come in weak, a collaboration is less likely to be fruitful, and you can choose to avoid it altogether.

Alternatively, these teams can avoid sharing altogether and agree on a joint enforcement effort.

If multiple organizations arrive at a higher certainty on attribution (maybe fifty/fifty or higher), they can agree on an orchestrated engagement with law enforcement without sharing investigations that might reveal quantifiable losses or expose intellectual property to one another. Competitors could then simultaneously escalate against a mutual threat without sharing sensitive information.
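The joint-enforcement idea only requires each organization to disclose a single number for the anonymized scenario. A sketch, assuming a hypothetical 50% threshold (the “fifty/fifty or higher” above), a hypothetical minimum of two qualifying organizations, and placeholder organization names:

```python
def coordinated_escalation(org_forecasts, threshold=50, min_orgs=2):
    """Each organization shares only its forecast (percent) for the anonymized
    scenario, never its evidence. Returns the organizations at or above the
    threshold and whether enough of them qualify to coordinate a referral."""
    qualifying = sorted(org for org, p in org_forecasts.items() if p >= threshold)
    return qualifying, len(qualifying) >= min_orgs


forecasts = {"Org A": 72, "Org B": 55, "Org C": 30}
print(coordinated_escalation(forecasts))  # (['Org A', 'Org B'], True)
```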

This helps avoid data sharing in complex investigations where sharing would ultimately have been low leverage and would only increase the risk of sensitive information leaving the barn. When the leverage becomes clear, the risks of data sharing can be taken on. This helps a team focus on its highest-likelihood attribution efforts.

A probabilistic panel won’t solve everything.

Here are some issues to look out for while introducing these techniques:

It’s always desirable to introduce a healthy amount of scrutiny into a big decision. But when scrutiny is traded against mission speed, other risks appear. For instance, some blame a rushed timetable for the failures of the National Intelligence Estimate on Iraq’s WMDs.

A lengthy consensus process may introduce cultural risk as well, and forcing it upon every decision may curse your organization. Consensus is great, but we don’t need a committee for every decision.

A panel of individuals will not guarantee protection against bias. Groupthink, for example, is widely associated with the Challenger explosion.

Additionally, tying incentives or performance management to ever-increasing probabilities is a sure way to turn estimation into a toxic process. If a bonus or performance review relies on the numbers changing, a panel will be sure to corrupt them over time in pursuit of money. Never let a boss say “don’t come back until you can say 90%!”, or they’ll get exactly what they wished for.

Lastly, if individuals are not trained to be aware of their own cognitive errors, you may end up with experts who are worse than laymen at forecasting.


Forecasts are a flexible tool to estimate risks and the validity of intelligence. Including consensus and quantitative rigor in an intelligence effort can help avoid big mistakes, especially in attribution.

Ryan McGeehan writes about security on Medium.