Measuring “In the Wild” Exploitation
Exploring our security expectations across browsers.
This talk was given at Art into Science in January 2019
What would you say if asked:
“Which is more secure: Chrome, or Firefox?”
Now, think about how you would answer that question. You might think of some of the following factors:
- The differences in their available security features.
- A comparison of vulnerability track records.
- Your understanding of each organization’s security investment.
- What else? 🤔
You might then come to a conclusion that one browser is more secure than the other.
But… by how much? The evidence we considered was probably subjective.
One could determine “Browser X is better than Browser Y”, but this does not describe how much better it is. The whole discussion is increasingly slippery because we haven’t even discussed what “security” entails. Let’s fix all of this.
I’ll take you through a forecast of a single aspect of these browsers: The likelihood of an “in the wild” exploit being seen in a given month, using some tools described by Simple Risk Measurement.
First step: Round up the gang!
I gathered a group of diverse individuals with varying skill sets to reduce bias and avoid the dangers of an individual expert opinion. The panel of 15 included security engineers, managers, directors, CISOs, journalists, and researchers, male and female, with varying career durations, education, and levels of calibration training.
Note on panels: Consider what is useful. You might find small groups useful too!
We worked in a couple of #forecasting Slack channels, over email, and in person.
Let’s nail down exactly what we want to compare.
I asked the panel to estimate the probability of an undesirable Chromium scenario in the near future (the next month).
Will a “Critical” Chromium exploit be discovered “in the wild” in September 2018?
I would act as “the judge” of this scenario once September ended. It would result in a
No if I couldn’t discover any confirmation that this scenario took place within an hour of effort. It would result in a
Yes if I find evidence of an in the wild exploit of Chromium in September 2018.
Let’s stabilize our qualitative phrases.
The qualitative term “critical” is well defined by the Chromium team. Additionally, there’s a track record of these bugs in Monorail. For judgement of
critical, we are trusting the Chromium team’s judgement.
Next, “in the wild” is a far more subjective phrase. It’s not a standardized term. Ask yourself what it means, and your answer will differ from everyone else’s.
A discussion with the panel, and others, led us to some temporary legalese we could use for the forecast. This made for a more reliable forecasting target.
Question: How many critical Chromium bugs have there been?
A prerequisite of this forecast relates to the frequency of critical bugs being found. Using this as a reference class helps narrow down otherwise difficult estimations of frequency.
At the time of this forecast, 126 Critical bugs had ever been found in Chromium code across ~100 months of its existence. There were ~10 bug bounty payouts per month, and the panel could find fewer than 2 “critical” CVEs per month as well.
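As arithmetic, the base rate sketch looks like this (all figures are the panel’s rough approximations from above, not authoritative counts):

```python
# Rough base-rate arithmetic from the panel's approximations.
critical_bugs = 126      # "Critical" Chromium bugs ever found
months_of_history = 100  # approximate age of Chromium at forecast time

bugs_per_month = critical_bugs / months_of_history
print(f"~{bugs_per_month:.2f} critical bugs discovered per month")  # ~1.26
```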
So, Google regularly sees this severity of bug discovered.
But, bug finding does not result in “in the wild” exploitation! It just helps us with the approximation process.
Question: How many of these saw “in the wild” Chromium exploitation?
The panel could not remember any instances of
critical Chromium bugs being exploited in the wild, ever.
So, we asked Chris Evans (Founder of the Chrome Security Team) if he shared this observation during his time at Google.
He had this to say:
There weren’t any “in the wild” full Chrome critical exploits seen while I was at Google[…] I doubt there have been any since, because the team is very transparent and it would be a big deal[…] No one sane really doubts that the top-end governments have critical exploit capability against Chrome.
This was informative and supported the panel’s opinion that, historically, this is a rare event. It suggests black swan territory. The belief is falsifiable, but it holds that the event is rare… inviting the question “how rare would it be?”.
Additionally, we built an “odds table” to help approximate our evidence.
Panelists use this to assist in quantifying their beliefs, similar to how odds tables have influenced modern intelligence analysis. There are several ways to build out an odds table like this, resulting in different values. This approach used a simple Monte Carlo method that produces output similar to a combinatorial approach.
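The exact table isn’t reproduced here, but a minimal sketch of the Monte Carlo idea, assuming a constant monthly probability and illustrative values rather than the panel’s actual rows, might look like:

```python
import random

def expected_gap_years(monthly_prob, trials=200_000, seed=7):
    """Simulate Bernoulli months and report the average gap between
    events, in years. This is a Monte Carlo stand-in for the
    combinatorial answer of 1 / monthly_prob months."""
    rng = random.Random(seed)
    events = sum(rng.random() < monthly_prob for _ in range(trials))
    gap_months = trials / events
    return gap_months / 12

# Illustrative odds-table rows (probabilities are examples only).
for p in (0.0165, 0.0558, 0.10):
    print(f"{p:.2%} per month -> roughly once every "
          f"{expected_gap_years(p):.1f} years")
```

Each row translates a monthly probability into the kind of “once every N years” phrase the panel used.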
With this, panelists were able to quantify their beliefs as a forecast.
I collected the panel’s opinions with some open source software being developed to manage the collection of subjective approximations.
The panel’s estimate…
The panel suggested a
1.65% belief that they would see an in the wild exploit in Chrome the next month. With an odds table, that translates to:
“Expect an in the wild Chromium exploit every 4–6 years.”
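That translation can be sanity-checked directly, assuming a constant monthly probability (so the wait time follows a geometric distribution):

```python
# Sanity-check the odds-table translation of the panel's estimate.
monthly_prob = 0.0165
expected_wait_months = 1 / monthly_prob  # mean of a geometric distribution
print(f"Expected wait: {expected_wait_months:.0f} months "
      f"(~{expected_wait_months / 12:.1f} years)")  # ~61 months, ~5.1 years
```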
One area of panel disagreement surrounded the political climate in October of 2018.
Some panelists felt that midterm elections, Brexit, and increased military activity would increase that specific month’s likelihood of this scenario taking place by “shaking out” an exploit from a weapons stockpile.
Other panelists strictly disagreed with this, believing political factors to be a regular factor month to month, and shouldn’t artificially increase those odds until more extreme circumstances were seen.
Panelists do not need to agree in a method like this, and they didn’t. However, deviation was not high, and all panelists believed in at least a “once in years” frequency.
After September expired, we could look back and score our prediction with a Brier Score, and the score of
0.00027 was applied.
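For a single binary forecast, the Brier score is just the squared difference between the forecast probability and the outcome (1 for Yes, 0 for No), with lower being better:

```python
def brier_score(forecast_prob, outcome):
    """Brier score for one binary forecast: (p - o)^2,
    where outcome is 1 for Yes and 0 for No. Lower is better."""
    return (forecast_prob - outcome) ** 2

# The panel forecast 1.65%, and the scenario resolved "No".
print(brier_score(0.0165, 0))  # 0.00027225, rounded to 0.00027 above
```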
How does this compare to Firefox? 🦊
We decided on a similar scenario, and re-used other aspects of the Chromium prediction:
Will a “Critical” Firefox exploit be discovered “in the wild” in January 2019?
The panel’s result was a
5.58% belief of this event occurring in January. This can be translated to:
“Expect one in the wild exploit every other year.”
Firefox is very different from Chrome, notably in track record. It has had several
critical bugs exploited in the wild, in 2015 and 2016 among many others across the decade. So unlike Chrome, we are not in “black swan” territory.
In addition to this, the panel discussed the recency and development of the Firefox Sandbox. Chrome has 10 years of history with their sandbox model, and Firefox is still making progress on theirs.
This led to greater uncertainty, more outliers, and less confidence about the browser’s future exploitation. But it still had a “once in years” estimate.
This forecast has also concluded (
No), and it was likewise scored with a Brier score.
These forecasts are similar, but different.
There are several aspects of this process that can attract well-founded criticism:
- The panels were different, and differently sized.
- The “critical” labeling is not exact for each browser, only similar.
- The forecasted months were different.
- Subjective language, even if well defined, is still qualitative.
- Calibration cannot be found with a small set of forecasts.
There are likely more areas where this gets slippery, but usefulness is the measure we’re after. I felt this was useful.
How do I read this data?
People will likely make different decisions as a result of this data, or none at all. Firefox may be “good enough” for you to punt on a decision. Perhaps someone would want to switch immediately. Or, it could help justify a migration to either browser if you’re stuck in a legacy environment. Who knows!
Only you know the decisions you need to make and the data you need to make them with.
Ultimately, this is just a measurement method that is applicable when data is scarce and uncertain. Measurement leads to management, and this might help.
Personally, I have some nuanced opinions about Chrome and Firefox, but I consider both useful and secure enough to use day to day. I’d be OK using or working with either. I was happy to see the panel draw a similar conclusion, and I interpret it as both being pretty good pieces of software without extreme risk.