Disclaimer: This post represents personal opinions and thoughts, and does not represent the views or positions of my employer, Google.
This post continues an ongoing series examining the current and upcoming SHA-1 deprecation; the first post in this series is A History of Hard Choices.
While the previous post explored the historical context in which the SHA-1 deprecation fits, and the many failures to respond adequately to known risks, it didn’t really address the actual Legacy Verified (LV) proposal made by CloudFlare and Facebook, and subsequently endorsed by Twitter, nor how it attempts to mitigate the concerns with continuing SHA-1 issuance.
While the LV proposal exists as a set of bolded amendments to the existing Baseline Requirements, the core requirements of the proposal can be summarized as follows:
- SHA-1 certificates may only be issued if they are LV.
- LV certificates must have a distinct policy identifier.
- All SHA-1 certificates issued under the LV scheme must expire on or before 31 March 2019.
- All LV certificates must include, within the subject, an organizationName, localityName, stateOrProvinceName (as appropriate), and countryName.
- All LV certificates must be issued from an intermediate dedicated for the purposes of issuing LV certificates.
- In order to obtain an LV certificate, the applicant must also obtain a SHA-2 certificate and agree to make reasonable efforts to serve it to clients who can handle it.
- All LV issuance must stop if a means to make modern clients accept an LV certificate is found that neither the Application Software Supplier nor the Subscriber can make changes to mitigate.
According to CloudFlare, Facebook, and Twitter, these requirements represent a reasonable set of both technical and procedural controls which, when combined, mitigate the risks of SHA-1 issuance. Unfortunately, in proposing these requirements, the authors reveal either a lack of knowledge of the technology involved or, worse, a lack of concern for the billions of users and applications which would be needlessly put at risk by Legacy Verified certificates.
The Threat Model
Before exploring these controls in depth, it is worth recapping what the overall threat of SHA-1 issuance is. A certificate is comprised of a series of fields which always appear in a particular order: roughly, Version, Serial, Algorithm, Issuer, Validity, Subject, Subject Key, Extensions. The exact contents of the certificate’s extensions vary, and have no particular order, but all other fields must appear in exactly the same order in every certificate.
An attacker that wishes to mount a chosen-prefix attack first “chooses” (in this case, predicts) what the beginning of a good certificate will be. They also determine what they want the evil certificate to say. Once the beginning is predicted, which is shared between both certificates, the attacker computes a series of collision bits. These collision bits are included in the good certificate as the attacker controlled data — that is, data that the CA copies over unconditionally, such as the Subject or Subject Key. Once included, the attacker is then free to include arbitrary data — effectively, controlling everything after the collision. When the attacker has finished making the certificate look as they want, they add enough data to get the contents of the evil certificate to be in the same state as the good certificate. Because both the evil and good certificate end up in the same final state, the attacker can simply take the signature from the good certificate — the one that to the CA and to the world looks entirely benign, and is entirely public data — and copy it to their evil certificate, causing the evil certificate to be treated as trusted.
Thus, when considering SHA-1 issuance, the defender must consider that everything after the first piece of attacker controlled data may be subsequently replaced in an ‘evil’ certificate. Similarly, the defender wishing to defeat the attack needs to do everything possible to prevent the attacker from being able to predict the prefix, since doing so gives them opportunity to compute the collision bits for the chosen-prefix.
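To make the mechanics concrete, here is a minimal sketch in Python of the hash property the attack depends on. It is not SHA-1 and it is not the real attack: toy_hash, compress, and find_collision_blocks are hypothetical names, the 16-bit internal state is deliberately tiny so a brute-force search finishes instantly, and the byte strings merely stand in for certificate fields. The only point is that once two different prefixes have been steered into the same internal state, any shared data appended afterwards keeps the digests identical, which is why a signature over one message also verifies for the other.

```python
import hashlib

STATE_BYTES = 2   # 16-bit internal state: absurdly weak on purpose
BLOCK_BYTES = 8   # toy block size; no padding is modelled


def compress(state: bytes, block: bytes) -> bytes:
    """Toy compression function: truncated SHA-256 of state || block."""
    return hashlib.sha256(state + block).digest()[:STATE_BYTES]


def toy_hash(message: bytes, state: bytes = b"\x00" * STATE_BYTES) -> bytes:
    """Iterate the compression function over fixed-size blocks."""
    if len(message) % BLOCK_BYTES:
        raise ValueError("toy_hash only accepts whole blocks")
    for i in range(0, len(message), BLOCK_BYTES):
        state = compress(state, message[i:i + BLOCK_BYTES])
    return state


def find_collision_blocks(good_prefix: bytes, evil_prefix: bytes,
                          table_size: int = 4096):
    """Brute-force one block per prefix so both reach the same state."""
    good_state = toy_hash(good_prefix)
    evil_state = toy_hash(evil_prefix)
    # Record the states reachable from the good prefix...
    table = {}
    for i in range(table_size):
        block = i.to_bytes(BLOCK_BYTES, "big")
        table[compress(good_state, block)] = block
    # ...then search for a block that drives the evil prefix into one of them.
    for j in range(2 ** 24):
        block = j.to_bytes(BLOCK_BYTES, "big")
        state = compress(evil_state, block)
        if state in table:
            return table[state], block
    raise RuntimeError("no collision found within the search bound")


good_prefix = b"CN=good."   # stand-in for the predicted 'good' certificate start
evil_prefix = b"CN=evil."   # stand-in for what the attacker wants instead
good_block, evil_block = find_collision_blocks(good_prefix, evil_prefix)

suffix = b"the-rest"        # identical data appended after the collision
good_message = good_prefix + good_block + suffix
evil_message = evil_prefix + evil_block + suffix

print(good_message != evil_message)                      # different content
print(toy_hash(good_message) == toy_hash(evil_message))  # same digest
```

Against a real hash the search is vastly more expensive, which is exactly why the attacker needs to predict the prefix ahead of time: the collision work has to be done before the CA signs anything.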
Because the Version, Algorithm, and Issuer are fixed for every certificate a given CA issues, and because both the Subject and Subject Key contain or are entirely data supplied by the attacker, this leaves the defender with either changing how the Serial is computed, changing how the Subject is formed, or changing how the Validity period is calculated. These are all well-understood mitigations for the attack — indeed, this is what Microsoft’s Root Program adopted as a requirement in 2009, as previously discussed.
It’s also important to reiterate that everything subsequent to the attacker’s data may be manipulated and changed by the attacker. Most important among these are the X.509v3 Extensions, which will contain the domain name a given certificate is valid for (subjectAlternativeName), whether or not a certificate is allowed to issue other certificates (basicConstraints), and what usages the certificate is valid for (extendedKeyUsage). The most critical, and valuable, of these is whether or not the ‘evil’ certificate is allowed to issue other certificates: if the attacker can change this field, they can put whatever they want in any certificate they want, for any domain name they want, and have it be accepted as valid.
This is the threat model under which these certificates operate — simplified to its bare form, if the attacker can predict what comes before their data (nominally, the Subject Key, but technically, the Subject as well), then they can control what comes after their data.
It is within this threat model that the LV proposal operates, and proposes a series of controls that attempt to mitigate such risk.
The most obvious of these is the requirement that LV certificates get validated to at least the level of OV certificates, by requiring that all LV certificates have an organizationName. In order for a CA to include this field within the Subject, the Baseline Requirements stipulate certain requirements on both the CA and the site wishing to obtain the certificate, and LV leaves these unmodified. The presumed intent is to make it difficult to mount an attack like that performed on RapidSSL, by making the attacker go through this additional vetting before obtaining an LV certificate.
Unfortunately, this additional vetting is just show; it doesn’t prevent an attacker from mounting a RapidSSL-like attack. The Baseline Requirements only require that this vetting happen once within a thirty-nine month period, and it happens once per applicant (here, the attacker), not once per certificate. The CA is allowed to reuse this information for subsequent certificates, and thus can fully automate the issuance of such certificates, leaving the attacker ample opportunity to repeatedly issue requests and try to determine and predict what the prefix will be.
Worse, much like SGC certificates after 2000, it creates a product that doesn’t add security value, but which can be sold at a significant premium. Given that CAs sell products to an undiscriminating market, there’s limited incentive for CAs to stop this practice, and ample incentive for a similar discussion about the at-risk users when the 2019 date approaches. Given the history of SGC, it seems unlikely to reasonably expect that CAs will be content with allowing a premium, though largely unnecessary, product to evaporate. Worse, that price premium encourages centralized services on a decentralized Internet — the only parties that can compete for those legacy users are those with the ability and infrastructure to meet the requirements LV sets out; which Facebook, CloudFlare, and Twitter certainly can, but for which most startups and small organizations simply could not. Despite open-sourcing one implementation of certificate switching, it’s not a generic solution that works for most web and application servers — certainly not the most popular ones.
Another example of a procedural control is the policy identifier — while this would seem like it would be a means of rejecting such certificates, because the attacker can control all the extensions, which is where this value appears, they can easily remove it, making it appear like a normal certificate. Its inclusion is not intended as a mitigation for attackers — rather, it’s necessary because the authors of the proposal are introducing a new type of certificate, and each type needs distinct identifiers.
The requirement that the certificate holder also have a SHA-2 certificate is a similar procedural control: there’s nothing technically enforceable, when encountering a SHA-1 LV certificate, to establish whether or not this requirement was followed. It attempts to set expectations as to who can use LV certificates, but it carries about as much weight as Microsoft requiring that SHA-2 certificates be made available after 2011, which a large majority failed to do.
The final procedural control is the ‘suicide clause’: all LV issuance must stop if a way to make a modern client accept an LV certificate is discovered, and no browser or site changes can get around it.
This is perhaps the most interesting of these, because depending on the point of view of the person making the argument, it can either be argued that no LV certificate can ever be issued (because such means exist already), or it may be argued that LV can never be stopped (because software is infinitely malleable, even though it may be cost prohibitive to do so).
The understanding of why this is the case is subtle, and relies on a little background about how certificate validation works — that is, how an application determines whether or not a certificate is acceptable for a given purpose.
Certificate Validation: A Detour
To understand how certificate validation works, it might be easy to think of it like driving directions, as they both share a common underlying structure: they are both directed, cyclic graph problems. The goal of certificate verification is to get from a point A — the certificate you’re trying to validate — to point Z — a certificate you trust. Similarly, if you’re driving from Los Angeles to New York, you have a starting point, Los Angeles, and a destination, New York.
The goal is to find a path that gets you where you’re going. Each stop along the way, there are rules that limit the next steps you take. For example, a certificate might not be allowed to issue other certificates, or a road might be closed for construction or be a one-way only street. Similarly, if someone asks you what the ‘best’ route is, whether it’s for a certificate or a road trip, the answer is “it depends.” Do you want the scenic route, or the fastest route? Do you want to avoid the toll roads or the freeways? There are many routes you can take, whether they be direct — the shortest possible — or indirect, such as taking a boat from Los Angeles to China, driving across Asia and Europe and down to Africa, then driving up through South America and all the way up to New York.
Similarly, when you leave Los Angeles, you could drive to Phoenix or you could drive to Las Vegas, and when you start with a certificate and look to find who issued it, there can be many possible results, because there can be many different versions of a given certificate, each with their own limitations and constraints. When you encounter a one-way sign on a road, it rarely tells you if there’s another road to get where you’re trying to go — after all, it’s just a road, there could be many destinations. Similarly, when you encounter a certificate with constraints on it, it doesn’t necessarily tell you how to get where you’re going — only that you can’t go this route.
As there are many routes from LA to New York, there are many certificate paths, all of which individually may be valid or invalid, in the same way that driving to Hawaii to get to New York isn’t really a viable option, even though driving through Nashville may be.
The rules for determining whether or not a path is valid are covered in RFC 5280, but that only takes a fully completed route and tells you whether or not it works. To figure out how to read a map, or to navigate certificate paths, you need a guide like that in RFC 4158, which explains how different routes can be formed, and how to determine whether or not a path is valid.
Now here’s where the metaphor breaks down a little; when validating certificates, it’s not as “simple” as asking if there’s a route from LA to New York. Instead, the question that’s asked is “Is there a route between Los Angeles and one of New York, San Diego, Lisbon, Shanghai, or Honolulu, and if so, what is it?” That is, is there a path from this certificate to ANY of the root certificate authorities the client trusts.
The APIs used for certificate validation, regardless of platform, rarely give you the opportunity to set your additional preferences, such as don’t drive on highways (avoid this algorithm) or don’t go through this city (avoid this particular CA). Instead, the best the application can do is, after getting the route, to say “Nope, I refuse to accept any route through Las Vegas, so I guess there’s no possible route.”
Worse still, many applications, when trying to get from Los Angeles to New York, can’t handle road closures or one-way streets: If they encounter them, they just decide there’s no possible route, despite the fact that they could back up and try a different side-street. This isn’t rare — it’s true for OpenSSL, it’s true for OS X, it was true for Firefox up until Firefox 37, and it is true for virtually every other implementation and language out there. For the longest time, the only deployed application that would consider backing up and trying a different route was Windows — but even then, it doesn’t let you say to avoid highways and toll roads.
This is not merely a matter of philosophical failure modes, but is writ large in compatibility issues with Chrome’s methods for detecting SHA-1 issuance, or in the compatibility problems that Firefox saw when removing roots, proverbially closing some roads for construction.
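The difference between a path builder that gives up and one that backs up can be sketched in a few lines of Python. Everything here is an assumption made for illustration: the certificate names, the ISSUERS graph, and the no_sha1 policy are hypothetical, and real path building (as RFC 4158 describes) has to weigh far more than a name. The sketch only shows why rejecting a single certificate after the fact is not the same as asking for a route that avoids it.

```python
from typing import Callable, Dict, FrozenSet, List, Optional

# Hypothetical graph: each certificate maps to its candidate issuers,
# in the order a builder would try them.
ISSUERS: Dict[str, List[str]] = {
    "leaf": ["intermediate-sha1", "intermediate-sha256"],
    "intermediate-sha1": ["root-a"],
    "intermediate-sha256": ["root-b"],
    "root-a": [],
    "root-b": [],
}
TRUST_ANCHORS = {"root-a", "root-b"}

Policy = Callable[[str], bool]


def allow_all(cert: str) -> bool:
    return True


def no_sha1(cert: str) -> bool:
    """Stand-in for 'disable SHA-1': reject any cert labelled as SHA-1."""
    return "sha1" not in cert


def build_naive(cert: str, allowed: Policy) -> Optional[List[str]]:
    """Always take the first candidate issuer; give up on any rejection."""
    path = [cert]
    while cert not in TRUST_ANCHORS:
        candidates = ISSUERS.get(cert, [])
        if not candidates:
            return None
        cert = candidates[0]
        if cert in path or not allowed(cert):
            return None          # one closed road means "no route at all"
        path.append(cert)
    return path


def build_backtracking(cert: str, allowed: Policy,
                       seen: FrozenSet[str] = frozenset()) -> Optional[List[str]]:
    """Depth-first search that backs up and tries the next candidate issuer."""
    if cert in seen or not allowed(cert):
        return None              # loop or rejected cert: abandon this branch only
    if cert in TRUST_ANCHORS:
        return [cert]
    for issuer in ISSUERS.get(cert, []):
        rest = build_backtracking(issuer, allowed, seen | {cert})
        if rest is not None:
            return [cert] + rest
    return None


print(build_naive("leaf", allow_all))
print(build_naive("leaf", no_sha1))
print(build_backtracking("leaf", no_sha1))
```

With the policy applied, the naive builder reports that no route exists, while the backtracking builder finds the route through intermediate-sha256 to root-b.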
So how does this little detour tie into the LV clause that issuance be suspended if a path is found that modern clients will accept, and that no software change will correct? Well, as mentioned, the way virtually every certificate validation library works is that you’re limited in what you can tell it: you can no more tell it to avoid Las Vegas than you can tell it to disable SHA-1. Worse still, if you disable SHA-1, these certificate validation libraries will act like they’ve encountered a one-way street; they’ll simply give up, even if there’s another route to the destination. So, in practice, it’s impossible to guarantee a modern client won’t trust an LV cert.
Which gets to the second part of the clause — whether or not it can be fixed with a change to software. Software is, ultimately, infinitely malleable — it does exactly what we tell it to, regardless of whether or not that is what we intend. So it is always possible to argue that a change can be made to fix this, just like it is always possible to argue that if you want to get to New York from Los Angeles, and avoid Las Vegas, you could construct a highway that follows a straight line, building bridges over every river, tunnels through every mountain, and buying every bit of land necessary to accomplish this. Is it economical? Absolutely not. Is it possible? Surely so.
As it stands, none of these policy controls are very effective — they’re largely there because they fit the pattern of how certificates are issued, serving no particular purpose other than making the proposal look more robust and detailed than it is, and hiding the scarcity and ineffectiveness of the technical controls.
The real heart of LV rests on its technical controls, which the proponents argue are sufficient for the risks posed.
The one easiest to show as inadequate is the validity period requirement. This does nothing to prevent the attacker from obtaining a collision, it just limits the harm that they can do to the Internet to being a little more than three years. The idea that it would be acceptable to leave billions of users at risk for three years is not at all viable. This requirement doesn’t try to prevent damage, just limit it, but the limit is so unreasonably high that it’s effectively unlimited. Even if the validity period was reduced to a day, if an attacker was able to successfully mount one attack, they could presumably repeat it indefinitely until detected; it may slow the attack, but it does not stop it.
The next technical control is in requiring a distinct intermediate to be used for LV certificates. Similar to the validity period, this doesn’t attempt to thwart the attack at all; rather, by requiring a distinct intermediate, it makes it easier for software to distrust that intermediate, should an evil cert be detected. It’s as if the only time you would ever drive through Las Vegas is to get to New York — if you want to stop people getting to New York from Los Angeles, you could close all roads in to and out of Las Vegas, and your “problem” would be solved. What the metaphor here hides is that this control relies on detecting the problem first, which is both difficult and unreliable, especially for the users in the war-torn and repressive regimes that Matthew Prince wrote about — or those affected by targeted interception attacks revealed by nation-state adversaries like the NSA and GCHQ.
This leaves the only technically effective control being the requirement of entropy in the serial number. The twenty bits that LV proposes have no academic or technical basis for their selection; the figure was merely inherited from the existing Baseline Requirements, which itself was the result of an unfortunate compromise necessary to appease CA members of the CA/Browser Forum. It’s not known whether or not twenty bits is enough, and that’s largely a question of the work-factor of the attacks, in that it is assumed an attacker would need to mount 2²⁰ parallel attacks to guarantee success.
Even if twenty bits is acceptable, it’s a proposal that is entirely incumbent upon CAs to implement properly. As the previous post considered, CAs routinely and comprehensively fail to implement the necessary security protections, which leads to them misissuing certificates. It’s not something that can be detected by a client who validates the certificate, as there is no way to know whether or not the entropy was random. It doesn’t set standards for what the entropy source is either, such that a CA that actively wanted to collude with a nation-state attacker could, for example, choose to use something like DUAL_EC_DRBG as the entropy source. If they did, they would be creating a cryptographic back door that would allow some parties to determine the state of the random number generator, and thus predict what the chosen-prefix would be. The attacker, or anyone else who found or discovered the backdoor, could then intercept secure communication for large portions of the Internet, practically (though not technically) undetectably.
This really gets to the crux of the problem with LV: its only security control relies entirely on CAs properly implementing it, as soon as possible so as to minimize disruption, and fails to acknowledge that CAs routinely fail to implement the necessary security controls, or that when such controls fail, billions of users are put at risk.
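As a rough sketch of what twenty bits buys, the Python below counts the serial values an attacker would have to prepare for. The names ENTROPY_BITS, random_serial, and coverage are hypothetical, and the model is a simplification: it assumes the serial is the only unpredictable part of the prefix and that it is drawn uniformly at random, which is precisely the property that depends on the CA implementing it correctly.

```python
import secrets

ENTROPY_BITS = 20  # the amount of serial-number entropy discussed above


def random_serial(entropy_bits: int = ENTROPY_BITS) -> int:
    """Draw a serial component from the OS CSPRNG (simplified: entropy only)."""
    return secrets.randbits(entropy_bits)


def coverage(prepared_prefixes: int, entropy_bits: int = ENTROPY_BITS) -> float:
    """Chance that one issued serial matches a prefix the attacker prepared for.

    Assumes every prepared prefix uses a distinct serial value.
    """
    possible = 2 ** entropy_bits
    return min(prepared_prefixes, possible) / possible


possible_serials = 2 ** ENTROPY_BITS
print(possible_serials)      # number of distinct serial values to cover
print(coverage(1))           # a single prepared prefix
print(coverage(2 ** 20))     # every possible serial prepared for
print(random_serial())       # one example draw, different on every run
```

If the generator behind random_serial is predictable, the attacker no longer needs to cover every value; a single prepared prefix is enough, and nothing in the issued certificate reveals the difference.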
Stamos and Prince present it as a matter of finding a solution for the extremely old and outdated clients, and that leaving behind the fraction of users is an unacceptable trade-off, but in doing so, they propose a solution that presents risk to the billions of users. While the topic of path building was only lightly addressed, due to its technical complexity and nuance, it is perhaps the core of the problem: LV presupposes clients, whether they be users’ browsers or the backend systems that servers communicate with, can safely disable SHA-1, without any risk or consequence to compatibility or operation. They fail to understand or acknowledge the path building problem, or the fact that billions of users still trust MD5 signatures in certificates because of it. The only thing protecting these billions of users is that no CA is permitted to issue them — in effect, relying on hopes, prayers, and the good nature and technical abilities of CAs.
Further, the proposal introduces procedural barriers that provide no security benefit. These procedural barriers no doubt appeal to CAs, which can use them to justify a high premium, much like SGC certificates. While Prince and Stamos propose LV as a type of certificate intended to help the downtrodden, they ignore the years of context in which CAs have been clamoring to sell such certificates not to mainstream sites, but to enterprises and internal customers, whose implementation and controls are unquestionably woefully inadequate for the risk presented.
The entirety of the proposed mitigation hinges on entropy in the serial number, which is one of those things that was painfully obvious as necessary in 2010, but which CAs were still struggling to implement in 2015. If that fails, whether because it is not implemented at all or because it is not implemented securely, attackers can mount successful attacks, against any domain, and with more or less total impunity.
While CloudFlare, Facebook, and Twitter have tried to present this as a battle between serving the impractically ideological needs of modern users buying new phones and new computers every year versus that of the economic and social underdog, the proposal is effectively asking the 97% to bear the risks of the 3%, and with the only mitigation being a belief that this time, despite over a decade of failure, CAs won’t screw up.
Worse still, the argument is made without supporting data or review; that is, the presumed conclusion is that there is nothing to be done for these users but to accept the risk, without exploring why the problem exists in the first place. CloudFlare presents the problem as old Android devices and feature phones, yet Android has supported SHA-256 since its first public release. What issues existed were not a matter of algorithm, but of path building, yet the two are lumped together. The only data that’s been publicly shared has come not from the proponents of LV, but from those opposed; Peter Bowen of Amazon Trust Services notes that, from data he’s both seen and shared, many users who had trouble with SHA-256 had it not because the client device didn’t support it, but because there were one or more network-level intermediaries disrupting the connection. This could be anything from antivirus to corporate firewall to state-level attack, but such data radically changes the conclusions, in that it suggests that rather than needing to update millions of users’ devices, we may be talking on the order of hundreds or thousands of targeted enterprises. The arguments from CloudFlare, Twitter, and Facebook don’t provide the data necessary to support the conclusions they make. For example, it’s unclear whether the 3%–7% quoted are completely unable to access these sites, or only partially unable due to situational and environmental factors, like being unable to access them while at work but having no problems at home.
When evaluating Legacy Verified certificates, it’s necessary to keep in mind that “extraordinary claims require extraordinary proof”, and so too should calls for extraordinary risk. LV, as a whole, fails to mitigate this risk: it ignores the historic context surrounding it, understates the risk posed, and underestimates the serious technical complexity faced by any and all applications that wish to validate certificates and avoid LV’s risks. While understandably it comes from an earnest desire to find a solution, it fails to explore the alternatives and fails to mitigate the incredible risks, and so ultimately needs to be rejected as unsafe at any speed.
Perhaps the saddest part of this proposal is that it was necessary for CloudFlare, Facebook, and Twitter to make it. That is, in the beginning of the Web PKI, CAs weren’t the profit-driven sales and marketing machines they have become, but actually had solid engineering and were concerned about designing solutions that actually enhanced security, rather than giving the appearance of it. It is CAs that should have the technical knowledge and expertise to recognize not only the flaws in this proposal, but also the ways in which it could be bolstered, limiting the risks and potential negative impact. Unfortunately, today’s CAs are largely a shadow of their former selves; only a few invest in solid engineering and have the technical know-how to design a solution that balances these tradeoffs.
In the next post, I hope to explore what steps can be taken in order to find a solution, as well as examine the other solutions and why they too fail. While it’s easy to throw an idea out and say “something should be done,” with little more thought than an idea sketched on the back of a napkin, it requires much more discipline, care, understanding, and data to actually find a path that balances the risks and encourages, rather than undermines, security. I also hope to look at what’s needed of Facebook, CloudFlare, and Twitter — more than just grandstanding and press releases, the actual data necessary to make informed and calculated assessments of the risks and trade-offs, and that can help find a solution that doesn’t just foist all risk and cost onto the 97%.
Thanks again for the invaluable feedback and editing from those who reviewed this post, especially over the holiday period.