Doing Science On The Web

This post is about vendor prefixes, why they didn’t work, and why it’s toxic not to be able to launch experimental features. But mostly this post is about what to do about it. The argument and implications require nuance and long-term thinking. That is to say, despite diligent efforts to clarify and revise, this post is likely to be misunderstood.

Vendor prefixes are a very sore topic, and one where I’ve disagreed with the overwhelming consensus. In the heat of the ‘11–12 debate (a.k.a. “prefixpocalypse”) I tried to outline a rough hierarchy of the web platform’s concerns:

  1. Meeting developer & user experience needs with new features
  2. Eventual interoperability for successful features
  3. Minimizing harm to the ecosystem from experiments-gone-wrong

The debate and subsequent (conflicting) prohibitions & advice centered on the third point: minimizing pollution.

Recall that in 2012, Google, Apple, Blackberry, and a host of other vendors were all shipping browsers based on a single CSS engine (WebKit) without changing the `-webkit-*` prefixes to be vendor-specific. Instead, a large proportion of the web’s users experienced premature compatibility for experimental features. Developers could get the benefits of broad feature support without a corresponding standard. This backed non-WebKit-based browsers into a terrible choice: “camp” on the other vendor’s prefixed behavior to render content for their users or suffer a loss of users and developer loyalty.

This illustrates what happens when experiments inadvertently become critical infrastructure. It has happened before. Over, and over, and over again.

Prefixes were supposed to allow experimentation while discouraging misuse, but in practice they don’t. Prefixes “look” ugly and the thought was that ugliness, combined with an aversion to proprietary gunk by web developers, would cause sites to cease using them once standards are in place and browsers implement them. But that’s not what happens.

Useful features that live a long time in the “experimental” phase tend to get “burned in”, particularly if the browsers supporting them are widely used. Breaking existing content is the third rail for browsers; all of their product instincts and incentives keep them from doing it, even if the breakage comes from retracting proprietary features. This means that many prefixed properties continue to work long after standard versions are added. Likewise, sites and pages that work with prefixes are all-too-easy for web developers to write and abandon. It’s unsettling to remove a prefix when you might break a user with an old browser. Maintenance of both sites and browsers rarely subtracts, but the theory of prefixes hinges on subtraction.

Everyone who uses prefixes, both browser engineers and web developers, starts down the path thinking they’ll stop at some point. But for predictable reasons, that isn’t what happens. Good intentions are not an effective prophylactic. Not for web developers or browser makers (to say nothing of amorous teens).

This situation is the natural consequence of platform/developer time-scales that are out of sync. Browsers move more slowly than sites (at the micro scale), but sites must contend with huge browser diversity and are therefore much more conservative about removing “working” code than browser engineers expected.

Now What?

Years after Prefixpocalypse everyone who works on a browser understands that prefixes haven’t succeeded in minimizing harm, yet vendors proudly announce new prefixed features and developers blithely (ab)use them. Clearly, a need for new features trumps interoperability and pollution concerns. This is natural and perhaps even healthy. A static web, one which doesn’t do more to make lives better, is one that doesn’t deserve to thrive and grow. In technology as in life there is no stasis, only various speeds of growth or decay.

Browsers could stop prefix ecosystem pollution from happening by simply vowing not to add features. This neatly analyses the problem (some experiments don’t work out, and some get out of hand) and proposes a solution (no experimentation), but as H.L. Mencken famously wrote:

…there is always a well-known solution to every human problem — neat, plausible, and wrong.

We have already run a natural experiment in this area. At the low point after the first browser war, Microsoft (temporarily) shrank from the challenge of building the web into a platform. Meanwhile IE 6’s momentum assured its place as the boat-anchor-browser. Between 2002 and 2006, the web (roughly) didn’t add any new features. Was that better? Not hardly. I’m glad to be done with 9-table-cell image hacks to accomplish rounded corners. Not all change is progress, but without change there is no progress.

Or, put better by W3C Memes:

“One does not simply ship no new features for a year and remain competitive”

We do need new features, and we’d like good versions of them — fewer document.alls, WebSQLs and AppCaches, thanks.

We know from experience developing software of all kinds that more iteration yields better results. Experimentation, chances to learn, and opportunities to try alternatives are what separate good ideas from great products. Members of the Google Gears team report they considered building something like Service Workers. Instead they built an AppCache-style system which failed in all the ways AppCache later failed (which they couldn’t have known at the time). It shouldn’t have taken 6+ years to course-correct. We need to be able to experiment and iterate. Now that we understand the problems with prefixes, we need another mechanism.

Experiments That Stay Experiments

Prefixpocalypse happened because experiments escaped the lab. Wide-scale use of experimental properties isn’t healthy. Because prefixed properties were available to any site (no matter how large), it was straightforward for the killer combination of broad browser support and major site usage to ensure that compatibility would work against ever ending the experiment. The key to doing better, then, is to limit the size of the experimental population.

The way prefixes were run was like making a new drug available over the counter as soon as a promising early trial was conducted, skipping animal, human, and large-scale clinical trials. Of course that would be ludicrous; “first do no harm” requires starting with a small population, showing efficacy, gathering data about side-effects, and iterating.

In the web platform, the missing ingredient has been the ability to limit the experimental population. Experiments can run for a fixed duration without fear of breaking the web if we can be sure that they never imperiled the whole web in the first place. Short durations and small, committed test populations allow for more iteration which should, in the end, lead to better features. Web developer feedback needs to be the most important voice in the standards process, and we’ll never get there until web developers have more ability to participate in feature evolution. Experimental outcomes are ammo for the standards development process; in the best case they can provide good evidence that a feature is both needed and well-designed.

Putting evidence at the core of web feature and standards development is a 180° change from the current M.O., but one we sorely need.

So how do we get there?

Some mechanisms I’ve thought through and rejected (with reasons):

  • “Just have users flip things in about:flags”
    This has several persistent downsides: first, it doesn’t limit the size of the experimental population. If every site encourages users to flip a particular flag, odds are enough users will do so to set usage above a red-line threshold.
  • “Enable it by default on your Beta/Dev channel browser”
    Like the flag-flipping mechanism, it puts a burden on users which is perhaps the wrong place to put it. Experimentation of this sort is likely to get better feedback when developers can work with experimental features without the additional friction of asking users to use different browsers.

The Chrome Team has been thinking about this problem for the past several years, including conversations with other vendors, and those ideas have congealed into a few interlocking mechanisms that haven’t been rejected:

  1. Developer registration & usage keys.
    A large part of the reason it’s difficult to change developer behavior about use of experimental features is that it’s hard to find the developers! Who would you call to talk about use of some prefixed CSS thing on a given site? I don’t know either. Having an open communication channel is critical to learning how features are working (or not) in the real world. To that end, new experimental features will be tied to specific origins using keys vended by a developer program; sites supply the keys to the browser through header/meta tags, enabling the features dynamically. Registration for the program will probably require giving a (valid) email address and agreeing to answer survey questions about experimental features. Because of auto-self-destruct (see below), there’s less worry that these experiments will be abused to provide proprietary features to “preferred” origins. Public dashboards of running experiments and users will ensure transparency to this effect.
  2. Global usage caps.
    The Blink project generally uses a ~0.03% usage threshold to decide if it’s plausible to remove a feature. Experimenters might use our Use Counter infrastructure and RAPPOR to monitor use. Any feature that breaches this threshold can automatically close the experiment to new users and, if any individual user goes above ~0.01% (global) use, a config update can be pushed to throttle use on that site.
  3. Feature auto-self-destruct.
    Experimental features should be backed by a process that’s trying to learn. To enable this, we’re going to ensure that each version of an experimental feature auto-self-destructs, tentatively set at 12–18 weeks per experiment. New iterations which are designed to test some theory can be launched once an experiment has finished (but must have some API or semantic difference, preferably breaking). Sites that want to opt into the next experiment and were part of a previous group will be asked survey questions in the key-update process (which is probably going to be a requirement for access to future experimental versions). Experiments can overlap to provide continuity for end-users who are willing to move to the next-best-guess and provide feedback.
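To make the first and third mechanisms concrete, here is a minimal sketch of how an origin-bound key with a built-in expiry might be checked. Everything named here is an assumption made for illustration: `issue_key`, `feature_enabled`, `PROGRAM_SECRET`, the JSON payload layout, and the use of an HMAC are all hypothetical, and nothing above specifies the real key format. The 12-week default is simply the lower end of the tentative 12–18 week lifetime.

```python
import base64
import hashlib
import hmac
import json
from datetime import datetime, timedelta, timezone

# Hypothetical secret held by the developer program. This is a stand-in:
# a real scheme would more plausibly use public-key signatures so that
# browsers never hold a secret.
PROGRAM_SECRET = b"illustrative-secret-not-from-the-post"


def issue_key(origin: str, feature: str, issued: datetime, weeks: int = 12) -> str:
    """Vend a key bound to one origin and one feature version, with an expiry."""
    payload = json.dumps(
        {
            "origin": origin,
            "feature": feature,
            "expires": (issued + timedelta(weeks=weeks)).isoformat(),
        },
        sort_keys=True,
    ).encode()
    sig = hmac.new(PROGRAM_SECRET, payload, hashlib.sha256).digest()
    return (
        base64.urlsafe_b64encode(payload).decode()
        + "."
        + base64.urlsafe_b64encode(sig).decode()
    )


def feature_enabled(key: str, page_origin: str, feature: str, now: datetime) -> bool:
    """Browser-side check: valid signature, matching origin and feature, not expired."""
    try:
        payload_b64, sig_b64 = key.split(".")
        payload = base64.urlsafe_b64decode(payload_b64)
        sig = base64.urlsafe_b64decode(sig_b64)
    except ValueError:  # wrong number of parts or malformed base64
        return False
    expected = hmac.new(PROGRAM_SECRET, payload, hashlib.sha256).digest()
    if not hmac.compare_digest(sig, expected):
        return False
    claims = json.loads(payload)
    return (
        claims["origin"] == page_origin
        and claims["feature"] == feature
        and now < datetime.fromisoformat(claims["expires"])  # auto-self-destruct
    )
```

Because the expiry lives inside the signed payload, a site cannot extend an experiment on its own; it has to go back through the key-update process, which is where the survey questions would come in. Binding the feature name to a specific version (say, a hypothetical `demo-feature-v1`) means a key for one iteration does nothing for the next.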

We’re also going to work to ensure that the surfaced APIs are done in a responsible way, including feature-detection where possible. These properties add up to a solution that gives us confidence that we can create Ctrl-Z for web features without damaging users or sites.
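The global usage caps from item 2 reduce to two threshold comparisons, which the sketch below spells out. It assumes that “individual user” in that item means an individual site, and that usage is measured as a share of all page loads; the names `evaluate_caps`, `CapDecision`, `GLOBAL_CAP`, and `PER_SITE_CAP` are hypothetical, and the counts would come from whatever usage telemetry an experimenter has available.

```python
from __future__ import annotations

from dataclasses import dataclass

GLOBAL_CAP = 0.0003    # ~0.03% of page loads, the removal threshold cited above
PER_SITE_CAP = 0.0001  # ~0.01% of global page loads for any single site


@dataclass
class CapDecision:
    closed_to_new_users: bool   # stop admitting new sites to the experiment
    throttled_sites: list[str]  # origins whose use should be throttled


def evaluate_caps(total_page_loads: int, loads_using_feature: dict[str, int]) -> CapDecision:
    """Apply both caps.

    loads_using_feature maps a site's origin to the number of page loads
    on that origin that used the experimental feature.
    """
    if total_page_loads <= 0:
        raise ValueError("total_page_loads must be positive")
    global_share = sum(loads_using_feature.values()) / total_page_loads
    throttled = sorted(
        origin
        for origin, loads in loads_using_feature.items()
        if loads / total_page_loads > PER_SITE_CAP
    )
    return CapDecision(global_share > GLOBAL_CAP, throttled)
```

For instance, with 1,000,000 total page loads, a site responsible for 150 feature-using loads sits at 0.015% and would be throttled, while a site at 50 loads (0.005%) would not.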

In discussions with our friends in the community and at other browser vendors we’ve thought through alternative ways to throttle or shrink the experimental population: randomness in API names, limiting APIs to postMessage-style calling, or shortening experiment lifetimes. As Chrome is going first, we’ll be iterating on the experimental framework to try to strike the right balance that allows enough use to learn from but not so much that we inadvertently commit to an API. We’ll also be sharing what we learn.

My hope is that other browsers implement similar programs and, as a corollary, cease use of prefixes. If they do, I can imagine many future areas for collaboration on developing and running these experiments. That said, it’s desirable for different browsers to be trying different designs; we learn more through diversity than premature monoculture.

Moving faster and building better features don’t have to be in tension; we can do better. It’s time to try.

Thanks to Owen Campbell-Moore, Joe Medley, Jeff Yasskin, Adrian Bateman, Jake Archibald, Ian Clelland, Michael Stillwell, Addy Osmani, Chris Wilson, and Paul Irish for their invaluable feedback on drafts of this post.

Cross-posted from my personal blog.