Deploying Features Under Cover of Darkness

How we build, test, and release features secretly and safely

Effective software development often hinges on the ability to produce small, manageable chunks of work that are released over time. This decreases cognitive load, enhances maintainability, and usually results in a more stable codebase and application. Every now and then, however, there is a desire—whether due to business requirements or the nature of the work—to keep something under wraps until it reaches critical mass.

If you’re a user of JSTOR, you’ve probably noticed our recently updated visual design. We revealed this change to users all at once, but as with any “overnight” phenomenon, there were months of thought and work behind it. We work iteratively as much as possible, but this project required a more traditional release approach. We’re fortunate to have a suite of tools at our disposal that enables us to work in both modes as needed. Each of these tools warrants its own full-length blog post, but I’ll speak to them at a high level below.

Feature flags

To turn the new design on with the flip of a switch, we needed to provide ourselves a single, high-level entry point to that functionality. A common but perhaps underused method of delivering features is through an approach often called “feature flags.” Feature flags are essentially on/off switches that expose certain behavior or functionality to users. We make liberal use of feature flags at JSTOR in our iterative work, but they also come in handy for use as a gateway to larger sets of features.
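At its simplest, a feature flag is just a named boolean consulted at runtime. A minimal sketch of the idea (the `FLAGS` store and function names here are illustrative, not JSTOR's actual API):

```python
# A minimal feature-flag lookup: flags are named booleans that gate behavior.
# FLAGS stands in for a persistent store (database, config service, etc.).
FLAGS = {
    "new_visual_design": False,
    "improved_search": True,
}

def is_enabled(flag_name):
    """Return the current state of a flag; unknown flags default to off."""
    return FLAGS.get(flag_name, False)

def render_homepage():
    # The same entry point serves both experiences; the flag decides which.
    if is_enabled("new_visual_design"):
        return "homepage-v2"
    return "homepage-v1"
```

The value of the pattern is that flipping one stored boolean changes behavior everywhere the flag is consulted, with no redeploy.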

The Dashboard

The basis for our feature flag work is a system we created long ago that we simply refer to as The Dashboard. The Dashboard exposes configurations that we can update on-the-fly for specific applications or the entire system. Unfortunately, updating a value in The Dashboard makes it visible to all users. Turning on a feature for testing meant any user could see that feature while it was active. We needed a way to turn things on and off on a per-user basis to limit risk and to learn. That’s when Flag Registry was born.
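The Dashboard scopes configuration to specific applications or the entire system; one way to sketch that two-level lookup (hypothetical structures, not The Dashboard's real schema):

```python
# Sketch of a two-level config lookup: per-application values override
# system-wide defaults, so one entry can target one app or everything.
SYSTEM_CONFIG = {"maintenance_banner": False}
APP_CONFIG = {
    "search": {"maintenance_banner": True},
}

def get_config(key, app=None, default=None):
    """Prefer an app-specific value, falling back to the system-wide one."""
    if app is not None and key in APP_CONFIG.get(app, {}):
        return APP_CONFIG[app][key]
    return SYSTEM_CONFIG.get(key, default)
```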


Flag Registry

One kind of configuration in The Dashboard is tantalizingly close to what we were looking for—essentially just a boolean switch with a name. That’s what feature flags are at their most basic level. But how could we build in the per-user flexibility we needed?

Django exposes an API for interacting with signed cookies. Signed cookies aren't encrypted; their contents are readable, but they carry a cryptographic signature that makes them tamper-evident. In our case, they contain information from the Django application along with the timestamp of generation. These signed cookies let us carry configuration information across page requests while preventing clients from forging or altering feature flag state.
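In Django this is handled by `django.core.signing` and the `HttpResponse.set_signed_cookie` / `HttpRequest.get_signed_cookie` methods, but the underlying idea can be sketched with the standard library (a simplified illustration, not Django's actual wire format):

```python
import hashlib
import hmac
import time

SECRET_KEY = b"example-secret"  # stands in for Django's SECRET_KEY setting

def sign_value(value):
    """Attach a timestamp and an HMAC so the client can't forge or alter it."""
    payload = f"{value}:{int(time.time())}"
    mac = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
    return f"{payload}:{mac}"

def unsign_value(signed):
    """Verify the HMAC and return the original value, or raise ValueError."""
    payload, _, mac = signed.rpartition(":")
    expected = hmac.new(SECRET_KEY, payload.encode(), hashlib.sha256).hexdigest()
    if not hmac.compare_digest(mac, expected):
        raise ValueError("cookie signature is invalid")
    value, _, _timestamp = payload.rpartition(":")
    return value
```

Any client-side edit to the cookie invalidates the signature, so the server can trust the flag state it reads back on the next request.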

We built a user interface that allows us to interact with Flag models. Each Flag instance is ultimately backed by a switch in The Dashboard, but can be customized per browser session via the interface. If the switch for a Flag is off, we can use Flag Registry to turn it on for ourselves temporarily. Once the feature triggered by the Flag is tested and verified by QA, we can turn the switch on for everyone and then continue using the Flag to turn the feature off. When turning the feature off is no longer necessary, it’s as simple as deleting the Flag instance.
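The lifecycle described above implies a precedence order: a per-session override from Flag Registry beats the global switch in The Dashboard, and a deleted Flag means the feature is simply on for everyone. A rough sketch of that resolution logic (hypothetical names and structures):

```python
def resolve_flag(flag_name, dashboard_switches, session_overrides, registered_flags):
    """Decide whether a feature is visible for the current browser session.

    - A deleted Flag means the feature is fully released: always on.
    - A per-session override (set via Flag Registry) beats the global switch.
    - Otherwise fall back to the Dashboard switch, defaulting to off.
    """
    if flag_name not in registered_flags:
        return True  # Flag deleted: the feature is permanently on
    if flag_name in session_overrides:
        return session_overrides[flag_name]
    return dashboard_switches.get(flag_name, False)
```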

Flag Registry—each flag has a corresponding switch in The Dashboard

The next step in building Flag Registry was to integrate this system with our Django codebase. We did this by exposing feature flag state in middleware and templatetags so that we could query configuration at any level of the application. Switching between DOM structures has become as simple as:

{% hideif my_precious %}
{% endhideif %}
{% showif my_precious %}
{% endshowif %}
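Django middleware is just a callable that wraps each request, which makes the integration point small. A minimal sketch of attaching flag state to the request so views and templates can query it (the class and attribute names are illustrative, and a real Django middleware takes only `get_response` in its constructor):

```python
class FeatureFlagMiddleware:
    """Attach resolved feature-flag state to each incoming request.

    Views and template context processors can then read request.flags
    without knowing about cookies, Flag Registry, or The Dashboard.
    """

    def __init__(self, get_response, flag_source=None):
        self.get_response = get_response
        # flag_source stands in for the Flag Registry / Dashboard lookup.
        self.flag_source = flag_source or (lambda request: {})

    def __call__(self, request):
        request.flags = self.flag_source(request)
        return self.get_response(request)
```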

Template Delivery

Though the JSTOR design update was intended to be purely visual, some of the work required small changes to page layouts and content for proper styling. Some of the needed HTML updates simply couldn’t work in both the old and new versions of the back-end code. On larger pages, these small changes added up to rather large change sets that made working in single templates difficult. Django had no native way to easily switch between different template versions, so we built one.

In a similar fashion to our feature flags, we needed a way to switch template versions on a per-user basis. We exposed this in an analogous way on the front-end, using a simple user interface and signed cookie-based system. We can look at template changes without affecting other people, and then turn those changes on for everyone with a simple toggle when the time comes.
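One common way to implement this kind of per-user template switching is to resolve a versioned template path first and fall back to the default (in Django, this maps naturally onto `select_template` with an ordered list of candidates). A sketch under that assumption, with hypothetical paths:

```python
def candidate_templates(name, use_new_templates):
    """Return template paths in preference order for this session.

    If the session has opted into the new templates, try the "v2/" copy
    first and fall back to the original; otherwise use the original only.
    """
    if use_new_templates:
        return [f"v2/{name}", name]
    return [name]

def pick_template(name, use_new_templates, available):
    """Pick the first candidate that actually exists."""
    for candidate in candidate_templates(name, use_new_templates):
        if candidate in available:
            return candidate
    raise LookupError(f"no template found for {name}")
```

Because pages without a v2 copy fall back to the original, templates can be migrated one at a time while the toggle is live.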


Traffic Cop

A visual redesign is the kind of change that’s great to put in front of real people before release to see reactions and gauge usability. More generally, it’s useful to release features to a representative or otherwise interesting sample of our full audience. All sorts of one-off finagling can make this happen, but we wanted a reusable system with fairly push-button operation. We built in some ways of addressing traffic that now allow us to perform A/B or even more complex types of testing:

  1. Percentage of traffic
    This approach allows us to specify a percentage of our full traffic that will receive a particular experience. Users are binned in a repeatable way so they aren’t tossed between experiences each time they load a page. We can increase this percentage easily if we see a favorable response.
  2. Segmented traffic
    Based on particular attributes of each request we can determine whether a user can be assigned to a particular category, or segment. We can then specify a mapping of these segments to different experiences. This aids us in seeing how different segments might react to different changes such as copy updates and any region-specific features. We can use this to test new features at specific institutions or with users who, for example, have a particular type of paid subscription on our platform.
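Repeatable binning of the kind described in the percentage-of-traffic case is typically done by hashing a stable user identifier into a fixed number of buckets, so the same user always lands in the same experience. A sketch of that approach (illustrative, not Traffic Cop's actual algorithm):

```python
import hashlib

def bucket_for(user_id, experiment):
    """Map a user deterministically into one of 100 buckets.

    Mixing the experiment name into the hash decorrelates bucketing
    across experiments, so the same users aren't always grouped together.
    """
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode()).hexdigest()
    return int(digest, 16) % 100

def in_rollout(user_id, experiment, percentage):
    """True if this user falls inside the rolled-out percentage."""
    return bucket_for(user_id, experiment) < percentage
```

A nice property of comparing the bucket against a threshold is that raising the percentage only ever adds users: anyone included at 10% is still included at 25%, so no one is tossed between experiences as the rollout grows.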
Traffic Cop segment creation

At request time, a user is segmented in Django middleware such that we can query this information at the view and template level. We can then easily serve different experiences to our reference customers, monitor canary testing, and conduct A/B/etc. experiments.

With this set of tools, we’ve been able to tackle a variety of complex testing and deployment situations. Coordinating a large release leaves room for miscommunication and human error, but these tools allow us to stage many moving pieces and configure them in tandem. We’re constantly looking for ways to improve and would love to hear your experiences in shipping features iteratively!