Feature Flags: Smaller, Better, Faster Software Development

Feature flags let you work smaller, better, faster, with less risk. The leverage they provide makes them a first-class engineering and development approach.

In this post, we’ll look at three aspects of feature flags. First, what they are and what they look like. Second, how their use evolves over time. Third, why it’s worth implementing them as a service.

Recently I had the opportunity to discuss feature flagging with some colleagues, perhaps because I’m a fan of using flags for online systems. After discussing the concepts and what we might use them for, I decided to build a small demo feature flag service and client.

There were a few reasons to start with a working prototype. Seeing flags in action grounds the discussion and clarifies what kind of problems they solve. The relative lack of online material and engineering lore was certainly another. Perhaps more important was a sense that flags are not yet as widely accepted across the industry as continuous deployment and testing have become, and that there is some concern about using them.

I built the demo as a standalone API service. For context, any one of our teams could be running a few microservices, and larger product groups may end up running many more; implementing flag support inside a single service limits its use to that service. A service is a bit more work, but it’s hugely informative because it forces you to think about what you’re offering in terms of capabilities.

This was a lot of fun, and it led me to reflect on why flag-based engineering is valuable. I thought it would be useful to put down some notes and have something to reference in the future. Of all the development techniques used to increase impact and speed today, feature flags seem among the least appreciated and least understood. Yet feature flags can be considered a foundational approach that affords a very wide range of use cases. It’s worth pointing out that much of this post is written with the constraints microservices impose in mind.

What are Feature Flags?

First, the name. There’s no single consistent name for these things: toggles, flippers, switches, flags. I’m aware they are called Feature Toggles on Martin Fowler’s site, which has bled into Wikipedia’s entry. But I’m going with flags, because I’ve only known them to be called toggles, flippers, or switches in online material, and flags in real-world conversation.

Feature flags, at their simplest, are named conditionals used to control code execution. If the flag is on, the new code is executed; if the flag is off, the code is skipped. Here’s what using a class that wraps flag checking might look like:

if (features.enabled("test-flag-1")) {
    System.out.println("enabled");
} else {
    System.out.println("disabled");
}

Amazing!

Or maybe not. If you’re thinking these are just glorified if statements, that’s understandable. You may also be concerned about code complexity, specifically cyclomatic complexity. But there’s a bit more to them that justifies their use. Let’s take a quick look at the basic properties of feature flags and their evaluation.

First, their core properties:

  • State: a feature can be in an on or off state (potentially other states, but almost everything you need to do is covered with on/off).
  • Key: a feature can be identified and used more easily with a client-defined name than with a server-defined identifier.
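Those two properties are enough for a working client. Here is a minimal sketch of a class that could sit behind the `features` object in the snippet above, assuming an in-memory map stands in for the flag store. The `Features` class and its `set` method are hypothetical, not taken from any real library.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical minimal flag store: each flag is a client-defined key with an on/off state.
class Features {
    private final Map<String, Boolean> states = new ConcurrentHashMap<>();

    // Create or update a flag's state by key.
    void set(String key, boolean on) {
        states.put(key, on);
    }

    // Unknown keys evaluate to off, so unreleased code stays dark by default.
    boolean enabled(String key) {
        return states.getOrDefault(key, false);
    }
}
```

After `features.set("test-flag-1", true)`, the conditional above takes the enabled branch. In practice the states would be loaded from a flag service instead of being set locally.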

That’s the bare minimum, and you can go a very long way with just that. Since our context was flags as a general capability used by multiple teams across many microservices, there are some other properties worth considering:

  • Description: features can have readable text explaining their purpose.
  • Owner: a feature can have a directly responsible individual. This is useful when dealing with multiple features. It addresses one of the main concerns with flags, which is having them lying around cluttering things up long after they’re needed.
  • Timestamps: the created and updated times of the feature itself are standard. A last active time, indicating the system’s last known use of the flag, is also useful as a way to identify unused features.
  • Version: features are mutable by nature, since their state changes. When distributing features or keeping client-side caches, a version string makes it possible to tell whether a copy is current.
  • Labels: owner-defined tags or properties on a feature. These act as an escape valve, letting owners map their own local structures and concepts.
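One way these properties could be modelled is as an immutable Java record. This is a sketch under our own assumptions: the field names are hypothetical, and `version` is treated as an opaque string that the server changes on every update so a client cache can tell its copy is stale. The `isStale` check uses the last active time to surface cleanup candidates for the owner.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Map;

// Hypothetical flag model carrying the housekeeping properties listed above.
record FeatureFlag(
        String key,
        boolean on,
        String description,
        String owner,                 // directly responsible individual
        Instant created,
        Instant updated,
        Instant lastActive,           // last known evaluation by any client
        String version,               // changes on every update so caches can detect stale copies
        Map<String, String> labels) { // owner-defined tags

    // A flag nobody has evaluated within maxIdle is a candidate for cleanup.
    boolean isStale(Instant now, Duration maxIdle) {
        return lastActive.plus(maxIdle).isBefore(now);
    }

    // Changing state yields a new immutable copy with a new version and update time.
    FeatureFlag withState(boolean newState, Instant now, String newVersion) {
        return new FeatureFlag(key, newState, description, owner, created, now,
                lastActive, newVersion, labels);
    }
}
```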

The next three are more advanced, moving towards using feature evaluation for product development and experiment support:

  • Options: sometimes we want a feature in the on state to return a result rather than simply report that it’s enabled. For example, we might want to return one of three possible background images, or show a result to just 1% of users. We might also want to give each option a weight, biasing the likelihood of it being returned.
  • Targets: these allow an option to be returned to a defined set of users, rather than enabled for everyone or for a random set. Targets can range from one or more named individuals, to patterns such as an email domain, to a cohort or segment of users that match certain properties.
  • Namespaces: a grouping for features that lets sets of microservices and individuals share access to them. At more ambitious project scales, and to support a multi-tenant service, a means to group and relate features becomes useful. Namespaces also provide broader context by declaring multiple owners; it’s not much fun to find out a particular feature’s owner has left the company. There are more specific names like group, project, or team you could use here, but “namespace” acts as an unbounded catch-all term and stays independent of your current org structure.
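Options and targets can be combined in a single evaluation step. The sketch below assumes hypothetical `WeightedOption` and `OptionEvaluator` types. Targeted users always get a fixed option, and everyone else gets an option picked in proportion to its weight. Hashing the flag key and user id (with CRC32, an arbitrary choice) keeps each user’s option stable across requests, which a fresh random pick per request would not.

```java
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.Set;
import java.util.zip.CRC32;

// Hypothetical option with a relative weight; higher weights are served more often.
record WeightedOption(String value, int weight) {}

// Hypothetical evaluator: targeted users get a fixed option, everyone else a weighted pick.
class OptionEvaluator {
    private final List<WeightedOption> options;
    private final Set<String> targets;   // e.g. named individuals such as staff or testers
    private final String targetOption;   // option served to every target
    private final int totalWeight;

    OptionEvaluator(List<WeightedOption> options, Set<String> targets, String targetOption) {
        int total = 0;
        for (WeightedOption option : options) {
            if (option.weight() <= 0) {
                throw new IllegalArgumentException("weights must be positive");
            }
            total += option.weight();
        }
        if (total == 0) {
            throw new IllegalArgumentException("at least one option is required");
        }
        this.options = List.copyOf(options);
        this.targets = Set.copyOf(targets);
        this.targetOption = targetOption;
        this.totalWeight = total;
    }

    String evaluate(String flagKey, String userId) {
        if (targets.contains(userId)) {
            return targetOption;
        }
        // Hash flag key + user id so each user lands on the same point on every request.
        CRC32 crc = new CRC32();
        crc.update((flagKey + ":" + userId).getBytes(StandardCharsets.UTF_8));
        long point = crc.getValue() % totalWeight;
        // Walk the cumulative weights until the point falls inside an option's range.
        for (WeightedOption option : options) {
            point -= option.weight();
            if (point < 0) {
                return option.value();
            }
        }
        throw new AssertionError("unreachable: point is always below the total weight");
    }
}
```

Giving one option a weight of 1 and another a weight of 99 would serve the first to roughly 1% of users.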

We’ll have more to say about Options, Targets and Namespaces in a future post. For now, it’s enough to know that there can be a lot more to a feature flag than a simple on/off state.

Feature Flags and Service Development

What immediately distinguishes flags from fixed conditionals in code, or on-startup configuration, is the ability to change their state from outside the running program. In that respect flags let you defer decisions until you see what the real system is doing.
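Changing state from outside can be as simple as a client that periodically swaps in a fresh snapshot. In this sketch, the hypothetical `RefreshingFeatures` class takes a `Supplier` standing in for whatever fetches state from a flag service (an HTTP call, say). Reads never block on the network, and a failed fetch leaves the last known state in place.

```java
import java.util.Map;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical client whose flag state can change without a redeploy or restart.
class RefreshingFeatures {
    private final Supplier<Map<String, Boolean>> source; // e.g. a call to a flag service
    private final AtomicReference<Map<String, Boolean>> snapshot = new AtomicReference<>(Map.of());

    RefreshingFeatures(Supplier<Map<String, Boolean>> source) {
        this.source = source;
    }

    // Replace the whole snapshot atomically; on failure keep serving the last known state.
    void refresh() {
        try {
            snapshot.set(Map.copyOf(source.get()));
        } catch (RuntimeException e) {
            // Stale state is usually better than no state; a real client would log this.
        }
    }

    // Poll the source periodically; the caller is responsible for shutting the scheduler down.
    ScheduledExecutorService startPolling(long periodSeconds) {
        ScheduledExecutorService scheduler = Executors.newSingleThreadScheduledExecutor();
        scheduler.scheduleAtFixedRate(this::refresh, 0, periodSeconds, TimeUnit.SECONDS);
        return scheduler;
    }

    boolean enabled(String key) {
        return snapshot.get().getOrDefault(key, false);
    }
}
```

Swapping an immutable map through an `AtomicReference` means a request never sees a half-applied update. A push model (server-sent events, a message bus) would shorten the delay, at the cost of more moving parts.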

Feature flags reinforce the benefits from small changes and continuous delivery, providing a flywheel to get you round the loop faster.

[Figure: flywheel for feature flags]

If you have continuous deployment and a policy of small pull requests, you have the challenge of how to merge and ship those small changes when they are part of a bigger feature. Now, you should be looking to break big things down into smaller releases, but sometimes this can get sufficiently contorted that you end up taking the scenic route to completing a project. The natural temptation is to let functionality build up, maybe on a feature branch, just for a little bit. The next thing you know, you’re deploying thousands of lines of code. Worse, you’re doing this across multiple microservices. Flags balance this by allowing you to hide unpolished or incomplete functionality from the broad base of users, while keeping the interim development of that feature on the main line.

As an engineering technique, flags allow you to:

  • Work safely by isolating new code. Turning a dynamic flag off in production is almost always faster than a rollback, assuming you can in fact do a rollback.
  • Eliminate the need for code or feature freezes. You can continue to work and merge code, but keep that code inactive behind a flag. Features can be toggled on gracefully post-freeze.
  • Ship larger-scale functionality faster. Once you define a flag, you can put new code behind it, merge immediately, leave it disabled for all but a subset of users, and enable it for local development.
  • Remove long lived feature branches. Feature branching means ongoing merge and integration test overhead, and is a poor fit for teams using continuous deployment/integration approaches. It’s an outdated way to develop software.
  • Shed load by disabling non-critical or haywire functions. Even with a system that has circuit breaking or back pressure in place, there’s operational value in being able to “just turn the damn thing off”. These flags are long-lived, so housekeeping such as last active checks is less useful for them.
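The first and last items in that list pull in opposite directions when flag state is unknown, for example when the flag service is unreachable. A sketch with hypothetical names: new code defaults to off, while a kill switch defaults to on, so losing the flag service doesn’t switch off features that were working.

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical guard distinguishing release flags from kill switches.
class FlagGuard {
    private final Map<String, Boolean> states = new ConcurrentHashMap<>();

    void set(String key, boolean on) {
        states.put(key, on);
    }

    // Empty when the flag's state is unknown (never set, or not yet fetched).
    Optional<Boolean> state(String key) {
        return Optional.ofNullable(states.get(key));
    }

    // New code: runs only when the flag is explicitly on; unknown means off.
    <T> T release(String key, Supplier<T> newPath, Supplier<T> oldPath) {
        return state(key).orElse(false) ? newPath.get() : oldPath.get();
    }

    // Kill switch: the feature runs unless an operator has explicitly turned it off.
    <T> T killSwitch(String key, Supplier<T> feature, Supplier<T> degraded) {
        return state(key).orElse(true) ? feature.get() : degraded.get();
    }
}
```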

Working back from production, you want to deliver small changes that are easier to reason about, not dump multiple days’ or weeks’ (or heaven forbid, months’) worth of work into a live system. But to go beyond platitudes, you need actual support for shipping small amounts of code, which is where feature flags come in handy.

For whatever reason, feature flags don’t get anything like the attention other practices do, or, I think, the attention they deserve. Tooling and library support for flags is nowhere near the level we see for continuous build and delivery systems, and there’s vastly more online literature on generic lean/agile engineering practices. I’m speculating, but I wonder if this is due to the seeming engineering triviality of glorified if statements, plus the fact that when we do talk about flags, it’s as a mechanism, not a process.

Feature Flags in Practice

Feature flags are useful as an engineering mechanism, but there’s more to them than risk reduction. Their outsize impact is on your overall development process.

Over time, how feature flags are used tends to evolve. They usually start as an engineering control mechanism to increase safety and amortize the risk of new code. Then they are used to control new features, literally “feature flags”. The ability of flags to show and hide features eventually becomes known and useful outside engineering, allowing designers and product specialists to see their work in progress in a live environment. This provides a level of feedback that can’t be matched by demos or staging environments, and certainly not by any form of wireframing.

As their use expands, flags become part of how features are developed. One consequence is having to move beyond a global on/off state: features need more involved rollout options, such as being on for staff for dog-fooding, available to a cohort of users for testing and early access, or served to a tunable percentage of requests as part of a canary rollout. For larger projects, which are likely running in parallel, features may also need to be organised into groups and distributed across multiple services or tiers. In some cases, feature flags are exposed all the way to the user, as with Chrome’s chrome://flags screen or Etsy’s prototypes.
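A staged rollout like this is often expressed as an ordered set of rules. The sketch below assumes hypothetical `User` and `Rollout` records. Staff are checked first, then named cohorts, then a stable percentage bucket derived from a CRC32 hash of the flag key and user id. Because each user’s bucket is fixed, raising `percent` from 10 to 50 keeps the original 10% enabled and only adds new users.

```java
import java.nio.charset.StandardCharsets;
import java.util.Set;
import java.util.zip.CRC32;

// Hypothetical user attributes a rollout rule can inspect.
record User(String id, boolean staff, Set<String> cohorts) {}

// Hypothetical staged rollout: staff first, then named cohorts, then a percentage of everyone.
record Rollout(String flagKey, boolean staff, Set<String> cohorts, int percent) {

    Rollout {
        if (percent < 0 || percent > 100) {
            throw new IllegalArgumentException("percent must be between 0 and 100");
        }
    }

    boolean enabledFor(User user) {
        if (staff && user.staff()) {
            return true;                           // dog-fooding
        }
        for (String cohort : user.cohorts()) {
            if (cohorts.contains(cohort)) {
                return true;                       // testing and early access
            }
        }
        return bucket(user.id()) < percent;        // canary percentage
    }

    // Stable bucket in [0, 100): raising percent only adds users, never reshuffles them.
    int bucket(String userId) {
        CRC32 crc = new CRC32();
        crc.update((flagKey + ":" + userId).getBytes(StandardCharsets.UTF_8));
        return (int) (crc.getValue() % 100);
    }
}
```

Including the flag key in the hash means different flags bucket users independently, so the same 10% of users don’t end up in every canary.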

At this point, feature flags move beyond an engineering practice and become important to product and service development processes. We can look still further, along the paths of experiments and tiering.

To determine impact, teams will want to know what happened after a feature flag fired. They’ll want to know which users were in the feature’s scope and which options were served. They’ll want to correlate feature states with target metrics like lift or click-through rates, or determine the effect on goal pages like checkout or sign-up. This makes feature flags relevant to any measurement and experiment approaches the company has in place, such as A/B tests or bandits. It’s not much of a stretch to argue that feature flags act as a beachhead for introducing these metrics where they don’t already exist. In any case, the need to observe flags increases.
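Observing flags starts with recording an exposure every time a flag is evaluated. Here is a minimal sketch, assuming hypothetical `Exposure` and `ObservedFlag` types and an in-memory list standing in for an event pipeline. `conversionByOption` shows the kind of join you might then run against a goal such as checkout; a real analysis would also need significance testing, which this omits.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical exposure event: which user was served which option of which flag, and when.
record Exposure(String flagKey, String userId, String option, Instant at) {}

// Wraps an evaluation rule and records an exposure for every decision it makes.
class ObservedFlag {
    private final String flagKey;
    private final Function<String, String> rule; // userId -> option served
    private final List<Exposure> sink;           // stand-in for an event pipeline

    ObservedFlag(String flagKey, Function<String, String> rule, List<Exposure> sink) {
        this.flagKey = flagKey;
        this.rule = rule;
        this.sink = sink;
    }

    String evaluate(String userId) {
        String option = rule.apply(userId);
        sink.add(new Exposure(flagKey, userId, option, Instant.now()));
        return option;
    }

    // Share of distinct exposed users per option who also reached a goal (e.g. checkout).
    static Map<String, Double> conversionByOption(List<Exposure> exposures, Set<String> converted) {
        Map<String, Set<String>> usersByOption = new HashMap<>();
        for (Exposure e : exposures) {
            usersByOption.computeIfAbsent(e.option(), k -> new HashSet<>()).add(e.userId());
        }
        Map<String, Double> rates = new HashMap<>();
        usersByOption.forEach((option, users) -> {
            long hits = users.stream().filter(converted::contains).count();
            rates.put(option, (double) hits / users.size());
        });
        return rates;
    }
}
```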

Flags can also be used to manage experience levels or paid tiers for customers. This requires support for serving one of multiple allowed options and targeting specific users who belong to a tier. What’s interesting here is that although this looks like a true everything-is-a-nail stretch, these targeting needs are very much like the ones needed for running experiments. Tongue-in-cheek, the only differences are that the word “cohort” is replaced with “tier”, and the state of the flag gets permanently locked to “on” :)
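Seen this way, a tier is a cohort and the flag is simply never turned off. Here is a small sketch with hypothetical tier names and option values, where each tier maps to the option it is allowed and untiered users fall back to a default:

```java
import java.util.Map;

// Hypothetical customer tiers; in experiment terms each tier is just a named cohort.
enum Tier { FREE, PRO, ENTERPRISE }

// A flag that stays "on" permanently; only the option served varies by tier.
record TieredFlag(String key, Map<Tier, String> optionByTier, String fallback) {
    String optionFor(Tier tier) {
        return optionByTier.getOrDefault(tier, fallback);
    }
}
```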

I hope this shows how flags can be adopted over time with greater and greater leverage. Pete Hodgson has written a good post on the range of modes flags can take, which attempts to categorise them and concludes with guidance to decouple their respective management. My own take is more relaxed: I’m less concerned that features need to be sandboxed and siloed by use case. I think there’s value in a holistic approach, in much the same vein as unified logging, eventing, and data pipelines, especially for a microservices system. Put another way, feature flags can be seen as a reductive form of online experiment, where it’s worth looking for ways to exploit and evolve the approach rather than restrict it.

Flags as a Service

Given that feature flags have a way of becoming intrinsic to product and service development processes over time, and our context of microservices, we can identify three reasons to consider feature flagging as a service.

First, everyone solves this differently. Flag systems are often built for local needs with whatever’s to hand. ZooKeeper, etcd, Consul, databases, Redis/Memcached, Puppet, Chef, discovery services, configuration files, hell, even DNS: all can be, and probably by now have been, hammered into a working flag system. This is not a criticism; I’d much, much rather have something built ad hoc to support flags today than wait for a better tomorrow. And frankly, it works when teams are coordinating on a monolith with access to shared production state. If you need to move out of a monolith, you have to reimplement flag support per service boundary, or worse, expose the monolith’s flag technology of choice to satellite services. Aside from the wasted engineering effort as each team puts something together, highly localised approaches don’t compose across team and service boundaries when you’re looking to build larger offerings; notably, experimentation and measurement get very hard. If flags have in fact become intrinsic to product development and experiments, and the implementation is effectively private, this can become a strategy tax against moving to services, regardless of need. And if you’re already in a world of microservices, you’ll want to look at a service-based option from the get-go to avoid redundant engineering effort and multiple teams accessing a backing technology directly. There’s a path dependency to the choices you make.

Second, a service enhances the observability and accessibility of flag state. In a microservices world (and for monolithic or monocentric systems too) there’s value in making flag state observable and programmatically accessible. As a simple example, you can correlate a state change with operational and product metrics. Features can also be made available to more than code: a service-accessible flag can be used to control infrastructure such as load balancers or conditional routes in serverless apps. Some modern infrastructure supports options like traffic routing and canaries, but these require teams to work up and down the stack, which can add complexity compared with having one control plane. A situation where flag A is managed via the orchestration framework, flag B is managed in the build step, flag C is read out of the application’s cache, and flag D, well, just hold on there while I check that for you, is not ideal. Service access to feature flags also allows human operators to more easily intervene and control system changes. This is a subtle point, but allowing human engagement is an important aspect of complex systems design.

Third, incident management is simplified. One point of flags is to allow features to be turned off quickly. If the flag system is an internal bolt-on to whatever the team happens to run inside their microservice, then first-level oncall is going to have to delve into that to turn a feature off. The chances are I am not going to toggle an item directly in your ZooKeeper or database, especially if it’s being used for other functionality. Instead, you’re getting paged to turn that feature off, which is snatching defeat from the jaws of victory. On the other hand, I feel much better about a service interface or console that offers the least power needed to change the state. While it’s increasingly common to have teams take online responsibility for their services, it’s just as common to have oncall pools or support tiers, where colleagues take pages for other teams’ systems to spread the load.
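Here is a sketch of a least-power operator interface along those lines, with hypothetical names throughout. Operators can flip an existing flag’s state but cannot create, delete, or re-target flags, and every change records who made it, why, and when. That timestamped history is also what lets you line up state changes against operational and product metrics.

```java
import java.time.Instant;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Hypothetical audit entry: who changed which flag, to what, when, and why.
record StateChange(String flagKey, boolean on, String actor, String reason, Instant at) {}

// Hypothetical operator-facing surface exposing only the power needed during an incident:
// it can flip state on existing flags, but cannot create, delete, or re-target them.
class OperatorConsole {
    private final Map<String, Boolean> states = new ConcurrentHashMap<>();
    private final List<StateChange> auditLog = new ArrayList<>();

    OperatorConsole(Map<String, Boolean> initial) {
        states.putAll(initial);
    }

    synchronized void setState(String flagKey, boolean on, String actor, String reason) {
        if (!states.containsKey(flagKey)) {
            throw new IllegalArgumentException("unknown flag: " + flagKey);
        }
        if (reason == null || reason.isBlank()) {
            throw new IllegalArgumentException("a reason is required");
        }
        states.put(flagKey, on);
        auditLog.add(new StateChange(flagKey, on, actor, reason, Instant.now()));
    }

    boolean enabled(String flagKey) {
        return states.getOrDefault(flagKey, false);
    }

    // Timestamped changes can be overlaid on operational and product dashboards.
    synchronized List<StateChange> history() {
        return List.copyOf(auditLog);
    }
}
```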

Treating flags as a first-class concern makes them easier to use and simplifies coordinating delivery across services and teams. Presenting them through a management UI or admin console is an excellent first step, and the path to using features to measure impact and run online experiments is easier if a service is the end goal.

Conclusion

Ok! If you got this far, thanks for reading. I hope it helped lay out the broad strokes of what feature flags are and why they matter.

Small things can have a big impact, and not having those things holds us back in surprising ways. When Elisha Otis invented the elevator safety brake, he made the development of modern skyscrapers possible, and when Keith Tantlinger invented the Twistlock, he enabled containerization, a century-and-a-half-old idea, to finally take off. Feature flags can seem like engineering trivia, but they are not. It’s impressive how much they offer, not just to engineers but to broader teams building online services.

In particular, linking flags with experiment infrastructure offers high leverage to online products. A rational reaction to using flags just for engineering control is to treat them as a complexity cost, with the aim of reducing the number of active flags and making sure they don’t linger. When you start using them to drive experiments, managing down their number is a less obvious good. Instead, you want to make them easy to use and focus on more advanced topics, such as providing ways to allocate contribution when users are inside multiple experiments. That said, going straight from nothing to experiments is a big step, and you’re better off starting with simple on/off flags and working out from there.

In the next two posts, we’ll look at feature flag models and evaluation in more detail, and supporting features across multiple microservices.

Further Reading