How feature flags improve our release and incident management processes

Gain more flexibility and control over your code releases

About our setup

To manage feature flags across many applications we use the open source Unleash provider, which consists of unleash-server (flag management dashboard and API) and official client SDKs for Node.js, Go, Python, Ruby and Java. We run a separate instance of unleash-server per environment.

Releases

Decouple the deployment and release of code

Feature flags allow you to decouple the deployment of code (copy of build artifact to server, restart of application) from its release to users (new code is enabled and handling traffic). This provides teams with several benefits:

  • When a feature cannot be released to users incrementally, it can be deployed behind a feature flag in a disabled state until it is complete. This avoids the pitfalls of long-running feature branches: the incomplete feature can be merged to the master branch without blocking the deployment and release of other features, and it can be tested sooner in deployed environments.
  • A feature with a gradual release plan can be deployed just once and later released in multiple stages using a custom strategy. In applications which support multiple brands, we can use a custom brand strategy to gradually release a feature across multiple sites. A good starting point when thinking about potential custom strategies for your application is to review its inputs (environment variables, user tokens, API data).
  • A feature due for release outside business hours can be deployed during regular business hours and later released at the specified date and time using a scheduled release strategy. The same approach works in reverse to disable a feature at a specified date and time.
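The brand and scheduled release strategies described above could be sketched roughly as follows. This is a minimal illustration, not our production code: it uses plain functions that take the dashboard parameters and the application context, which is the general shape a custom strategy works with. The parameter names (brands, releaseAt), the context fields and the brand values are all assumptions made for the example.

```typescript
// Parameters configured for the strategy in the dashboard (names are hypothetical).
type StrategyParameters = Record<string, string | undefined>;

// Context supplied by the application when a flag is checked (fields are hypothetical).
interface FlagContext {
  properties?: Record<string, string | undefined>;
  currentTime?: Date;
}

// Brand strategy: enabled only for brands listed in the comma-separated "brands" parameter.
function brandStrategy(parameters: StrategyParameters, context: FlagContext): boolean {
  const brand = context.properties?.brand;
  if (!brand) return false;
  const enabledBrands = (parameters.brands ?? "")
    .split(",")
    .map((b) => b.trim())
    .filter((b) => b.length > 0);
  return enabledBrands.indexOf(brand) !== -1;
}

// Scheduled release strategy: disabled before "releaseAt", enabled from that moment on.
function scheduledReleaseStrategy(parameters: StrategyParameters, context: FlagContext): boolean {
  const releaseAt = Date.parse(parameters.releaseAt ?? "");
  if (isNaN(releaseAt)) return false; // missing or malformed date: stay disabled
  const now = (context.currentTime ?? new Date()).getTime();
  return now >= releaseAt;
}
```

Releasing to another brand then only means adding it to the brands parameter in the dashboard; no new deployment is involved. A scheduled disable would be the same comparison with the result negated.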

Widen the release capabilities amongst your team

By enabling the release of features using a dashboard we reduce our reliance on engineering to always perform this task. This allows other stakeholders such as product owners and quality assurance teams to perform release duties as needed.

Choosing between a temporary and permanent flag

Feature flags can serve a variety of purposes, both short and long term. As Unleash doesn’t provide a way to group feature flags, we follow this general naming convention for clarity:

Temporary feature flags (with temp prefix, default state disabled) are intended for deploying features before they are complete and/or scheduled for release. They imply that once the feature is released, there is no reasonable use case where it should later be disabled.

Temporary feature flag

Temporary flags become technical debt within your application and should be removed directly after the feature’s release to reduce code path complexity and the size of your feature flags’ API payload. They must also be removed from use in the Unleash dashboard by clicking the Archive button in the Feature Toggles view.

For all other cases, we use permanent feature flags (with perm prefix).
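A naming convention like this also makes it easy to spot cleanup work. The sketch below classifies flags by prefix and lists temporary flags that are already enabled, as a rough proxy for "released and ready to remove". The exact prefix format ("temp-" and "perm-" with a hyphen) and the flag names are assumptions for illustration only.

```typescript
type FlagKind = "temporary" | "permanent" | "unknown";

// Assumes prefixes are written as "temp-" and "perm-"; the separator is a guess.
function classifyFlag(name: string): FlagKind {
  if (/^temp-/.test(name)) return "temporary";
  if (/^perm-/.test(name)) return "permanent";
  return "unknown";
}

// Temporary flags that are enabled are treated here as candidates for removal.
// A flag that is only partially released via a strategy would need a closer look.
function removalCandidates(flags: { name: string; enabled: boolean }[]): string[] {
  return flags
    .filter((f) => classifyFlag(f.name) === "temporary" && f.enabled)
    .map((f) => f.name);
}
```

Flags classified as "unknown" are worth flagging in review, since they fall outside the convention.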

Communicate changes to toggle state

As with conventional code releases, it’s important that changes to toggle state are communicated within and across teams for visibility. We post our updates to our Slack releases channel; see the Unleash docs for a Slack webhooks integration example.
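As a rough idea of what such a notification might contain, the sketch below turns a toggle change into the JSON body that a Slack incoming webhook accepts (an object with a text field). The ToggleEvent shape is a simplified, hypothetical stand-in and not the event format Unleash emits; sending the request is left out.

```typescript
// Hypothetical, simplified description of a toggle change.
interface ToggleEvent {
  flagName: string;
  enabled: boolean;
  changedBy: string;
  environment: string;
}

// Builds the message body; Slack incoming webhooks accept JSON with a "text" field.
function toSlackMessage(event: ToggleEvent): { text: string } {
  const state = event.enabled ? "enabled" : "disabled";
  return {
    text: `Feature flag \`${event.flagName}\` was ${state} in ${event.environment} by ${event.changedBy}`,
  };
}
```

Including the environment in the message matters in a setup like ours, where each environment has its own unleash-server instance.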

Handle your flags in tests

All tests (unit, integration, E2E) should be written to account for both the enabled and disabled state of feature flags.

This is particularly important for integration and E2E tests, where the flag state can change in an environment at any time, often while testing is under way in pre-production environments. Not doing so can result in unexpected CI build failures which halt the deployment process! When navigating to a URL in an integration or E2E test, we can append a query param containing the flag name and desired state.

This param is read upon request to override the flag state returned from the Unleash API and ensure we return the correct feature state for a given test.
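The override could look something like the sketch below. The param name and format (featureFlag=<flagName>:<true|false>) are assumptions made for this example, as is the function name; the idea is simply that a valid override in the request URL wins over the state returned from the Unleash API.

```typescript
// Hypothetical param format: ?featureFlag=<flagName>:<true|false>
const OVERRIDE_PARAM = "featureFlag";

function resolveFlagState(flagName: string, requestUrl: string, stateFromApi: boolean): boolean {
  // The base is only needed so that relative request paths can be parsed.
  const url = new URL(requestUrl, "http://localhost");
  // The param may be repeated to override several flags in one request.
  for (const override of url.searchParams.getAll(OVERRIDE_PARAM)) {
    const [name, state] = override.split(":");
    if (name === flagName && (state === "true" || state === "false")) {
      return state === "true";
    }
  }
  // No valid override for this flag: use the state from the API.
  return stateFromApi;
}
```

A test for the disabled path would then visit something like /news?featureFlag=temp-new-nav:false, and a second test would visit the same page with the state set to true, so both code paths are covered regardless of what the dashboard says at the time.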

Plan for failure

Having covered some of the release benefits of feature flags, what about the drawbacks?

By using an external provider such as Unleash, we introduce a new integration point that can fail and impact existing application functionality and release processes.

Unleash client SDKs persist the latest known state to a local file on the client instance. This is served as stale data in the event the Unleash API is unavailable. While this is helpful for an existing client instance that has successfully connected at least once, we run our applications in Kubernetes, where pod instances can cycle often. This results in scenarios where the Unleash API may be unavailable on application start for a new pod.

Feature flags should therefore also be defined with a default value within your application to ensure a fallback state is used should the Unleash API be initially unavailable. This default value should also be provided when checking a flag’s state using unleash.isEnabled("exampleFeatureFlag", false). In this example, false will be the return value if exampleFeatureFlag is mistakenly not defined within the Unleash dashboard.
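One way to keep those defaults in a single place is a small wrapper around the client, sketched below. The FlagClient interface is a minimal stand-in that mirrors the isEnabled(name, fallback) call shown above rather than the signature of any particular SDK, and the flag names are hypothetical.

```typescript
// Minimal stand-in for a client SDK (an assumption, not a real SDK type).
interface FlagClient {
  isEnabled(name: string, fallback: boolean): boolean;
}

// Defaults shipped with the application (flag names are hypothetical).
const FLAG_DEFAULTS: Record<string, boolean | undefined> = {
  "temp-new-nav": false,
  "perm-article-comments": true,
};

function isFlagEnabled(client: FlagClient, name: string): boolean {
  // Flags without a declared default fall back to disabled,
  // so a typo in a flag name cannot enable code by accident.
  const fallback = FLAG_DEFAULTS[name] === true;
  try {
    return client.isEnabled(name, fallback);
  } catch {
    // Any client failure is treated the same as the API being unavailable.
    return fallback;
  }
}
```

Callers then never pass a fallback themselves, which avoids two call sites disagreeing about the default for the same flag.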

Incident management

Speed up your incident response times

During incidents, feature flags allow us to disable a problem feature much faster than a deployment rollback. For example, on the Node.js application which renders The Sydney Morning Herald, The Age, Brisbane Times and WA Today, it currently takes:

Approx. 30 seconds for a flag toggle change to propagate, versus approx. 5:00 to 7:30 minutes per site for a deployment rollback (sites run in parallel)

Feature flags also neutralise a problem feature more precisely than a deployment rollback, which may impact other valid code deployed alongside or after the problem feature.

Maintain some control over third party systems

We’ve found feature flags of particular value when integrating with third party systems which can experience issues outside our direct control. In these cases we use inverted feature flags (default state disabled) which act as a manual kill switch when service is impacted.

Inverted feature flag
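A minimal sketch of the kill switch pattern is shown below. The flag name and the placeholder markup are hypothetical, and the isEnabled argument stands in for whatever flag check the application uses. The key detail is the inversion: the flag is disabled by default, and enabling it turns the integration off.

```typescript
// Hypothetical flag name. Inverted: enabling this flag turns the integration OFF.
const COMMENTS_KILL_SWITCH = "perm-disable-comments-provider";

type FlagCheck = (flag: string, fallback: boolean) => boolean;

function shouldCallThirdParty(isEnabled: FlagCheck): boolean {
  // Fallback false means "kill switch off", so the integration keeps running
  // if the flag service cannot be reached.
  const killSwitchOn = isEnabled(COMMENTS_KILL_SWITCH, false);
  return !killSwitchOn;
}

function renderComments(isEnabled: FlagCheck): string {
  // Render nothing rather than a broken widget while the provider has issues.
  return shouldCallThirdParty(isEnabled) ? "<comments-widget></comments-widget>" : "";
}
```

Because the default state is disabled, a flag service outage leaves the third party integration running as normal rather than switching it off unexpectedly.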

We also disable costly third-party JS libraries (for example, A/B testing) when they are not in use, to improve site performance.

Enhance your monitoring

For on-call engineers, the Unleash dashboard helps in understanding the current state of an application’s features, which supports faster and more informed decision making. Using Unleash’s /internal-backstage/prometheus API endpoint, our feature flag event data is also annotated on dashboards in our monitoring service, Grafana, to provide context on when changes have occurred in our applications.

Summary

In closing, here are three takeaways to keep in mind:

  • Tailor your activation strategies to your application’s inputs and needs
  • Maintain good code hygiene by removing flags once no longer required
  • Plan accordingly for when your feature flag service is unavailable

Want to learn more? Check out the official Unleash docs, blog and demo. Unleash also provides a hosted service. featureflags.io is another source with a wealth of information on feature flags.
