Mastering Feature Flags: Testing Feature Flags

Martin Chaov
DraftKings Engineering
Feb 28, 2024
Testing feature flags, according to DALL·E 3

This article is part of a series; if you got here first, it might be worth checking some of the previous ones.

Feature flags serve as a mechanism in software development for managing features in a controlled environment. This article delineates a structured approach for testing feature flags, encompassing planning, execution, and monitoring.

Using feature flags can substantially increase the scope and complexity of testing requirements. Each flag introduces new permutations of system states that must be rigorously tested to ensure stability and performance. This testing overhead can strain resources and extend timelines. To manage this, organizations often adopt automated testing frameworks designed to handle multiple flag states. Additionally, it’s advisable to clearly define testing scopes and objectives based on the type and purpose of each feature flag.
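
As a minimal sketch of what such a framework has to handle, the snippet below enumerates every permutation of a couple of hypothetical flags so the same scenarios can be run against each combination:

```typescript
// Hypothetical sketch: enumerate every combination of flag states so an
// automated suite can run the same scenarios against each permutation.
type FlagStates = Record<string, readonly string[]>;

const flags: FlagStates = {
  System_Release_DepositV2: ["off", "on"],
  User_Experimentation_ColorSchemeTest: ["control", "variantA", "variantB"],
};

// Cartesian product of all flag values, e.g. 2 x 3 = 6 combinations here.
function combinations(states: FlagStates): Record<string, string>[] {
  return Object.entries(states).reduce<Record<string, string>[]>(
    (acc, [flag, values]) =>
      acc.flatMap((combo) => values.map((value) => ({ ...combo, [flag]: value }))),
    [{}]
  );
}

for (const combo of combinations(flags)) {
  // In a real suite, each combination would configure the environment
  // and execute the relevant test scenarios.
  console.log(combo);
}
```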

The primary way to keep feature flags easy to test and maintain is to keep their number at the bare minimum required to meet the system's goals!

To expand on the flag types from the previous article:

Testing of feature flags based on the flag type.

Planning & Designing Test Scenarios

Initial planning establishes the framework for the testing process.

Objectives and Goals

Objectives should be clearly defined. For example, Experimentation flags focus on user metrics, while Operational flags target system performance; a minimal mapping of flag types to objectives is sketched after the list below.

  • Define the testing objectives based on the feature flag type.
  • Specify the expected behavior and features impacted.
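
One lightweight way to keep those objectives explicit is to record them alongside the flag type. The mapping below is a sketch with illustrative entries, not an exhaustive checklist:

```typescript
// Illustrative only: example testing objectives per flag type.
const testingObjectives: Record<string, string[]> = {
  release: ["feature works when enabled", "no behavior change when disabled"],
  experimentation: ["correct variant assignment", "accurate conversion metrics"],
  operational: ["system performance within limits", "safe toggling at runtime"],
  permission: ["access granted only to the targeted segments"],
};
```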

Dependencies and Scope

Dependencies and scope require careful consideration.

  • Identify dependencies with other feature flags.
  • Determine the testing scope, including environments (test, staging, production) and platforms (web, native).

Even a few flags with a few possible values (just on/off) can explode in complexity.

The test design aims to cover all potential scenarios. Assume we have the following four flags of different types:

  • System_Release_DepositV2 — a release flag for incremental release of a new deposit page
  • User_Experimentation_ColorSchemeTest — A/B testing for the effect color has on the conversion of customers
  • System_Operational_LoginSwitch — system flag used to gradually migrate users to a new login provider
  • User_Permission_VIPDeposit — flag providing a customized deposit experience to the VIP user segment

Possibly hidden relationships between different flags.

The test scenarios should cover all combinations of the possible flag values for the assigned user segments. Consider that a customer could be targeted by all four flags at once by belonging to several of the targeted user segments. Most of the time, the existing tests already cover the system as it behaves before the flag is enabled, so only the new scenarios need to be developed. However, whenever flag dependencies are discovered, there should also be tests for how the system behaves when it is misconfigured. A sketch of resolving flags from segment membership follows below.
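
As an illustration of how the four example flags could converge on a single customer, the sketch below resolves flag values from hypothetical segment names; a real system would query its feature flag service instead:

```typescript
// Hypothetical sketch: resolve the four example flags from a user's segment
// membership; segment names are illustrative assumptions.
interface UserContext {
  segments: string[]; // e.g. ["VIP", "ColorSchemeTest_B", "LoginProvider_New"]
}

interface FlagAssignment {
  System_Release_DepositV2: "off" | "on";
  User_Experimentation_ColorSchemeTest: "control" | "variantB";
  System_Operational_LoginSwitch: "legacy" | "new";
  User_Permission_VIPDeposit: "off" | "on";
}

function resolveFlags(user: UserContext): FlagAssignment {
  return {
    System_Release_DepositV2: "on", // assume the rollout already targets this user
    User_Experimentation_ColorSchemeTest:
      user.segments.includes("ColorSchemeTest_B") ? "variantB" : "control",
    System_Operational_LoginSwitch:
      user.segments.includes("LoginProvider_New") ? "new" : "legacy",
    User_Permission_VIPDeposit: user.segments.includes("VIP") ? "on" : "off",
  };
}

// A customer in all targeted segments exercises every flag at once.
console.log(resolveFlags({ segments: ["VIP", "ColorSchemeTest_B", "LoginProvider_New"] }));
```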

Test scenarios form the core of the testing strategy.

  • Clearly define the states: off, on, default, and others… including the combination of flags (which becomes the value itself) in case of dependencies.
  • Develop test scenarios for the enabled state and for unwanted (misconfigured) states; the existing test coverage should already ensure the disabled state is handled.
  • Formulate scenarios for different user segments.
  • Formulate scenarios for the dependent flags.
  • Formulate cases that assess the collected data’s accuracy.
  • Scenarios for the logging and monitoring related to the new feature, especially for experimentation flags; it is not uncommon for decisions to be taken based on faulty statistics (see the scenario table sketched after this list).
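
One way to keep these scenarios reviewable is a data-driven scenario table. The structure and scenario names below are illustrative assumptions, not an existing test suite:

```typescript
// Hypothetical scenario table: each row pins flag values, a segment, and the expectation.
interface Scenario {
  name: string;
  flags: Record<string, string>;
  segment: string;
  expectation: string;
}

const scenarios: Scenario[] = [
  {
    name: "VIP sees customized deposit when both deposit flags are on",
    flags: { System_Release_DepositV2: "on", User_Permission_VIPDeposit: "on" },
    segment: "VIP",
    expectation: "customized DepositV2 page is rendered",
  },
  {
    name: "misconfigured dependency: VIPDeposit on while DepositV2 is off",
    flags: { System_Release_DepositV2: "off", User_Permission_VIPDeposit: "on" },
    segment: "VIP",
    expectation: "system falls back to the legacy deposit page without errors",
  },
  {
    name: "experiment exposure is logged exactly once per session",
    flags: { User_Experimentation_ColorSchemeTest: "variantB" },
    segment: "ColorSchemeTest_B",
    expectation: "one exposure event with the correct variant is recorded",
  },
];
```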

Appropriate test data and environment configurations are essential for effective test execution, and both should be documented in the test scenarios and plan. Performance, resilience, and security-related topics will be expanded on in upcoming articles.

Functional and End-to-end Testing

This section examines how the feature flag interacts with the overall system. Feature flags should not be chained in more ways than "if flag X is on, flag Y is off"; the most interesting cases arise from user segmentation and from users who belong to more than one segment. This is where collision detection and prioritization should be implemented.
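
A minimal sketch of such collision detection and prioritization might look like the following; the priority scheme and names are assumptions for illustration:

```typescript
// Hypothetical sketch: when a user falls into several segments, resolve
// colliding flag decisions by explicit priority instead of evaluation order.
interface FlagDecision {
  flag: string;
  value: string;
  priority: number; // higher wins
}

function resolveCollisions(decisions: FlagDecision[]): Map<string, FlagDecision> {
  const winners = new Map<string, FlagDecision>();
  for (const decision of decisions) {
    const current = winners.get(decision.flag);
    if (!current) {
      winners.set(decision.flag, decision);
    } else if (current.priority === decision.priority && current.value !== decision.value) {
      // Equal priority but different values is a collision worth surfacing in tests.
      throw new Error(`Unresolved collision on ${decision.flag}`);
    } else if (decision.priority > current.priority) {
      winners.set(decision.flag, decision);
    }
  }
  return winners;
}

// Example: the higher-priority VIP rule wins over the generic release rule.
const resolved = resolveCollisions([
  { flag: "DepositExperience", value: "standard", priority: 1 },
  { flag: "DepositExperience", value: "vip", priority: 10 },
]);
console.log(resolved.get("DepositExperience")?.value); // "vip"
```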

Test execution and validation of feature flag behavior:

  • Configure the feature flag settings for each test scenario.
  • Execute the test scenarios and validate outcomes.
  • Document any anomalies or unexpected behavior.

Automation and parallelized testing can be enabled by environments configured for the various available user segments and their combinations.
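
For instance, each segment combination can be treated as an isolated environment configuration and the scenarios executed concurrently. The runner below is a sketch, and `runScenario` is an assumed placeholder rather than a real API:

```typescript
// Hypothetical sketch: run the same scenarios against several segment-based
// environment configurations in parallel.
interface EnvironmentConfig {
  name: string;
  segments: string[];
}

const environments: EnvironmentConfig[] = [
  { name: "vip-new-login", segments: ["VIP", "LoginProvider_New"] },
  { name: "default-legacy-login", segments: [] },
];

// Assumed placeholder: configure the flags for the segments and run the tests.
async function runScenario(env: EnvironmentConfig, scenario: string): Promise<void> {
  console.log(`running "${scenario}" in ${env.name}`);
}

async function runAll(): Promise<void> {
  await Promise.all(
    environments.flatMap((env) =>
      ["deposit happy path", "login migration"].map((scenario) => runScenario(env, scenario))
    )
  );
}

runAll().catch(console.error);
```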

Enabled and Disabled States

Testing should include both the enabled and disabled states of the feature flag; a minimal check is sketched after the list below.

  • Conduct thorough testing when the feature is enabled.
  • Confirm that the feature is deactivated when disabled.
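
The most basic check is sketched below with Node's built-in assert; the flag and feature names are hypothetical:

```typescript
import assert from "node:assert";

// Hypothetical feature gated by a flag.
function renderDepositPage(flags: Record<string, boolean>): string {
  return flags.System_Release_DepositV2 ? "deposit-v2" : "deposit-legacy";
}

// Enabled: the new behavior is present.
assert.strictEqual(renderDepositPage({ System_Release_DepositV2: true }), "deposit-v2");
// Disabled: the system behaves exactly as it did before the flag existed.
assert.strictEqual(renderDepositPage({ System_Release_DepositV2: false }), "deposit-legacy");
```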

Boundary Cases and Dependencies

Testing should also consider edge cases and dependencies, particularly for complex flags.

  • Examine edge cases and scenarios influenced by the feature flag.
  • Involve engineers in monitoring logs and metrics.

Cross-Browser and Cross-Platform

Compatibility testing is necessary to ensure consistent behavior across different platforms.

  • Validate compatibility across various browsers and platforms.

Test Rollbacks

Rollbacks serve as a contingency measure; a minimal rollback check is sketched after the list below.

  • Validate the interaction between the feature flag and other system components.
  • Ensure no conflicts arise when integrated with other functionalities.
  • Prepare a disaster recovery plan. This involves outlining steps to recover from any adverse effects caused by the feature flag.
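
A rollback test can be as simple as flipping the flag off at runtime after the new path has been exercised and asserting that the system returns to its baseline behavior; the sketch below reuses the hypothetical deposit flag from earlier:

```typescript
import assert from "node:assert";

// Hypothetical runtime flag store that can be flipped without a redeploy.
const flagStore = new Map<string, boolean>([["System_Release_DepositV2", true]]);

function renderDepositPage(): string {
  return flagStore.get("System_Release_DepositV2") ? "deposit-v2" : "deposit-legacy";
}

// New path is live...
assert.strictEqual(renderDepositPage(), "deposit-v2");

// ...rollback: disable the flag at runtime and verify the legacy behavior returns.
flagStore.set("System_Release_DepositV2", false);
assert.strictEqual(renderDepositPage(), "deposit-legacy");
```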

Deployment and Release Testing

Before going live, validating the feature flag in various deployment environments is essential. This ensures a smooth transition from staging to production.

  • Validate behavior across different environments and during the release process.

Minimizing User Impact

A phased rollout limits the impact on end users if something goes wrong.

  • Implement a phased approach for enabling the feature flag in production.
  • Define specific metrics or KPIs for assessing the success of the feature flag; a rollout guardrail is sketched below.
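
A common phased approach is a percentage rollout guarded by a KPI threshold; the steps, metric, and tolerance below are assumptions for illustration:

```typescript
// Hypothetical sketch: increase exposure in steps and stop if a KPI regresses.
const rolloutSteps = [0.01, 0.05, 0.25, 1.0]; // share of production traffic per phase

interface KpiSnapshot {
  depositSuccessRate: number; // 0.0 - 1.0
}

// Assumed guardrail: tolerate at most a 1% relative drop in the success rate.
function shouldProceed(baseline: KpiSnapshot, current: KpiSnapshot): boolean {
  return current.depositSuccessRate >= baseline.depositSuccessRate * 0.99;
}

console.log(shouldProceed({ depositSuccessRate: 0.98 }, { depositSuccessRate: 0.975 })); // true
```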

Monitoring and Maintenance

Ongoing monitoring is required post-deployment; a minimal evaluation-logging example follows the list below.

  • Implement logging and monitoring mechanisms.
  • Monitor performance and stability continuously.
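
At a minimum, every flag evaluation can emit a structured event so dashboards and experiment analysis work from trustworthy data. The event shape below is a sketch, not a prescribed schema:

```typescript
// Hypothetical structured event emitted on every flag evaluation.
interface FlagEvaluationEvent {
  flag: string;
  value: string;
  userSegments: string[];
  evaluatedAt: string; // ISO timestamp
}

function logEvaluation(event: FlagEvaluationEvent): void {
  // A real system would ship this to a log pipeline or metrics backend.
  console.log(JSON.stringify(event));
}

logEvaluation({
  flag: "User_Experimentation_ColorSchemeTest",
  value: "variantB",
  userSegments: ["ColorSchemeTest_B"],
  evaluatedAt: new Date().toISOString(),
});
```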

Cost-Benefit Analysis

Evaluating the return on investment (ROI) for feature flags involves assessing costs and benefits. Costs may include development, testing, and ongoing maintenance, as well as the complexity the flag introduces, such as the need for additional test scenarios. Benefits may include reduced time-to-market, flexible release strategies, and improved system stability. The analysis informs decision-making regarding the feature flag's long-term viability.

Version Control

Managing feature flags across different software versions requires a robust version control system. Compatibility with current and legacy versions is essential, especially for long-lived flags or those spanning multiple versions. A snapshot of the system's current state, combining the running services, configurations, feature flags and their values, and so on, can itself be treated as a system version.
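
A sketch of what such a snapshot might capture, with illustrative fields:

```typescript
// Hypothetical snapshot that treats "system version" as code + configuration + flags.
interface SystemSnapshot {
  takenAt: string;                       // ISO timestamp
  services: Record<string, string>;      // service name -> deployed build/version
  configuration: Record<string, string>; // relevant runtime configuration
  featureFlags: Record<string, string>;  // flag name -> effective value
}
```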

Glossary

  • Feature Flag: A toggle for turning features on or off.
  • Test Cases: Specific conditions under which a feature flag is tested.
  • Functional Testing: Verifying that the feature works as expected.
  • Integration Testing: Ensuring the feature works within the overall system.
  • User Acceptance Testing (UAT): Validation by the end-users or stakeholders.
  • Deployment Testing: Verifying feature flag behavior in different environments.
  • Monitoring: Ongoing tracking of feature flag performance and issues.
  • Release Flag: A flag used for controlling the release of new features.
  • Experimentation Flag: A flag used for A/B testing.
  • Operational Flag: A flag used for controlling operational settings.
  • Permission Flag: A flag used for controlling access to features.

This article is part of a series; if you got here first, it might be worth checking some of the previous ones.

Want to learn more about DraftKings’ global Engineering team and culture? Check out our Engineer Spotlights and current openings!


Martin Chaov
DraftKings Engineering

15+ years as a software architect, currently Lead Software Architect at DraftKings, specializing in large-scale systems and award-winning iGaming software.