Mastering Feature Flags: Risk Mitigation

Martin Chaov

Published in

DraftKings Engineering

9 min read · Feb 28, 2024

Image: Risk mitigation of Feature Flags according to DALL·E 3

This article is part of a series; if you got here first, it might be worth checking some of the previous ones.

Feature flags offer a powerful way to manage software features but come with risks.

The first step in risk mitigation is identifying the potential risks, which may include code complexity, security vulnerabilities, and increased testing overhead. A risk assessment should be conducted before feature flags are adopted, and it should be repeated as part of implementing each feature behind a flag. Once the risks are identified, appropriate mitigation strategies should be applied. The goal is to minimize the negative impact while maximizing the benefits of using feature flags.

Code complexity

Introducing feature flags into a software system can inadvertently escalate code complexity — probably the most prominent risk. Multiple flags can interact unpredictably, leading to an exponential increase in system states. This complexity can compromise code readability and maintainability and elevate the risk of defects. To mitigate these challenges, organizations are advised to adopt best practices such as limiting the lifespan of temporary flags, employing clear naming conventions, and conducting focused code reviews. Utilizing feature flag management systems with centralized dashboards can also provide valuable oversight, enabling teams to monitor active flags, their states, and interdependencies effectively.

Let’s take as an example 4 flags, one of each type:

System_Release_DepositV2 — a release flag for incremental release of a new deposit page
User_Experimentation_ColorSchemeTest — A/B testing for the effect color has on the conversion of customers
System_Operational_LoginSwitch — system flag used to gradually migrate users to a new login provider
User_Permission_VIPDeposit — flag providing a customized deposit experience to the VIP user segment

At first glance, these flags are not related. However, depending on the segmentation of users, one user could hit two or more of these flags at the same time. In a vast system (even a well-designed one), dynamic deviations from functional requirements could lead to unpredictable system behavior. One way to mitigate this, albeit a bit coarse, is to limit the number of flags and their dependencies for the system in question.

Take a VIP user who is enrolled in the A/B test and is also affected by the login migration and the gradual rollout of the new deposit flow. Which of these flags should take precedence? If a user can be in more than one user segment, such cases should be handled explicitly in the code or in the configuration of the flags.
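One way to make the precedence explicit is an ordered list that is consulted for every user. The sketch below is only an illustration: the order itself (VIP permission first, then release, then experiment) and the function name `winning_flag` are assumptions, not something the flags above prescribe.

```python
from typing import Optional

# Hypothetical precedence order for the deposit page: earlier entries win.
DEPOSIT_PRECEDENCE = [
    "User_Permission_VIPDeposit",
    "System_Release_DepositV2",
    "User_Experimentation_ColorSchemeTest",
]


def winning_flag(active_flags: set[str], precedence: list[str]) -> Optional[str]:
    """Return the highest-priority flag that is active for this user, if any."""
    for name in precedence:
        if name in active_flags:
            return name
    return None


# A VIP user who is also in the DepositV2 rollout and the color A/B test.
vip_user_flags = {
    "System_Release_DepositV2",
    "User_Experimentation_ColorSchemeTest",
    "User_Permission_VIPDeposit",
}
deposit_experience = winning_flag(vip_user_flags, DEPOSIT_PRECEDENCE)
```

Keeping the order in one place means a collision is resolved the same way everywhere, and changing the business decision is a one-line change rather than a hunt through conditionals.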

Adoption of new application versions

Sometimes, new software versions cannot be easily propagated to the end users. On the web, it can be as easy as introducing a new character in the URL. In native apps, however, the user must take action, and they might not even have the Internet bandwidth required to download and install the new application on their devices. Since engineers cannot enforce instant updates, the gradual adoption of new versions must be considered when introducing or decommissioning feature flags. This fragmentation adds complexity to the codebase, as the engineering team must ensure compatibility across multiple application versions.
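A common way to cope with slow adoption is to make flag evaluation aware of the client version. The following is a minimal sketch under that assumption; the `min_app_version` field and the helper names are hypothetical and not taken from any particular flag system.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class VersionedFlag:
    name: str
    min_app_version: tuple[int, int, int]  # first client build that ships the flagged code
    enabled: bool


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '2.10.0' into (2, 10, 0) so versions compare numerically."""
    return tuple(int(part) for part in text.split("."))


def is_enabled(flag: VersionedFlag, client_version: str) -> bool:
    """Clients older than min_app_version never get the feature, whatever the flag says."""
    return flag.enabled and parse_version(client_version) >= flag.min_app_version


deposit_v2 = VersionedFlag("System_Release_DepositV2", (2, 10, 0), enabled=True)
```

Comparing parsed tuples rather than raw strings matters here: as strings, "2.9.9" would sort after "2.10.0".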

Testing overhead

Using feature flags can substantially increase the scope and complexity of testing requirements. Each flag multiplies the number of possible system states that must be rigorously tested to ensure stability and performance. This increased testing overhead can strain resources and extend timelines. To manage this, organizations often adopt automated testing frameworks designed to handle multiple flag states. Additionally, it’s advisable to clearly define testing scopes and objectives based on the type and purpose of each feature flag. This targeted approach can help maintain testing efficiency while thoroughly evaluating critical flag interactions. The math behind this is simple: 3 flags with two possible values each yield 2^3 = 8 distinct system configurations. Even a modest number of concurrent flags, such as 20 with three different values each (e.g., one per user segment), gives 3^20 = 3 486 784 401 configurations.
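The arithmetic can be checked with a few lines of code. This sketch assumes the flags are independent, so the number of configurations is simply the product of the number of values each flag can take; the function name is illustrative.

```python
from math import prod


def configuration_count(values_per_flag: list[int]) -> int:
    """Number of distinct system configurations for independent flags."""
    return prod(values_per_flag)


print(configuration_count([2, 2, 2]))  # 3 two-valued flags -> 8
print(configuration_count([3] * 20))   # 20 three-valued flags -> 3486784401
```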

At DraftKings Inc., we have hundreds of flags with an average value of 4.2 at any given time. Testing all the possible configurations is not feasible, so the dependencies and restrictions between flags should be mapped out explicitly. Analyzing the user segments for intersections and flag collisions is one way to find hidden dependencies between the feature flags.
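A segment-intersection analysis could look roughly like the sketch below. All of the data is made up for illustration (segment names, user IDs, and which flag targets which segment), and this is not a description of the tooling used at DraftKings.

```python
from itertools import combinations

# Hypothetical data: members of each segment and the segments each flag targets.
segment_members = {
    "vip": {"u1", "u2"},
    "beta_testers": {"u2", "u3"},
    "legacy_login": {"u4"},
}
flag_segments = {
    "System_Release_DepositV2": {"beta_testers"},
    "User_Permission_VIPDeposit": {"vip"},
    "System_Operational_LoginSwitch": {"legacy_login"},
}


def audience(flag: str) -> set[str]:
    """All users who can be served the given flag."""
    return set().union(*(segment_members[s] for s in flag_segments[flag]))


def flag_collisions() -> dict[tuple[str, str], set[str]]:
    """Map each pair of flags to the users who can hit both of them."""
    collisions = {}
    for a, b in combinations(sorted(flag_segments), 2):
        shared = audience(a) & audience(b)
        if shared:
            collisions[(a, b)] = shared
    return collisions
```

Only the pairs that actually share users need combined test coverage, which is usually far fewer than the full set of configurations.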

Testing is expanded upon in the previous article from the series: Testing Feature Flags.

Cross-application A/B testing

When users access a system through multiple entry points (web, native, remote terminals, etc.), A/B testing becomes particularly challenging. In the majority of cases, different types of client applications have different life cycles and feature sets, making it difficult to maintain a consistent and controlled testing environment. Experiments must be designed with these limitations in mind or be restricted to a single type of application.

While users can expect an inconsistent experience between different mediums, accurately segmenting users across various applications for A/B testing can be complex. They can have different usage patterns or preferences on each medium: checking statuses on the phone vs. doing work on the desktop, or doing work on the phone vs. checking reports on the desktop. Cross-application A/B tests generate data from different sources with varying formats, metrics, reliability, volume, and frequency. Integrating this data for a cohesive analysis can become a significant challenge and could easily lead to misleading insights or wrong conclusions that appear to be supported by numbers.
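One widely used technique for keeping a user in the same variant on every client is to derive the assignment from a hash of a stable user ID instead of storing it per application. The sketch below illustrates that general idea; it is an assumption for the sake of the example, not the assignment method described in this series.

```python
import hashlib


def assign_variant(user_id: str, experiment: str, variants: list[str]) -> str:
    """Deterministically map a user to a variant, independent of the client app."""
    # Salting with the experiment name keeps assignments in different experiments uncorrelated.
    digest = hashlib.sha256(f"{experiment}:{user_id}".encode("utf-8")).hexdigest()
    return variants[int(digest, 16) % len(variants)]


# Web, native, and terminal clients all compute the same answer for the same user.
variant = assign_variant("user-42", "User_Experimentation_ColorSchemeTest", ["control", "blue"])
```

This only addresses consistent segmentation; the differences in life cycles, feature sets, and data formats between applications still have to be handled in the experiment design and analysis.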

Zombie features

Residual feature flags, often termed “zombie features,” pose a risk to system integrity. These flags have outlived their utility but remain embedded in the codebase. They contribute to system bloat, complicate maintenance, and potentially introduce security vulnerabilities. Regular audits for identifying and removing these redundant flags should be performed. Updated documentation that accurately reflects the status of each flag is also crucial. Some organizations employ automated tools to identify flags not accessed or modified within a specific timeframe, signaling them for review or removal.
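A staleness check of this kind can be sketched in a few lines. The example assumes the flag system can report when each flag was last evaluated or modified; the flag names, timestamps, and the 90-day window are hypothetical.

```python
from datetime import datetime, timedelta


def stale_flags(last_touched: dict[str, datetime], now: datetime, max_idle: timedelta) -> list[str]:
    """Names of flags not evaluated or modified within max_idle, sorted for stable reports."""
    return sorted(name for name, seen in last_touched.items() if now - seen > max_idle)


# Hypothetical "last evaluated or modified" timestamps.
last_touched = {
    "Promo_SummerSale": datetime(2023, 9, 1),
    "System_Release_DepositV2": datetime(2024, 2, 20),
}
candidates = stale_flags(last_touched, now=datetime(2024, 2, 28), max_idle=timedelta(days=90))
```

The output is a review list rather than a deletion list: a flag that is rarely evaluated may still guard a rarely used but important code path.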

A typical example can be found in e-commerce platforms that frequently run seasonal promotions or limited-time offers. Once the promotion period ends, the feature flags controlling these offers may remain in the codebase and become “zombie features.” Moreover, the same flag could be re-used periodically, accumulating code dependencies over time. Some companies conduct regular audits using internal tools to identify such flags, marking them for deprecation or removal to maintain a clean codebase.

Security risks

Unauthorized access to premium features and accidental exposure of sensitive functionality can pose a security risk. For example, if a feature flag controlling access to a new administrative tool is not backed by proper authorization checks, it could be exploited to gain unauthorized access.

Mitigation strategies:

Secure coding practices: This involves validating input, sanitizing data, and ensuring default settings are available as a backup in the application code.
Encryption and secure channels: Encrypt feature flag data and use secure channels for transmitting flag changes. This prevents unauthorized interception of, or tampering with, flag settings.
Access controls: Implement strict access controls for feature flag management. Only authorized personnel should be able to create, modify, or delete feature flags, and a company procedure should require each change to be documented, approved, and recorded in an audit log.
Regular audits: Conduct security audits focusing on feature flags to identify potential vulnerabilities and ensure compliance with security policies.
Automated monitoring: Utilize automated monitoring tools to detect unusual activities related to feature flags, such as unexpected changes or access patterns.

Consider the scenario of integrating a third-party payment processing tool using feature flags. The flag controls the rollout of this new feature. If this flag is misconfigured or lacks adequate security measures, it can expose the application to financial fraud risks.

When flags are used for granular access control to new features, additional verification should be in place to confirm that the user who makes these requests is, in fact, authorized and part of the targeted audience.
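The additional verification can be as simple as never treating the flag value alone as proof of access. The sketch below assumes a server-side check with hypothetical `User` and `GatedFeature` types; the role and segment names are made up for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class User:
    user_id: str
    roles: set[str] = field(default_factory=set)
    segments: set[str] = field(default_factory=set)


@dataclass
class GatedFeature:
    flag_enabled: bool
    required_role: str
    target_segment: str


def may_access(user: User, feature: GatedFeature) -> bool:
    """The flag must be on AND the user must be authorized AND part of the audience."""
    return (
        feature.flag_enabled
        and feature.required_role in user.roles
        and feature.target_segment in user.segments
    )


admin_tool = GatedFeature(flag_enabled=True, required_role="admin", target_segment="internal_staff")
```

Because the check runs on the server, a client that guesses or tampers with a flag value still fails the role and audience tests.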

Rollback plans

In software development, a rollback plan serves as a contingency measure to revert changes in the event of a failure or unintended consequences.

For instance, a financial technology company may introduce a feature flag to enable a new payment gateway. If issues arise during the rollout — such as increased transaction failures — the rollback plan is activated to turn off the feature flag, reverting the system to its previous state.

A well-structured rollback plan includes:

predefined triggers for activation,
a list of steps to execute the rollback,
a communication strategy to inform stakeholders and sometimes even the customers.
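The three parts of such a plan can be captured as data so that the plan is reviewed and versioned together with the flag. This is a minimal sketch under that assumption: the class, the 5% failure threshold, and the step and stakeholder names are all hypothetical, and the steps are only recorded here, not executed.

```python
from dataclasses import dataclass, field
from typing import Callable

Metrics = dict[str, float]


@dataclass
class RollbackPlan:
    flag_name: str
    triggers: list[Callable[[Metrics], bool]]  # predefined activation conditions
    steps: list[str]                           # ordered rollback steps
    stakeholders: list[str] = field(default_factory=list)

    def should_roll_back(self, metrics: Metrics) -> bool:
        return any(trigger(metrics) for trigger in self.triggers)


def execute(plan: RollbackPlan, flags: dict[str, bool], metrics: Metrics) -> list[str]:
    """If any trigger fires, turn the flag off and return a log of steps and notifications."""
    if not plan.should_roll_back(metrics):
        return []
    flags[plan.flag_name] = False
    return [f"step: {s}" for s in plan.steps] + [f"notify: {p}" for p in plan.stakeholders]


plan = RollbackPlan(
    flag_name="System_Release_NewPaymentGateway",
    triggers=[lambda m: m.get("transaction_failure_rate", 0.0) > 0.05],
    steps=["turn flag off", "verify traffic returns to the old gateway"],
    stakeholders=["payments-team", "customer-experience"],
)
```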

Tools like Spinnaker or Kubernetes can automate the source-code rollback process, reducing the time and manual effort required to restore the system to a previous version. Rollbacks may also be triggered based on business KPIs such as negative impact on revenue or customer satisfaction. However, rollbacks are hard with feature flags, even with well-established planning and execution processes. In the best-case scenario, only the flag value is reverted to the default “off” state, and the engineering team can troubleshoot the system safely. In the worst case, the source code must be rolled back entirely, possibly along with some back-end services. For that case, there should be a known good state of the feature flags for the previous version of the system’s source code that can be reverted to.

In a system with hundreds of flags and thousands of running services, the boundary of what belongs to a given version of the system becomes blurry and hard to draw. The application code, the underlying services, the monitoring infrastructure, and the flags with their values together form a working snapshot of the system and should be versioned accordingly. Feature flag rollouts should be treated as code changes: it is better to ship minor changes at a higher frequency than one big batch of changes with many moving parts.
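Treating flags as part of the system version can be sketched as a snapshot record that ties flag values to the code and service versions they were known to work with. The types, field names, and version strings below are assumptions made for illustration only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SystemSnapshot:
    """One known-good combination of code, services, and flag values."""
    app_version: str
    service_versions: dict[str, str]
    flag_values: dict[str, bool]


class SnapshotHistory:
    def __init__(self) -> None:
        self._snapshots: list[SystemSnapshot] = []

    def record(self, snapshot: SystemSnapshot) -> None:
        self._snapshots.append(snapshot)

    def known_good_flags(self, app_version: str) -> dict[str, bool]:
        """Flag values from the most recent snapshot recorded for this app version."""
        for snapshot in reversed(self._snapshots):
            if snapshot.app_version == app_version:
                return dict(snapshot.flag_values)
        raise KeyError(f"no snapshot recorded for version {app_version}")


history = SnapshotHistory()
history.record(SystemSnapshot("1.4.0", {"payments": "7.2"}, {"System_Release_DepositV2": False}))
history.record(SystemSnapshot("1.5.0", {"payments": "7.3"}, {"System_Release_DepositV2": True}))
```

When the code is rolled back to 1.4.0, `history.known_good_flags("1.4.0")` gives the flag state that matched it, instead of leaving the newer flag values running against older code.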

Sometimes, code rollback might not even be a feasible option: native apps are much harder to revert to previous versions because doing so requires action from the customers.

Monitoring and alerts

Monitoring and alerts are integral to the effective management of feature flags. Consider a cloud-based CRM platform that uses feature flags to control access to a new analytics dashboard. Monitoring tools can track key performance indicators like load times, error rates, and user engagement. If these metrics fall outside predefined thresholds, alert systems can notify the development team in real time. This enables quick identification and resolution of issues, minimizing impact on the user experience. Solutions like Grafana for monitoring and PagerDuty for alerts can be integrated into the feature flag management system to provide a comprehensive overview of feature performance and system health.
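The threshold logic itself is straightforward. The sketch below is a stand-alone illustration with made-up metric names and limits; it does not use the Grafana or PagerDuty APIs, which would take the place of the returned alert messages in a real setup.

```python
def breached_thresholds(
    metrics: dict[str, float],
    thresholds: dict[str, tuple[float, float]],
) -> list[str]:
    """Return one alert message per metric that is missing or outside its allowed range."""
    alerts = []
    for name, (low, high) in thresholds.items():
        value = metrics.get(name)
        if value is None:
            alerts.append(f"{name}: no data")
        elif not low <= value <= high:
            alerts.append(f"{name}={value} outside [{low}, {high}]")
    return alerts


# Hypothetical thresholds for the new analytics dashboard behind a flag.
thresholds = {
    "load_time_ms": (0.0, 1000.0),
    "error_rate": (0.0, 0.02),
    "engagement": (0.1, 1.0),
}
alerts = breached_thresholds({"load_time_ms": 850.0, "error_rate": 0.07}, thresholds)
```

Treating a missing metric as an alert is a deliberate choice in this sketch: a dashboard that stops reporting is often a symptom of the very failure the thresholds are meant to catch.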

Customer support impact

The implementation of feature flags can significantly affect customer support operations. The customer experience (CX) team should be adequately trained to handle queries and issues from users who are part of a segment under an active experiment. This includes being aware of the specific features or changes being tested, understanding how to troubleshoot related problems, and knowing when to escalate issues to the development team. Effective communication between the CX and engineering teams ensures customer feedback is promptly addressed and used to improve the feature flag implementation. Creating a knowledge base with FAQs and common issues related to active feature flags can also empower the CX team to provide faster and more accurate responses to customer inquiries.

Glossary

Rollback Plans: Strategies or procedures for reverting software to a previous state in case of failure or negative consequences after deploying a new feature or update.
CRM (Customer Relationship Management) Platform: Software used by businesses to manage and analyze customer interactions and data throughout the customer lifecycle.
KPIs (Key Performance Indicators): Quantifiable measures used to evaluate the success of an organization, employee, or other entity in meeting objectives for performance.