How we saved a month of engineering time per quarter

Elaine Lin
EPD at Digit
5 min read · Jul 13, 2022

Replacing our in-house feature management platform with a vendor solution, LaunchDarkly.

Our previous feature management platform slowed down development. This limited how quickly we could ship new features. Digit engineers spent ~24 days per quarter overcoming limitations with the system:

  • ~18 days rolling out features to existing users
  • ~4 days fixing limitations with the platform
  • ~2 days discussing how to write feature management code

I experienced these issues myself, and I noticed recurring pain points across engineering. In this post, I’ll talk about how we saved time by replacing our in-house feature management platform with an off-the-shelf solution, LaunchDarkly.

In-house feature management

Since Digit’s inception, we managed feature flags through an in-house feature management platform. Here’s how it works:

  • When a new user signs up for Digit, we create a Digit User model in the database. The User model has a featureFlags field, which is an array of strings.
  • You can add feature flags to, or remove them from, the User model's featureFlags field. To check whether a feature is enabled, call the helper function canUseFeature, which checks whether the array contains your feature string.
> user.featureFlags
[]
> user.featureFlags.push('newSavingsAlgo')
1
> user.canUseFeature('newSavingsAlgo')
true
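
A minimal sketch of that model, for illustration only. featureFlags and canUseFeature are the names described above; the class shape and constructor are assumptions, not Digit's actual code.

```javascript
// Hypothetical stand-in for the User model. Only featureFlags and
// canUseFeature are named in this post; the rest is assumed.
class User {
  constructor(id) {
    this.id = id;
    this.featureFlags = []; // array of feature strings, stored per user
  }

  // A feature is enabled only if its string is present on this user's record.
  canUseFeature(feature) {
    return this.featureFlags.includes(feature);
  }
}

const user = new User(1);
user.featureFlags.push('newSavingsAlgo');
console.log(user.canUseFeature('newSavingsAlgo')); // true
console.log(user.canUseFeature('someOtherFeature')); // false
```

Because the flag lives on each user record, changing who sees a feature always means writing to user records.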

Figure: UI for the in-house platform

This seems reasonable, but there are problems:

  • Rolling out a feature to existing users requires manual engineering involvement. The system stores feature flag values in the database. Suppose there are 1 million users, and you want to roll out the feature newSavingsAlgo to 10% of users (100k total). You would need to run a job to update 100k records in the database. Increasing the rollout percentage requires running another job. As a feature rolls out, you also need to keep everybody informed of the rollout status. Each feature ramps over the course of several days (e.g. 1%, 5%, 10%, etc.), so the total overhead is ~3 days per feature. In Q4 2021, we had 6 features that affected existing users and required a manual ramp. At ~3 days each, that adds up to ~18 days of engineering time per quarter.
  • Gradual feature rollouts are tricky and prone to error. Feature rollouts lacked guardrails. You could roll out a feature without additional approval or review. This led to incidents. Once, we mistakenly added a feature flag to all users. Fixing the issue required more manual intervention. An engineer ran a job to remove the feature flag and ran another job to re-add the feature flag to the right subset. Each incident is ~3–5 days of overhead — multiple engineers are involved in troubleshooting, rolling back, and postmortem review.
  • Writing feature management code is inconsistent and requires additional engineering time. The in-house system had many ways to manage features: config variables, experiment data models, and several jobs. None of the approaches were robust or easy to use. Figuring out how to write feature management code added ~2 days of overhead per quarter.
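
To make the rollout cost concrete, here is a rough sketch of a ramp job under this design. The rampFeature and makeUsers helpers and the in-memory array standing in for the users table are all hypothetical; the real jobs ran against the database and are not shown in this post. Every ramp step is another batch of writes to user records.

```javascript
// Hypothetical in-memory stand-in for the users table.
function makeUsers(count) {
  return Array.from({ length: count }, (_, i) => ({ id: i, featureFlags: [] }));
}

// Ramp `feature` up to `percent` of users by writing the flag onto user
// records. Returns how many records this run had to write.
function rampFeature(users, feature, percent) {
  const target = Math.floor((users.length * percent) / 100);
  let enabled = users.filter((u) => u.featureFlags.includes(feature)).length;
  let writes = 0;
  for (const user of users) {
    if (enabled >= target) break;
    if (!user.featureFlags.includes(feature)) {
      user.featureFlags.push(feature);
      enabled += 1;
      writes += 1;
    }
  }
  return writes;
}

// Scaled down from 1 million users to 1,000 to keep the example small.
const users = makeUsers(1000);
const writesPerStep = [1, 5, 10].map((p) => rampFeature(users, 'newSavingsAlgo', p));
console.log(writesPerStep); // [ 10, 40, 50 ]

// Overhead figures quoted in this post, in engineering days per quarter.
const rampDaysPerQuarter = 6 * 3; // 6 features x ~3 days each = 18
const totalDaysPerQuarter = rampDaysPerQuarter + 4 + 2; // 24
console.log(rampDaysPerQuarter, totalDaysPerQuarter); // 18 24
```

At the scale described above, the 10% step means 100k record updates, and each later step needs its own job run.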

Buy vs. build

At the end of 2021, we noticed that the in-house feature management system wouldn’t scale with our needs. System issues get worse as you grow. Digit started with a handful of engineers and a savings product. Now, there are more engineers and more products. This increases the number of features we ship each quarter. More features mean more feature management overhead. To be effective, we needed more robust feature management and experimentation tools.

We could redesign the existing system, but writing code isn’t the only way of solving problems. To fix feature management, we used the buy vs. build framework. Is it better to buy an off-the-shelf solution or build a solution in-house? It can be more efficient to pay somebody instead of writing a complex system yourself. We decided to buy a platform for the following reasons:

  • Faster than fixing: Fixing the existing system would take at least 5–6 weeks for one engineer (1–2 weeks for scoping, 4 weeks for implementation). Integrating an external platform would take about 2 weeks.
  • Not our wheelhouse: At Digit, we focus on improving financial health. Feature management isn’t our area of expertise, and existing vendor solutions fit our use cases.
  • Reduced eng involvement: By buying a platform, we remove most of the eng operational overhead with feature rollouts (about 1 month per quarter). Product managers can ramp features on their own. Engineers don’t need to maintain the feature management platform.
  • Similar cost: The cost of engineering overhead (days spent × cost per day) is similar to the cost of buying a feature management platform.

We evaluated 13 different vendors and chose LaunchDarkly based on three questions:

  • Does the vendor meet our user stories, desired functionality, and security requirements?
  • How much effort is needed to integrate the vendor?
  • What is the cost of the platform?

Out with the old, in with the new

Replacing the feature management platform is a big project. It changes how engineers write feature code for all new features. It affects how we manage features for testing, feature rollout, and experiment setup. Here are the core principles we used for integrating LaunchDarkly.

1. Don’t break functionality.

  • Digit has a backend server and mobile apps for iOS and Android. Some users don’t update to the latest app version, so their devices won’t have the most up-to-date code. To address this, we wrote backward-compatible code that doesn’t break older mobile clients.
  • There are around 50 legacy feature flags and experiments in the codebase. Cleaning up all of the old code would be risky and expensive. To address this, we took a selective approach to cleanup. We deleted obvious points of confusion and renamed deprecated methods. For example, we prevented creating new experiments on the legacy platform and renamed the method canUseFeature to canUseFeatureDeprecated.

2. Make the system foolproof.

  • LaunchDarkly provides an SDK for evaluating feature flags. If you call the SDK directly, you can accidentally send PII (personally identifiable information) to the platform. To prevent this problem, we added wrappers around the SDK and prevented direct SDK imports. For example, the code looks something like this:
await Features.get({
  flag: 'newSavingsAlgo',
  userId: user.id,
})
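
A minimal sketch of how such a wrapper could be built, under stated assumptions. createFeatures, defaultValue, and the fake client are hypothetical; only Features.get with flag and userId appears in this post. The client's variation(flagKey, context, defaultValue) method is loosely modeled on the evaluation call in LaunchDarkly's server-side SDKs, but a fake stands in for it here so the example runs on its own.

```javascript
// Build a Features wrapper around an injected flag client (hypothetical shape).
function createFeatures(client) {
  return {
    // Evaluate a flag for a user, forwarding only the opaque user ID.
    async get({ flag, userId, defaultValue = false, ...rest }) {
      const unexpected = Object.keys(rest);
      if (unexpected.length > 0) {
        // Reject extra fields (email, name, ...) so PII cannot slip through.
        throw new Error(`Unexpected fields: ${unexpected.join(', ')}`);
      }
      if (typeof flag !== 'string' || flag.length === 0) {
        throw new Error('flag must be a non-empty string');
      }
      if (userId === undefined || userId === null) {
        throw new Error('userId is required');
      }
      const context = { key: String(userId) }; // the only data sent out
      return client.variation(flag, context, defaultValue);
    },
  };
}

// Fake client for the example: records what it receives.
const sent = [];
const fakeClient = {
  async variation(flagKey, context, defaultValue) {
    sent.push({ flagKey, context });
    return flagKey === 'newSavingsAlgo' ? true : defaultValue;
  },
};

const Features = createFeatures(fakeClient);
Features.get({ flag: 'newSavingsAlgo', userId: 42 }).then((enabled) => {
  console.log(enabled, JSON.stringify(sent[0]));
});
```

The wrapper exposes only the flag name and user ID, so callers have no way to attach other user attributes. To block direct SDK imports, one option (not necessarily what we used) is a lint rule such as ESLint's no-restricted-imports.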

3. Introduce the system to the team.

  • We trained engineers, product managers, and support ops to use the new system. Engineers learned how to write feature flag code with LaunchDarkly. Everyone learned how to use the LaunchDarkly UI. We worked with specific engineers to migrate ongoing projects to the new system.
  • Some folks had features that were partially rolled out on the old system. It would be disruptive to migrate these features to LaunchDarkly, so we kept them on the old system.

It’s now much easier to roll out features at Digit. LaunchDarkly sends automatic Slack messages to notify stakeholders of the rollout process. Product managers can ramp up features without engineering intervention. By improving feature management, we can deliver more features to make financial health effortless for everyone. If you are interested in joining the Digit team, check out our current openings at https://digit.co/careers.

Acknowledgments

Many thanks to Tabitha Blagdon and Michael Brew for reviewing this blog post.
