Building a high-performance, zero-storage feature flag & A/B testing service

In this article, we’ll share some tips on how we built a new high-performance (1 ms average response time) feature flag & A/B testing service that needs no storage.


Until a few months ago, dailymotion relied on a basic feature rollout/feature flag service that had been built 10 years earlier and had served us well. Starting in 2017, dailymotion went through a full product and infrastructure revamp: the client apps and the website were rebuilt from scratch, as was the backend architecture. Many things had to be rebuilt to fit the new service-oriented architecture, and this brought a few additional constraints:

  • ready for geo-distribution: it had to work in our many points of presence (POPs).
  • fully containerized: Kubernetes FTW!
  • easily scalable: more traffic? Just add Kubernetes pods.
  • high performance: the service would be called to enrich every single incoming request, with no caching possible.
  • multi-environment: it needed to work in our apps, on our website, in our backend services and wherever else anyone could think of.

What are feature rollout and A/B testing?

Feature rollout, also called feature flipping or feature flags, is a way to handle progressive and controlled feature launches. The code for a feature ships in the software, but it is not enabled by default. At runtime, the feature rollout service sends a flag that tells the program whether or not to enable the feature for a particular user or group of users.

To control and check who will or won’t be exposed to the new feature, we need a rule engine with a boolean output to target specific clients:

  • true: the feature is enabled for this client
  • false: the feature is disabled for this client

For example, suppose the product team adds a new like button to the dailymotion interface behind a feature rollout flag. The gateway passes the request context (which contains information such as IP geolocation, user ID, user agent and preferred language) to the feature rollout service, which runs it through its rule engine and decides whether this particular user will see the new like button.
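To make the idea concrete, here is a minimal Go sketch of a request context and a hand-written flag rule for the like-button example. The `RequestContext` fields, the `likeButtonEnabled` function and the targeting condition (a hypothetical beta tester, or French speakers in France) are illustrative assumptions; the article does not describe the actual context schema or rules.

```go
package main

import (
	"fmt"
	"strings"
)

// RequestContext is a hypothetical subset of what the gateway forwards
// to the feature rollout service.
type RequestContext struct {
	UserID    string
	Country   string // from IP geolocation
	UserAgent string
	Language  string // preferred language, e.g. "fr-FR"
}

// likeButtonEnabled is a hypothetical, hard-coded flag rule:
// true means the new like button is shown, false means it is hidden.
func likeButtonEnabled(ctx RequestContext) bool {
	isBetaTester := ctx.UserID == "beta-tester-42"
	isFrenchSpeakerInFrance := ctx.Country == "FR" && strings.HasPrefix(ctx.Language, "fr")
	return isBetaTester || isFrenchSpeakerInFrance
}

func main() {
	ctx := RequestContext{UserID: "u1", Country: "FR", UserAgent: "Mozilla/5.0", Language: "fr-FR"}
	fmt.Println(likeButtonEnabled(ctx)) // true
}
```

Hard-coding every rule like this quickly becomes unmanageable, which is why rules are better expressed as data evaluated by a generic rule engine.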

[Image: a feature flag for gravity]

A/B testing is a way to run randomised tests across two or more variants to see which performs best. It is somewhat similar to feature rollout in that, before visitors are randomised into test “buckets”, they are filtered to decide who will take part in the A/B test. The same rule engine we use for feature rollout can serve as the entry filter for an A/B test. Once a user has passed the filter and is selected for an A/B test experiment, he or she is randomly assigned to a variant (test A vs test B vs control group). Each variant gives the user a different experience, and we then measure which variant performs best against a set of predetermined hypotheses and metrics (KPIs).
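The two steps described above (an entry filter, then a split into variants) can be sketched in Go as follows. The `Experiment` and `Variant` types, the country-only filter, the weights and the variant names are hypothetical. The `bucket` argument is assumed to be a per-client number in [0, 100) produced by some random but repeatable process, such as the hashing described later in this article.

```go
package main

import "fmt"

// Variant is a hypothetical experiment arm with a weight in percent.
type Variant struct {
	Name   string
	Weight uint32 // the weights of one experiment should sum to 100
}

// Experiment pairs an entry filter (the same kind of boolean rule used for
// feature flags, simplified here to a country check) with its variants.
type Experiment struct {
	Filter   func(country string) bool
	Variants []Variant
}

// Assign returns the variant for a client, or "" if the client does not pass
// the entry filter. bucket must be in [0, 100).
func (e Experiment) Assign(country string, bucket uint32) string {
	if !e.Filter(country) {
		return "" // not part of the experiment
	}
	// Walk the cumulative weights: [0,25) -> first, [25,50) -> second, ...
	var cumulative uint32
	for _, v := range e.Variants {
		cumulative += v.Weight
		if bucket < cumulative {
			return v.Name
		}
	}
	return "" // only reached if the weights sum to less than 100
}

func main() {
	exp := Experiment{
		Filter:   func(country string) bool { return country == "FR" },
		Variants: []Variant{{"A", 25}, {"B", 25}, {"control", 50}},
	}
	fmt.Println(exp.Assign("FR", 10))       // A
	fmt.Println(exp.Assign("FR", 30))       // B
	fmt.Println(exp.Assign("FR", 99))       // control
	fmt.Println(exp.Assign("US", 10) == "") // true
}
```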

Building the service

Now that we know what we want to build and the constraints we have, the key building blocks of the service are the following:

  • a rule engine
  • a method that uses client-side information to consistently assign a client to the same feature flag or A/B test variation
  • a method to synchronise all the services worldwide

Puzzle pieces

For the rule engine, we used a boolean expression tree, which lets us build any rule simply by combining conditions. We followed this great tutorial on implementing binary expression trees in Go, with a few customisations.

[Figure: schema of a boolean expression tree]
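A minimal Go sketch of such a binary boolean expression tree: leaves are conditions on the request context, and inner nodes combine two subtrees with AND or OR, or negate one with NOT. The `Node` interface, the node types and the map-based `Context` are hypothetical; they do not reproduce the tutorial's code or dailymotion's implementation.

```go
package main

import "fmt"

// Context is a hypothetical flattened request context (geo, platform, ...).
type Context map[string]string

// Node is any node of the boolean expression tree.
type Node interface {
	Eval(ctx Context) bool
}

// Equals is a leaf condition: true if ctx[Key] == Value.
type Equals struct{ Key, Value string }

func (e Equals) Eval(ctx Context) bool { return ctx[e.Key] == e.Value }

// And is true if both subtrees are true (evaluation short-circuits).
type And struct{ Left, Right Node }

func (a And) Eval(ctx Context) bool { return a.Left.Eval(ctx) && a.Right.Eval(ctx) }

// Or is true if at least one subtree is true.
type Or struct{ Left, Right Node }

func (o Or) Eval(ctx Context) bool { return o.Left.Eval(ctx) || o.Right.Eval(ctx) }

// Not negates its single child.
type Not struct{ Child Node }

func (n Not) Eval(ctx Context) bool { return !n.Child.Eval(ctx) }

func main() {
	// (country == FR OR country == BE) AND NOT (platform == tv)
	rule := And{
		Or{Equals{"country", "FR"}, Equals{"country", "BE"}},
		Not{Equals{"platform", "tv"}},
	}
	fmt.Println(rule.Eval(Context{"country": "BE", "platform": "web"})) // true
	fmt.Println(rule.Eval(Context{"country": "FR", "platform": "tv"}))  // false
}
```

Because every node only reads the context and nothing is random, evaluating the same tree against the same context always returns the same boolean.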

Most feature flipping or A/B testing services we surveyed use some kind of storage to record which user has been associated with which feature flag and A/B test variant. Storing this data is usually the pain point when building such a service: the storage has to be very fast, fault-tolerant, consistent and distributed. To skip this complexity, we decided to create a deterministic method to assign users to feature flags and A/B variants. We found that all we needed was to add a random unique ID (see below) to each client. As we already use JWT (JSON Web Tokens) to store user sessions, we could simply add this random ID to the token.
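Generating the random client ID only requires a good source of randomness. The minimal Go sketch below builds an RFC 4122 version-4 UUID from `crypto/rand`; the `newClientID` name is hypothetical, and embedding the ID as a claim in the session JWT is not shown because it depends on the JWT library in use.

```go
package main

import (
	"crypto/rand"
	"fmt"
)

// newClientID returns a random version-4 UUID string such as
// "3f0c7f8e-9a1b-4c2d-8e3f-0123456789ab".
func newClientID() (string, error) {
	var b [16]byte
	if _, err := rand.Read(b[:]); err != nil {
		return "", err
	}
	b[6] = (b[6] & 0x0f) | 0x40 // set the version nibble to 4
	b[8] = (b[8] & 0x3f) | 0x80 // set the RFC 4122 variant bits (10xx)
	return fmt.Sprintf("%x-%x-%x-%x-%x", b[0:4], b[4:6], b[6:8], b[8:10], b[10:16]), nil
}

func main() {
	id, err := newClientID()
	if err != nil {
		panic(err)
	}
	fmt.Println(id)
}
```

The ID is generated once, when the session token is issued, and then travels with every request, so the service never has to look it up.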

For feature flags, we always return the same boolean for a given request context, because the rule engine simply applies the boolean expression tree to the context (there’s nothing random involved).

For A/B tests, this was trickier: each client must be randomly assigned to a bucket, yet stay in that same bucket on every subsequent request.

To make A/B test variant assignment both random and deterministic, we decided to use hashing. As mentioned above, we assign each client a unique random ID (such as a UUID). We combine this ID with the A/B test’s UUID and apply this formula:

hash(experiment.UUID + client.UUID)

This operation gives us a pseudo-random but deterministic value, which we then reduce modulo N to determine the bucket. We compared many hash algorithms and chose murmur3 for its speed, good distribution, good avalanche effect and low collision rate. For example, for a 25/75 split, we take the hash modulo 100 and split on the value 25: results from 0 to 24 go to the first variant, results from 25 to 99 go to the second.

As a result, our service doesn’t need any database: every assignment can be recomputed on demand, quickly and with the same result every time.

Finally, to synchronise all the services worldwide, we use a simple shared bucket with change notifications. Since writes to the configuration files are rare, every service instance acts as a master: it writes updates to the bucket, and all the other instances are notified of the change and download the update. To make configuration changes atomic, we load the whole new configuration, and only once it is fully loaded and validated do we swap the global program pointer to the new rules configuration. This guarantees that no intermediate “half-old, half-new” rule set is ever applied during a reload.
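The load-then-swap reload maps naturally onto Go's `atomic.Pointer` type from `sync/atomic` (Go 1.19+). In this minimal sketch, the JSON configuration format, the `Config` type and the validation rule in `reload` are hypothetical, and the bucket download and change notification are left out.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"sync/atomic"
)

// Config is a hypothetical, fully parsed rules configuration.
type Config struct {
	Version int             `json:"version"`
	Flags   map[string]bool `json:"flags"`
}

// current always points to a complete, validated Config.
var current atomic.Pointer[Config]

// reload parses and validates the whole file first and only then swaps the
// global pointer, so readers never observe a half-applied configuration.
func reload(raw []byte) error {
	var cfg Config
	if err := json.Unmarshal(raw, &cfg); err != nil {
		return fmt.Errorf("parse: %w", err)
	}
	if cfg.Version <= 0 {
		return errors.New("validate: missing or invalid version")
	}
	// Atomic swap: readers that already loaded the old pointer keep using it.
	current.Store(&cfg)
	return nil
}

func main() {
	if err := reload([]byte(`{"version": 1, "flags": {"like_button": true}}`)); err != nil {
		panic(err)
	}
	// A broken update is rejected and the previous configuration stays active.
	err := reload([]byte(`{"flags": {"like_button": false}}`))
	fmt.Println(err != nil, current.Load().Flags["like_button"]) // true true
}
```

Request handlers would call `current.Load()` once per request and use that snapshot throughout, so a single request is always evaluated against one consistent rule set.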

Architecture

The service is called by our gateway on each request to enrich it. The gateway sends the request context to the service, which returns the enabled feature flags and the A/B test variants. This information is added to the headers of the subsequent requests made within our service-oriented architecture.

If a client needs the results, our GraphQL API simply forwards them, so that client-side apps can use the feature flags or A/B test variants.

[Figure: schema of the architecture]

Takeaways

Hashing FTW!

Hashing let us turn most of the constraints listed at the beginning of this article into benefits: we have a fully stateless solution that needs no storage and is fast to compute.

Keep it simple

Using a simple shared flat file across services works really well. It gives us a multi-master system with no extra effort. A word of caution though: this only works because the write volume is low. With more frequent writes, two people changing something at the same time would create conflicts.

No silver bullet

The system doesn’t cover every A/B testing requirement. For example, the A/B test results still need to be stored somewhere so they can be analysed to see which variant worked best. Collecting and storing that data is left to each user of the A/B test service to implement as they see fit; on our side, we have a separate analytics pipeline for analysing such data.

Using Go for speed and efficiency

The service is built in Go, which lets us easily process many thousands of requests per second with a few Kubernetes pods. All the rules are stored in memory and are blazing fast to execute (latest benchmark: a few hundred nanoseconds per rule), so we can have lots and lots of them and still keep response times around 1 ms.

Building a High Performance Feature and Experimentation (A/B) Service, by Klemen Sever (lightning talk, Open R&Day 2018)