Open-sourcing OnRamp: Consistent, Zero-Storage A/B Testing for Ruby

Rhoen Pruesse-Adams
Published in Teachable
5 min read · May 6, 2020
Image: Napoleon Dynamite (20th Century Fox)

We are Rhoen and Michael, engineers on the Growth Team at Teachable. Part of our responsibility on Growth is to measure engagement with the platform and make data-driven decisions to increase positive user interactions and decrease negative interactions.

As we began A/B testing, we developed a tool called OnRamp to segment users into groups that would be presented with different variants of a feature. Combined with other analytics software for recording and graphing events, OnRamp allows us to easily test our hypotheses and see the impact of our work. This lets us know we’re making changes that move the needle in the right direction for our users.

Today we are releasing OnRamp on GitHub as a Ruby gem and open source project. We hope you will take the time to consider whether OnRamp could be useful in your own projects.

Our Requirements

The basic problem facing our team was how to experiment with different features for groups of users, and track how those users behave because of the changes we made. Many analytics tools also offer the ability to A/B test, and the ones that we currently use in-house offer A/B testing out of the box on the client.

While we ran a few worthwhile experiments with these tools, they did not meet our long-term needs. First, we wanted to avoid redirecting users and slowing down page loads, which we know from both internal experiments and industry data can significantly affect user behavior. Second, we wanted access to variant data on the server, and we use a custom event streaming setup that would be difficult to integrate with third-party A/B testing software. Last, we wanted to avoid network calls and online storage of user-segment data, so that A/B testing could never slow down the app or make it unreliable. The mechanics of running an A/B test should never be a reason for a bad user experience.

So How Does OnRamp Work?

The basic idea is to pre-define a configuration with percentages of users that should be labeled with different variants. Usually we label these groups “control” and “variant.” In a basic case, the percentages would be split an even 50/50. OnRamp takes in a unique identifier for the user (it could be the same id that you use in a database) and determines which bucket that user belongs to. Over many users, you will find the percentages you defined begin to emerge. To achieve this result, the gem runs a hash function over a string made up of the experiment name and the provided unique id.

Hashing and Why OnRamp Uses It

Hash functions take an input, such as a string, and convert it to a fixed-length value derived from that input. Hashing is useful because you get the same output every time you run the function with the same input. It’s typically used for database indexing and fast lookups (O(1) insert and lookup on average), as well as for cryptographic purposes.

To demonstrate how we landed on hashing to manage user bucketing, let’s consider an approach that would not meet most teams’ needs. Say we decided that any user with an even-numbered user_id should receive the control experience, and those with odd-numbered user_ids should receive the variant. This could work, but it means that each user would receive the same experience (control or variant) in every experiment. One group of users would become accustomed to always receiving experimental changes, spoiling the assumption of independence necessary for A/B testing. Another problem: splitting ids into even and odd groups cannot handle more than two variants.
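To make the flaw concrete, here is a small sketch of the parity approach. The method name `parity_variant` and the experiment names are made up for illustration; nothing here is part of OnRamp.

```ruby
# Naive bucketing by user id parity (the approach described above).
# `parity_variant` and the experiment names are hypothetical.
def parity_variant(user_id, _experiment_name)
  # The experiment name is ignored, which is exactly the problem.
  user_id.even? ? "control" : "variant"
end

experiments = %w[new_checkout onboarding_tips pricing_banner]

[41, 42].each do |user_id|
  assignments = experiments.map { |name| parity_variant(user_id, name) }
  puts "user #{user_id}: #{assignments.join(', ')}"
end
# user 41: variant, variant, variant
# user 42: control, control, control
```

Because the experiment name never enters the decision, user 41 is in the experimental group everywhere and user 42 never is.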

However, hashing a string made up of the experiment name and the user id meets all of these requirements: it is a fast way to bucket users independently and randomly on every experiment while ensuring even exposure to each variant, regardless of how much traffic the experiment receives.
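A quick way to see both properties is to bucket a range of ids and count the results. The sketch below is an illustration rather than OnRamp’s code: the helper names, the simplified key format, and the experiment names are all assumptions, and the hash details are covered in the breakdown that follows.

```ruby
require "digest/md5"

# Hypothetical helper: a number in 0.0..1.0 derived from the id and experiment name.
# The key format here is simplified for the example.
def hashed_weight(unique_id, experiment_name)
  hex = Digest::MD5.hexdigest("#{unique_id}-#{experiment_name}")
  hex[0, 8].to_i(16) / 0xFFFFFFFF.to_f
end

# 50/50 split: the lower half of the range is control, the upper half is variant.
def fifty_fifty(unique_id, experiment_name)
  hashed_weight(unique_id, experiment_name) < 0.5 ? "control" : "variant"
end

user_ids = (1..10_000)

# Split within one experiment: both counts should land near 5,000.
counts = user_ids.each_with_object(Hash.new(0)) do |id, tally|
  tally[fifty_fifty(id, "new_checkout")] += 1
end
puts counts.inspect

# Two independent 50/50 splits should agree for roughly half of the users.
same = user_ids.count do |id|
  fifty_fifty(id, "new_checkout") == fifty_fifty(id, "pricing_banner")
end
puts "same bucket in both experiments: #{same} of #{user_ids.size}"
```

If bucketing in one experiment predicted bucketing in another, the second count would drift toward 0 or 10,000 instead of sitting near the middle.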

Here’s a quick-ish breakdown of how we bucket users using hashing and weighting:

OnRamp uses the Ruby class Digest::MD5 to compute a hash. We pass several arguments to OnRamp.ab_variant(): unique_id, experiment_name, and version (optional). OnRamp then generates the hash from the concatenated string “#{unique_id}-#{experiment_name}-#{version}”. We take the first 8 hexadecimal characters of the hash (32 bits), convert them to a number, and divide by 0xFFFFFFFF, the largest 32-bit unsigned integer written in hexadecimal. The result is our weight, a float between 0 and 1, which we use to determine which bucket a user falls into for a given experiment.

Here’s our hashing function line by line:
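The steps above can be sketched roughly as follows. This is an illustration written from the description in this post rather than a copy of the gem’s source, and the method name `weight_for` is hypothetical.

```ruby
require "digest/md5"

# Hypothetical re-creation of the steps described above; not the gem's source.
def weight_for(unique_id, experiment_name, version = nil)
  # 1. Build the key from the inputs, in the format quoted in the text.
  key = "#{unique_id}-#{experiment_name}-#{version}"

  # 2. MD5 produces a 32-character hexadecimal string for any input.
  hex_digest = Digest::MD5.hexdigest(key)

  # 3. The first 8 hex characters encode 32 bits; read them as an integer.
  leading_bits = hex_digest[0, 8].to_i(16)

  # 4. Divide by the largest 32-bit value to land in the range 0.0..1.0.
  leading_bits / 0xFFFFFFFF.to_f
end

puts weight_for(42, "new_checkout")     # same inputs always give the same weight
puts weight_for(42, "new_checkout", 2)  # a different version changes the key
```

Since the version is part of the key, changing it gives the same user a freshly computed weight for that experiment.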

Ramping Up a Feature

One common technique that online services use at scale to make code deployments safer is to ramp up traffic slowly. However, this presents a problem when you’re A/B testing in a multivariate experiment. Consider an initial A/B ramp that, to build confidence, does not yet expose all users to the final experience: 70% off, 15% version A, 15% version B. Later, the experiment is ramped up to the intended final percentages: 34% off, 33% version A, 33% version B.

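Because each bucket covers a range of the hash space, changing the percentages moves the boundaries between buckets. Many users who were previously seeing version B are now seeing version A. This fluctuation is a poor user experience and invalidates your results, as some users have seen multiple versions of the feature.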
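The shift is easy to reproduce with a little arithmetic. The sketch below assumes buckets are laid out as cumulative ranges in the order A, B, off, and it uses whole percentiles (0 to 99) in place of hashed weights; both are simplifications for illustration, and `bucket_for` is a hypothetical helper.

```ruby
# Maps a percentile (0...100) to a bucket by walking cumulative percentages.
# The bucket order (A, B, then off) is an assumption made for this example.
def bucket_for(percentile, config)
  upper = 0
  config.each do |name, percent|
    upper += percent
    return name if percentile < upper
  end
  config.keys.last
end

initial = { "A" => 15, "B" => 15, "off" => 70 }
final   = { "A" => 33, "B" => 33, "off" => 34 }

# Percentiles that were in B under the initial ramp but land in A afterwards.
moved = (0...100).select do |p|
  bucket_for(p, initial) == "B" && bucket_for(p, final) == "A"
end
puts "percentiles that flip from B to A: #{moved.first}..#{moved.last} (#{moved.size}% of users)"
# percentiles that flip from B to A: 15..29 (15% of users)
```

Under these assumptions, every user who started in version B ends up in version A once the percentages change.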

To avoid this common pitfall, we wrap the A/B test configuration percentages inside a ramp-up percentage check that lives in a different hash space. The flow is:

  1. First, determine whether a user_id is eligible for the experiment using the ramp-up percentage
  2. Then, if eligible, determine which variant to bucket the user in using the A/B variant percentages

The final A/B experimental percentages — 34% off, 33% version A, 33% version B — are set at the beginning and never modified. This enables safe percentage rollouts while preventing users from bouncing between versions.
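A minimal sketch of this two-stage flow is shown below. It is not OnRamp’s implementation: the helper names, the "ramp-" key prefix used to get a separate hash space, and the choice to report ineligible users as "off" are all assumptions made for the example.

```ruby
require "digest/md5"

# Hypothetical helper: hashes a key to a number in 0.0..1.0.
def weight(key)
  Digest::MD5.hexdigest(key)[0, 8].to_i(16) / 0xFFFFFFFF.to_f
end

# Fixed for the life of the experiment (the final percentages from the text).
VARIANTS = { "off" => 34, "version_a" => 33, "version_b" => 33 }.freeze

def variant_for(user_id, experiment, ramp_percent)
  # Stage 1: eligibility. The "ramp-" prefix is an assumed way of putting this
  # check in a different hash space from the variant choice.
  return "off" if weight("ramp-#{user_id}-#{experiment}") * 100 >= ramp_percent

  # Stage 2: variant choice, which never depends on ramp_percent.
  point = weight("#{user_id}-#{experiment}") * 100
  upper = 0
  VARIANTS.each do |name, percent|
    upper += percent
    return name if point < upper
  end
  VARIANTS.keys.last
end

# Raising the ramp only admits new users; anyone already assigned keeps their variant.
changed = (1..5_000).count do |id|
  before = variant_for(id, "new_checkout", 30)
  after  = variant_for(id, "new_checkout", 100)
  before != "off" && before != after
end
puts "users who switched between A and B: #{changed}"
# users who switched between A and B: 0
```

The count is zero by construction: a user who passes the eligibility check at 30% also passes it at 100%, and the second stage gives the same answer both times because it never looks at the ramp percentage.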

Consuming User Segments

In our application, front-end experiments use an endpoint that tells our web client which variant a user belongs to, so that it can display the correct experience. Server-side experiments simply call the gem directly from within our Rails app and branch accordingly.

Tracking which users are bucketed into which variants takes little extra work, because the funnels that we build to measure the results of experiments already link events together by user id. We simply add an additional event called “entered_experiment” with the name of the experiment and the name of the group that user is in (control/variant). We are then able to compare the two funnels against each other to measure the impact.

We’ve already successfully used the OnRamp gem to run several experiments, with more in the pipeline. We will continue to use OnRamp for our experiments in the future, and if you are interested in running experiments in your own application, we encourage you to check out the library and let us know if you have any questions.

Happy experimenting!
