Cohort-based A/B testing using Unleash and Redis
As a consumer internet company going through rapid growth, we face many challenges. Some of them are ambiguous, and some even pull against each other. Take transaction processing: we need a backend robust enough to handle transactions at scale, because we don’t want to lose a single transaction due to our inability to process it. At the same time, we need a fraud detection engine that keeps nefarious actors out, because we don’t want a single bad transaction to go through.
Then there’s the challenge of finding and hiring smart engineers and giving them the freedom to work, while still maintaining a uniform standard of engineering hygiene.
Challenge #3? Ensuring an intuitive, friction-free, and customised onboarding experience for our users. As we approach the double-digit-million user mark, our user base is becoming more and more diverse. From devices to demographics, networks, and behavioural preferences, our users differ from one another on a host of parameters.
We believe that our ability to serve this heterogeneous user base depends on building an onboarding flow that can identify each cohort of users and fine-tune the UX for it. We do this by running A/B tests, so we spend a lot of engineering bandwidth on our A/B testing tooling.
Unfortunately, we reached a point where we realised that the tools available to us, Firebase included, didn’t give us enough flexibility to switch variants for a fixed set of users. Stickiness, meaning keeping each user in the same variant once a cohort has been created, was also a real challenge.
So we began hunting for tools to solve this problem, but we kept hitting roadblocks. The tools we found either didn’t meet our requirements or were very expensive. The engineering team was growing restless, but thankfully, in that restlessness, we stumbled upon Unleash, and we were impressed. Unleash is open source, and the option to self-host it was a big relief.
However, Unleash is built around the concept of feature toggles, and its experimentation capabilities are limited. Toggles are extremely useful for switching features off without a deployment and for running phased rollouts, but they brought us back to our original problem: how to run experiments on fixed cohorts of users.
We explored Unleash’s context-based tooling for this purpose, but it didn’t inspire much confidence.
So we brainstormed other options, and during that conversation we realised that a technology we already used for caching might be the answer to our problems.
Enter our silver bullet: Redis, an in-memory data store that was already battle-tested on our platform, with excellent latency and, more importantly, a dependable TTL (time-to-live) mechanism for expiring keys.
Our primary backend framework is Spring Boot with Java, which makes it easy to keep code modular. So we wrote a simple wrapper around the Unleash service, and doing so gave us a better idea: we standardised the feature payload so it could carry whatever data a given scenario requires, and added a stickiness-enabled field along with a TTL in hours.
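To make the idea of a standardised payload concrete, here is a minimal sketch of what one might look like. The class name `ExperimentPayload` and its fields (`variant`, `data`, `stickinessEnabled`, `stickinessTtlHours`) are hypothetical; the post does not show the actual schema, only that it carries arbitrary data plus a stickiness flag and a TTL in hours.

```java
import java.time.Duration;
import java.util.Map;

// Hypothetical shape of the standardised payload returned by a wrapper around
// Unleash. Class and field names are illustrative, not Jar's actual schema.
final class ExperimentPayload {
    private final String featureName;
    private final String variant;
    private final Map<String, Object> data;     // free-form, experiment-specific data
    private final boolean stickinessEnabled;    // pin each user to one variant?
    private final long stickinessTtlHours;      // how long the pin should last

    ExperimentPayload(String featureName, String variant, Map<String, Object> data,
                      boolean stickinessEnabled, long stickinessTtlHours) {
        // A sticky experiment needs a positive TTL so the pinned assignment
        // eventually expires.
        if (stickinessEnabled && stickinessTtlHours <= 0) {
            throw new IllegalArgumentException("sticky experiments need a positive TTL");
        }
        this.featureName = featureName;
        this.variant = variant;
        this.data = Map.copyOf(data);           // defensive, immutable copy
        this.stickinessEnabled = stickinessEnabled;
        this.stickinessTtlHours = stickinessTtlHours;
    }

    String featureName() { return featureName; }
    String variant() { return variant; }
    Map<String, Object> data() { return data; }
    boolean stickinessEnabled() { return stickinessEnabled; }

    // The TTL as a Duration, convenient for clients such as Spring Data Redis
    // that accept a Duration when setting a key with an expiry.
    Duration stickinessTtl() { return Duration.ofHours(stickinessTtlHours); }
}
```

Under this assumed schema, an experiment meant to run for a week would set `stickinessTtlHours` to 168, and a plain feature toggle would simply leave `stickinessEnabled` false.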
The TTL turned out to be a boon, because we soon realised that we don’t always need to run experiments for weeks or months at a time. Most of our experiments last less than 7 days, so simple Redis GET and SET queries with a TTL were enough to keep each user in one variant for the duration of an experiment. This just worked, with no noticeable impact on performance at the scale we operate at.
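In effect, this is a simple read-through cache: when no assignment is stored for a user, the wrapper asks Unleash for a variant and writes it to Redis, and every later request within the TTL reads the stored value. It quickly became a go-to tool for the folks in our product and growth teams. At any given time, we have at least a dozen experiments running, each with cohorts spanning lakhs (hundreds of thousands) of users.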
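The read-through flow can be sketched as follows. Two interfaces stand in for the real clients: `VariantSource` plays the role of the Unleash variant evaluation, and `TtlStore` plays the role of Redis `GET` and `SET key value EX seconds`. Both interfaces, the `StickyExperimentService` class, and the `exp:<feature>:<userId>` key format are hypothetical, and the in-memory store exists only so the sketch runs without a Redis server.

```java
import java.time.Duration;
import java.time.Instant;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Supplier;

// Hypothetical stand-in for asking Unleash which variant a user gets.
interface VariantSource {
    String variantFor(String feature, String userId);
}

// Hypothetical stand-in for Redis GET and SET key value EX seconds.
interface TtlStore {
    Optional<String> get(String key);
    void set(String key, String value, Duration ttl);
}

// In-memory TtlStore with lazy expiry, only so the sketch runs without Redis.
final class InMemoryTtlStore implements TtlStore {
    private static final class Entry {
        final String value;
        final Instant expiresAt;
        Entry(String value, Instant expiresAt) { this.value = value; this.expiresAt = expiresAt; }
    }

    private final ConcurrentHashMap<String, Entry> entries = new ConcurrentHashMap<>();
    private final Supplier<Instant> now;   // injectable clock, handy for tests

    InMemoryTtlStore(Supplier<Instant> now) { this.now = now; }

    @Override
    public Optional<String> get(String key) {
        Entry e = entries.get(key);
        if (e == null) return Optional.empty();
        if (!now.get().isBefore(e.expiresAt)) {   // expired: drop it and report a miss
            entries.remove(key, e);
            return Optional.empty();
        }
        return Optional.of(e.value);
    }

    @Override
    public void set(String key, String value, Duration ttl) {
        entries.put(key, new Entry(value, now.get().plus(ttl)));
    }
}

// Read-through sticky assignment: serve the pinned variant while it exists,
// otherwise ask the flag service once and pin its answer for the TTL.
final class StickyExperimentService {
    private final VariantSource source;
    private final TtlStore store;

    StickyExperimentService(VariantSource source, TtlStore store) {
        this.source = source;
        this.store = store;
    }

    String variantFor(String feature, String userId, boolean sticky, Duration ttl) {
        if (!sticky) return source.variantFor(feature, userId);  // no pinning requested
        String key = "exp:" + feature + ":" + userId;             // hypothetical key format
        Optional<String> pinned = store.get(key);
        if (pinned.isPresent()) return pinned.get();              // cache hit
        String variant = source.variantFor(feature, userId);      // cache miss: evaluate once
        store.set(key, variant, ttl);                             // pin it until the TTL lapses
        return variant;
    }
}
```

Because the sketch does a plain get followed by a set, two concurrent first requests for the same user can both reach the flag service; if Unleash is configured with userId-based stickiness they should usually agree anyway. Where strict first-write-wins matters, Redis `SET key value NX EX seconds` (exposed in Spring Data Redis as `ValueOperations.setIfAbsent(key, value, Duration)`) performs the check and the write atomically. In a Spring Boot service, `TtlStore` could plausibly be backed by `StringRedisTemplate.opsForValue()`, whose `set(key, value, Duration)` overload writes a key with an expiry.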
If you want to work on problems like these, and you’re hell-bent on shaving every millisecond of latency, come join us.
Written by Rohan George, Backend Engineer, Jar