Android A/B Testing Made Simple With the Cadabra Library
A/B testing is not easy: designing an experiment and analyzing data properly is a massive task, and technical implementation adds even more complexity to the mix. In this article, I will show how to streamline that last part of the A/B testing process on Android to focus on what matters the most — your data.
A/B testing complexity
Let’s start with the complexity associated with A/B testing before I offer the solution. So, what are the problems?
- Async communications: the experiment configuration usually lives on the server, so the client has to fetch it first, and that operation is asynchronous. This raises many questions: how do we show the UI? Do we need to wait? How do we check that the latest configuration is loaded?
- Multi-layer changes: with most modern apps adopting some form of layered architecture, an A/B test touches almost every layer in the app. It needs to be received from the network, stored in persistent storage, tracked in the domain-logic layer, and displayed through the UI layer. That’s too many places to make a mistake (e.g., see this article).
- Multi-experiment tracking: as more and more experiments are added to the app, it gets harder to track which experiments exist at all, which can be active at a given time, and whether they interfere with each other.
- Sunsetting: after the experiment is over, someone has to clean up the code in every layer it touched. Things like configuration models and resources often get forgotten along the way and stay in the code base for months after they are no longer needed.
- Boilerplate: UI changes may be as simple as a single button’s color change or as complex as a whole new layout; in both cases at least one if-else/switch-case is required. And all of it should be done in a way that nothing gets lost during sunsetting.
Solution requirements
If we were given enough time to design a complete solution to the aforementioned problems, what would it look like? Here are my proposed requirements for such a library. (Scroll to the last paragraph to get straight to the code samples.)
- Fully synchronous experiment configuration fetching: most of the tools that define the cohorts for A/B tests neither make the decision in real time nor deliver it instantaneously, so there is no point in trying to get the current state of the server-side config. Whatever state was current at the moment of app launch should be the state of the A/B test deeper in the app flow (BTW, the recommended config fetching interval for Firebase is 12h). That doesn’t just reduce bandwidth usage and increase responsiveness, it also eliminates tricky bugs caused by fetching an updated config while one or more experiments have already been shown to a user.
- Ability to enable the experiment locally: the A and B groups for an experiment should be defined in advance and stay intact (that’s the best way to keep them truly independently randomized), but the experiment itself may not be available to a user until a certain event happens. The experimental UI A or B then has to be activated immediately after that event, ideally without a network round-trip or extra code that checks the experiment activation status.
- Single point of configuration: to keep track of all experiments, we need to register them in one place so it’s easy to see what’s active now. We’ll also need this point to set the configuration for integration and functional tests.
- Statically typed enum-like configuration: once the experiment has started, its sub-parameters cannot change, otherwise we’ll get inconsistent data. That immutability, in turn, allows us to create a data class with a complete set of required parameters for each experiment variant. The enum nature prevents us from accidentally skipping one of the options in a switch or when statement. Once the experiment is over, we can remove the class, and the code won’t compile until we properly clean up all the places the experiment was affecting.
- Automatic resources resolving: many experiments are as simple as showing a new layout for the same data. It would be much more convenient to provide two layouts, like cart_screen_variant_compact / cart_screen_variant_verbose for an experiment that has variants “Compact” and “Verbose”, and let the framework inflate the proper layout automatically. During the sunset phase, we can then remove all the resources with either the _verbose or the _compact suffix and be sure nothing unused is left.
- Testability: we don’t want to see a random experiment’s variant during the testing phase, so there should be an easy way to launch any given variant, and it should be possible to mock the experimentation framework when needed.
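To make the statically typed requirement more tangible, here is a minimal sketch in plain Kotlin. The names CartScreenExperiment and layoutNameFor are hypothetical and exist only for this illustration. The point is that a when used as an expression over an enum must cover every entry, so adding or removing a variant breaks compilation until every usage is updated.

```kotlin
// Hypothetical experiment used only for illustration; not part of any library.
enum class CartScreenExperiment(val itemsPerPage: Int) {
    COMPACT(20),
    VERBOSE(10)
}

// `when` is used as an expression here, so the compiler demands a branch for
// every enum entry. Deleting the enum once the experiment is over turns each
// remaining usage into a compile error, which makes sunsetting hard to forget.
fun layoutNameFor(variant: CartScreenExperiment): String = when (variant) {
    CartScreenExperiment.COMPACT -> "cart_screen_variant_compact"
    CartScreenExperiment.VERBOSE -> "cart_screen_variant_verbose"
}
```

Because each variant carries its own parameters (itemsPerPage above), there is no way to end up with a half-configured variant at runtime.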
Solution
With all these requirements in mind, I’ve created a library. Let’s see how it works (links to the library repo and Maven are below)!
Here is an example of the minimal A/B test configuration, for a simple test.
(I’ll cover more use-cases in a separate article, but the repo already contains the sample project, and the documentation describes the solutions for the most common cases)
A simple experiment with automatic resources resolution
All we need to do is:
- define an enum that implements the Variant interface
enum class AutoResourceExperiment : Variant { STRANGE, CHARM }
- and register it like this
CadabraAndroid.initialize(this)
CadabraAndroid.config
    .startExperiment(
        AutoResourceExperiment::class,
        RandomResolver(AutoResourceExperiment::class))
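For readers wondering what random variant assignment involves conceptually, here is a minimal sketch of a sticky random pick. It is not Cadabra’s RandomResolver implementation. StickyRandomPicker, DemoExperiment, and the in-memory map standing in for persisted storage are all assumptions made for this illustration.

```kotlin
import kotlin.random.Random

// Hypothetical demo enum, standing in for a real experiment.
enum class DemoExperiment { STRANGE, CHARM }

// Hypothetical helper, not part of Cadabra: picks a variant once per
// experiment id and keeps returning the same one afterwards.
class StickyRandomPicker(
    // Stand-in for persisted storage (e.g. preferences on a real device).
    private val storage: MutableMap<String, String> = mutableMapOf(),
    private val random: Random = Random.Default
) {
    fun <E : Enum<E>> pick(experimentId: String, variants: Array<E>): E {
        require(variants.isNotEmpty()) { "An experiment needs at least one variant" }
        // Reuse the stored choice if it still names an existing variant.
        val stored = storage[experimentId]
        variants.firstOrNull { it.name == stored }?.let { return it }
        // Otherwise choose uniformly at random and remember the choice.
        val chosen = variants[random.nextInt(variants.size)]
        storage[experimentId] = chosen.name
        return chosen
    }
}
```

With this sketch, StickyRandomPicker().pick("demo", DemoExperiment.values()) returns the same variant on every subsequent call for the same id, which is what keeps a user in one group for the lifetime of an experiment.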
With automatic resource resolution, layouts and strings that have the suffixes _strange or _charm (according to the declared variants) will be resolved automatically by a special context wrapper, with the additional safety layer of defaulting to the explicitly provided resource if the desired one is not found.
val ct = cadabra.getExperimentContext(AutoResourceExperiment::class)

// use `_strange` by default and `_charm` if Variant CHARM is active
showAlertDialog(
    ct.getStringId(R.string.dialog_title_strange),
    ct.getLayoutId(R.layout.dialog_layout_strange))
The experiment context is bound to a single experiment, so it won’t accidentally pick up the wrong resource if a variant named “Strange” is active for another experiment, but it’s still good practice to give variants longer, more specific names.
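The suffix swap with a fallback can be sketched in plain Kotlin, independent of Android. This is not how Cadabra is implemented internally. The function resolveResourceName and the set of known names (a stand-in for the app’s resource table) are hypothetical and only illustrate the lookup rule described above.

```kotlin
// Hypothetical sketch: replace the default variant's suffix with the active
// variant's suffix, and fall back to the explicitly provided name when the
// variant-specific resource does not exist.
fun resolveResourceName(
    defaultName: String,         // e.g. "dialog_title_strange"
    defaultVariant: String,      // e.g. "STRANGE"
    activeVariant: String,       // e.g. "CHARM"
    knownResources: Set<String>  // stand-in for the app's resource table
): String {
    val defaultSuffix = "_" + defaultVariant.lowercase()
    // A name without the expected suffix is not experiment-specific: keep it.
    if (!defaultName.endsWith(defaultSuffix)) return defaultName
    val candidate = defaultName.removeSuffix(defaultSuffix) + "_" + activeVariant.lowercase()
    return if (candidate in knownResources) candidate else defaultName
}
```

For example, with only dialog_title_charm defined as a variant-specific resource, the title resolves to the _charm version while the layout quietly stays on dialog_layout_strange.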
An experiment with custom data
Should the experiment require more complex parameters than resources, they can be provided as Variant fields:
enum class FancyExperiment(
    val screenLayout: Layout,
    val screenContent: Content,
    val itemsLimitPerPage: Int
) : Variant {
    CHARM(Layout.WIDE, Content.COMPLETE, 10),
    STRANGE(Layout.COMPACT, Content.FAVORITES, 20),
}
And later retrieved via the Cadabra singleton:
val aVariant = cadabra.getExperimentVariant(FancyExperiment::class)

when (aVariant.screenLayout) {
    Layout.WIDE -> setContentView(R.layout.activity_wide)
    Layout.COMPACT -> setContentView(R.layout.activity_compact)
}
loadContent(
    favoritesOnly = aVariant.screenContent == Content.FAVORITES,
    numberOfItems = aVariant.itemsLimitPerPage
)
Experiments activation
In a real-life scenario, we’d need some service to control the experiments, like Firebase, which is supported out of the box:
CadabraAndroid.config
    // register experiments without starting
    .registerExperiment(FancyExperiment::class)
    // load experiments config from Firebase
    .startExperimentsAsync(FirebaseConfigProvider())
FirebaseConfigProvider is part of the library, and any other service, including custom in-house endpoints, can be added via the extensible providers API.
Note the explicit registration step, which prevents accidentally starting an experiment that’s not ready yet or has already been disabled.
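The idea behind explicit registration can be illustrated with a small sketch. ExperimentRegistry below is hypothetical and is not Cadabra’s API. It only shows the rule that a remote config snapshot is applied to registered experiments and ignored for everything else.

```kotlin
// Hypothetical registry, not Cadabra's API: remote config entries are applied
// only to experiments that the app has explicitly registered.
class ExperimentRegistry {
    private val registered = mutableSetOf<String>()
    private val active = mutableMapOf<String, String>()

    fun register(experimentId: String) {
        registered += experimentId
    }

    // Applies a snapshot (experiment id -> variant name) and returns the ids
    // that were ignored because nothing in the app registered them.
    fun applyRemoteConfig(config: Map<String, String>): Set<String> {
        val ignored = mutableSetOf<String>()
        for ((id, variant) in config) {
            if (id in registered) active[id] = variant else ignored += id
        }
        return ignored
    }

    // Returns null for experiments that were never started.
    fun activeVariant(experimentId: String): String? = active[experimentId]
}
```

With such a gate, a stale server-side entry for an already removed experiment cannot activate anything in the app; it simply ends up in the ignored set.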
Design
The key design aspects of the library:
- It’s Kotlin-first but fully supports Java: all companion object methods are exported as statics, there are no coroutines or flows, and both Class and KClass parameters are supported.
- It’s lightweight and modularized: it can be imported as core-only for pure Java/Kotlin modules or as an Android library to enable automatic resource resolving.
- It’s extensible: the most basic ways of resolving the active experiment’s variant are supported out of the box, including Firebase Remote Config, but custom resolvers are allowed.
- It’s testable: the entry point is designed as a minimalistic interface, not a class, so if you need to provide a fake/mock implementation for tests, that can be done in a couple of lines of code. If you have ever tried to mock the Firebase SDK, you know what I’m talking about.
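To show why an interface-based entry point makes testing easy, here is a minimal sketch. The Experiments interface, FakeExperiments, and CheckoutExperiment are hypothetical names for this illustration; the actual Cadabra interface may look different, so check the repository for the real signatures.

```kotlin
// Hypothetical minimal entry point; the real Cadabra interface may differ.
interface Experiments {
    fun <E : Enum<E>> variantOf(experiment: Class<E>): E
}

// Fake for tests: always returns the variant pinned for the experiment class.
class FakeExperiments(private val pinned: Map<Class<*>, Enum<*>>) : Experiments {
    override fun <E : Enum<E>> variantOf(experiment: Class<E>): E =
        experiment.cast(pinned.getValue(experiment))
}

// Hypothetical experiment used by the code under test.
enum class CheckoutExperiment { CONTROL, ONE_CLICK }

// The code under test depends only on the interface, so the fake drops in
// without mocking any SDK.
fun checkoutButtonLabel(experiments: Experiments): String =
    when (experiments.variantOf(CheckoutExperiment::class.java)) {
        CheckoutExperiment.CONTROL -> "Proceed to checkout"
        CheckoutExperiment.ONE_CLICK -> "Buy now"
    }
```

A test can then pin a variant with FakeExperiments(mapOf(CheckoutExperiment::class.java to CheckoutExperiment.ONE_CLICK)) and assert on the resulting label, with no network or remote config involved.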
That’s pretty much it. Check the repository for more examples and the documentation, and please reach out if you feel that you have a use-case that Cadabra doesn’t cover well.