Phoenix: Tinder’s Testing Platform, Part I
Written by: Kenneth Yeh | Jinqiang Han | Juzheng Li
Data influences all of the decisions we make at Tinder. We are often faced with decisions focused on which updates to make within the Tinder experience and how we present features to our members. Experimentation plays a vital role in making these decisions.
At Tinder, we’ve built an in-house tool, Phoenix, to manage and optimize all of our experiments. We’re excited to share how we’ve done so, and this is the first part of a three-part series highlighting the Phoenix experimentation platform. In this post, we will cover the key concepts and design considerations. In the second, we will cover Levers, our treasured mechanism for decoupling experiments from the app release cycle. And in the last, we will discuss our Metrics System, which helps us derive data insights and conclude experiments.
At a very high level, our end-to-end experimentation platform, Phoenix, is made up of four components:
- Ground Control serves as the foundation of the platform. It manages the life cycle of an experiment, from creation and updates all the way to its eventual rollout and termination.
- Assignment is the module that determines which treatment each user will receive, based on the experiment definition.
- The Levers System is a client integration SDK that handles, processes and renders the treatment applied to users via assignment on the client side. We will do a deep dive focused on this in our next blog post.
- The Metrics System is the heart of the platform. Throughout an experiment’s life, we collect data points on both the client and server sides. All of those events are ingested by the Metrics System and then processed and analyzed daily by a data-crunching pipeline. A more detailed post on the Metrics System will complete this series.
Now, let’s take a closer look at setting up an experiment.
The Many Knobs
There are many different types of experiments. When the time comes and you need to set up an experiment, you will be faced with many knobs to turn.
Which Side of Auth?
You need to decide which side of the auth wall to run your experiment on. Pre-auth experiments run before a user has logged in. If you are testing your registration flow, for instance, that is the type of experiment you would use. In these cases, the experiment unit, or key field, would be a device ID.
Once there is a registered user, you can deploy post-auth experiments. The experiment unit or key field would be a user ID in this category. An inherent advantage of using a user ID is that, even if the same user switches devices, they will still be served the same experience.
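As a minimal sketch of this choice, assuming a hypothetical `experiment_key` helper (the post does not describe Phoenix’s actual interface), the experiment unit can be picked based on which side of the auth wall the experiment runs:

```python
from typing import Optional


def experiment_key(pre_auth: bool, device_id: str, user_id: Optional[str]) -> str:
    """Return the identifier used as the experiment unit (hypothetical helper).

    Pre-auth experiments key on the device ID because no user exists yet.
    Post-auth experiments key on the user ID, so the same user gets the same
    experience even after switching devices.
    """
    if pre_auth:
        return device_id
    if user_id is None:
        raise ValueError("post-auth experiments require a logged-in user")
    return user_id
```

For example, `experiment_key(True, "device-123", None)` returns `"device-123"`, while a post-auth experiment for a logged-in user returns that user’s ID regardless of the device.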
Which Side of the Network?
Another decision you need to make is whether to run a client-side or a server-side experiment. As the names imply, they differ in where the treatment is applied: in the app on the user’s device, or on our servers. If you are optimizing a user flow in the app, it would naturally be a client-side experiment; if you are tuning the recommendation algorithm, it would be a server-side experiment.
It is also worth noting that, traditionally, every client-side experiment has to be paired with an app release cycle, which may take days or sometimes weeks. By comparison, most engineering teams can ship a server-side release and launch a new server-side experiment much sooner. Thanks to the Phoenix Levers System, we are able to close that gap in timing. We will share more about that in our next post.
When setting up an experiment, it is also important to decide whether the assignment should be sticky, that is, whether it should persist when conditions change. This matters when a user’s attributes change, for example when they switch devices or travel to a different region. A sticky experiment keeps the user in the same treatment even if factors like these change.
Stickiness comes in handy when a consistent user experience is important. If an experience is geo-fenced, though, a user’s qualification should be strictly determined by real-time attributes, so you should leave the sticky box unchecked.
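The following is a rough sketch of how stickiness might change the lookup. It assumes a hypothetical in-memory `store` of previous assignments and caller-supplied `qualifies` and `assign` functions; Phoenix’s real storage and API are not described in this post.

```python
from typing import Callable, Dict, Optional, Tuple

# Hypothetical in-memory record of past assignments, keyed by (experiment ID, unit ID).
AssignmentStore = Dict[Tuple[str, str], str]


def get_treatment(
    experiment_id: str,
    unit_id: str,
    attributes: dict,
    qualifies: Callable[[dict], bool],
    assign: Callable[[str], str],
    sticky: bool,
    store: AssignmentStore,
) -> Optional[str]:
    """Return the unit's treatment, or None if it does not qualify."""
    key = (experiment_id, unit_id)
    if sticky and key in store:
        # Sticky: reuse the earlier treatment even if attributes have changed.
        return store[key]
    if not qualifies(attributes):
        # Otherwise, qualification is decided by the current (real-time) attributes.
        return None
    treatment = assign(unit_id)
    if sticky:
        # Remember the assignment so later lookups skip the qualification check.
        store[key] = treatment
    return treatment
```

With a geo-fenced rule such as `region == "US"`, a sticky experiment keeps a user’s treatment after they travel abroad, while a non-sticky one returns `None` as soon as the real-time region stops qualifying. That second behavior is the one a geo-fenced experience needs.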
Note that all of the settings above are orthogonal: you can have a non-sticky client-side pre-auth experiment, and you can have a sticky server-side post-auth experiment.
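To make the orthogonality concrete, here is a small sketch that models the three settings as independent fields of a hypothetical `ExperimentSettings` type (the names are illustrative, not Phoenix’s schema). Because no field constrains another, every combination is valid:

```python
from dataclasses import dataclass
from enum import Enum
from itertools import product


class AuthSide(Enum):
    PRE_AUTH = "pre_auth"    # experiment unit: device ID
    POST_AUTH = "post_auth"  # experiment unit: user ID


class NetworkSide(Enum):
    CLIENT = "client"
    SERVER = "server"


@dataclass(frozen=True)
class ExperimentSettings:
    auth_side: AuthSide
    network_side: NetworkSide
    sticky: bool


# The three settings are independent, so all 2 x 2 x 2 combinations are allowed.
ALL_COMBINATIONS = [
    ExperimentSettings(auth, network, sticky)
    for auth, network, sticky in product(AuthSide, NetworkSide, (False, True))
]
```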
We Are Family
Setting up a single experiment is already complicated, and it gets even more complicated when there are multiple experiments, especially when two or more of them may have competing effects. This can render experiment analysis ineffective. For example, if the same user is exposed to two competing experiments, how would you attribute a gain or drop in a key metric?
To guarantee that no user is exposed to two competing experiments, Phoenix offers a concept called an experiment family, which groups competing experiments together. Within a family, when two competing experiments target the same user, we apply a tie-breaker so that the user is assigned to only one of them. At Tinder, we often use priority as the tie-breaker.
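A minimal sketch of a family-level tie-breaker, assuming hypothetical `Experiment` and `ExperimentFamily` types and assuming that a higher `priority` value wins (the post says only that priority is often used as the tie-breaker):

```python
from dataclasses import dataclass, field
from typing import Callable, List, Optional


@dataclass
class Experiment:
    name: str
    priority: int                   # assumption: higher value wins ties
    targets: Callable[[dict], bool]  # targeting rule over a user record


@dataclass
class ExperimentFamily:
    name: str
    experiments: List[Experiment] = field(default_factory=list)

    def pick(self, user: dict) -> Optional[Experiment]:
        """Return at most one experiment from this family for the user."""
        eligible = [e for e in self.experiments if e.targets(user)]
        if not eligible:
            return None
        # Tie-breaker: the highest-priority eligible experiment wins, so the
        # user is never exposed to two competing experiments in this family.
        return max(eligible, key=lambda e: e.priority)
```

Because `pick` returns at most one experiment per family, a user who matches the targeting of two competing experiments only ever sees the higher-priority one.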
Hashing Your Assignments
When all the dust settles, you may be left wondering how the treatment each user receives is ultimately decided. The key mechanism here is hashing. We hash the user identifier, whether that is the device ID or the user ID, take the hash modulo a fixed number, and use the resulting number to decide the specific treatment allocation.
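A minimal sketch of hash-and-mod allocation follows. It assumes SHA-256 and 100 buckets purely for illustration; the post does not say which hash function or bucket count Phoenix uses. The bucket is then mapped onto cumulative treatment shares:

```python
import hashlib
from typing import List, Tuple

NUM_BUCKETS = 100  # hypothetical number of buckets


def bucket(identifier: str) -> int:
    """Hash the identifier and take the result modulo NUM_BUCKETS."""
    digest = hashlib.sha256(identifier.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS


def allocate(identifier: str, allocation: List[Tuple[str, int]]) -> str:
    """Map the identifier's bucket onto cumulative treatment ranges.

    `allocation` holds (treatment, share) pairs whose shares sum to NUM_BUCKETS,
    e.g. [("control", 50), ("enabled", 50)] puts buckets 0-49 in control.
    """
    b = bucket(identifier)
    upper = 0
    for treatment, share in allocation:
        upper += share
        if b < upper:
            return treatment
    raise ValueError("allocation shares must sum to NUM_BUCKETS")
```

Because the hash is deterministic, the same identifier always lands in the same bucket, and therefore in the same treatment, for a given allocation.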
Using only the user identifier means the same user always gets the same hashed and modded number, and so may land in the control treatment across many different experiments (most experiments have a control, and it is often the first of the possible treatments). To solve this issue, we add the experiment ID as a salt, derived from a combination of the experiment family and the experiment itself.
Another potential problem with this combined salt-and-identifier approach is that neither input ever changes, so if we want to reshuffle the treatment allocation for any reason (for example, a bug, a restart, or a second version of the experiment), we cannot. To solve this issue, we added a shuffle ID to the hash input; changing it produces a fresh allocation.
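Extending the same idea, the sketch below mixes a per-experiment salt and a shuffle ID into the hash input. The `family/experiment` salt format, the key layout, and the use of SHA-256 are hypothetical; only the ingredients (identifier, family plus experiment, shuffle ID) come from the description above.

```python
import hashlib


def salted_bucket(family: str, experiment: str, shuffle_id: int,
                  identifier: str, num_buckets: int = 100) -> int:
    """Hash the identifier together with a per-experiment salt and a shuffle ID.

    Hypothetical key layout: the salt combines the experiment family and the
    experiment, so each experiment hashes the same user differently. Bumping
    `shuffle_id` changes every hash input, which reshuffles the allocation.
    """
    salt = f"{family}/{experiment}"
    key = f"{salt}:{shuffle_id}:{identifier}"
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_buckets
```

With the salt, a user in buckets 0 to 49 (say, control) of one experiment has only about a 50% chance of also landing there in another experiment, rather than always doing so. Incrementing `shuffle_id` from 0 to 1 moves nearly every user to a new bucket while keeping the allocation deterministic.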
Experiment or Feature Gating
Lastly, people quite often use an experiment to gate a feature. There are some fundamental differences between the two: feature gating is primarily a way to CONTROL, and experimentation is primarily a way to COMPARE.
Often, we release a feature only to certain users, such as those in a list of selected test countries. We call this feature gating. If you are in a qualified country, you get that experience; if not, you do not see it. When releasing a feature behind a gate, the focus is on the targeting rules that determine whether a user qualifies to see the feature. Based on these rules, you CONTROL the experience.
Experimentation, however, operates on a different level with a different focus. The main point of running an experiment is to COMPARE things: how A differs from B, whether enabled is better or worse than control, how new differs from old. With experimentation, you can add targeting rules just as you would for feature gating, but users who qualify under these rules are further split into different treatments for the sake of comparison.
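The difference can be sketched in a few lines, assuming hypothetical helpers and targeting rules expressed as predicates over a user record with an `id` field (again, not Phoenix’s actual API). A gate only answers yes or no, while an experiment applies the same kind of rules and then splits qualifying users:

```python
import hashlib
from typing import Callable, List, Optional

Rule = Callable[[dict], bool]  # hypothetical targeting rule over a user record


def passes(rules: List[Rule], user: dict) -> bool:
    """True if the user satisfies every targeting rule."""
    return all(rule(user) for rule in rules)


def feature_gate(rules: List[Rule], user: dict) -> bool:
    """CONTROL: qualified users get the feature, everyone else does not."""
    return passes(rules, user)


def experiment_treatment(rules: List[Rule], user: dict, experiment_id: str,
                         treatments: List[str]) -> Optional[str]:
    """COMPARE: qualified users are further split across treatments."""
    if not passes(rules, user):
        return None
    # Deterministic hash-and-mod split of qualified users (illustrative only).
    digest = hashlib.sha256(f"{experiment_id}:{user['id']}".encode("utf-8")).digest()
    return treatments[int.from_bytes(digest[:8], "big") % len(treatments)]
```

Both functions share the targeting step; only the experiment adds the split that makes a comparison possible.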
Till Next Time
Tinder is about creating a new spark. Experimentation is about iteration. Phoenix is the magical creature born and reborn into a better self through fire. We hope that through rapid experimentation, Phoenix can help Tinder users achieve stronger matches, engage more effectively, and create meaningful relationships, at a truly global scale.