Yik Yak Configuration and Experiment System

Eric Mayers
Yik Yak Engineering
6 min read · Jan 31, 2017

In this article we’ll discuss the Yik Yak client and backend configuration and experiment system. This is part of a series intended to describe the system components we built to implement Yik Yak and share what worked well and what didn’t. By sharing our journey with the engineering community we hope to help other startups benefit from our findings and avoid some of the problems we encountered.

Background

I joined Yik Yak after the Golang transition had begun, when the experiment system I’m about to describe already existed and was in use. That said, like most fast-moving startups we didn’t treat experimentation as an early design requirement, and the first incarnation of Yik Yak didn’t have an easy way to run experiments.

A Client Focused Start

Our product team tended to focus on clients and client behavior, so the first needs for experimentation came in the form of client experiments. For example: Do users post more yaks with the prompt “Share your thoughts” or “What’s going on now?”

Let’s step back and consider the purpose of experimentation. For our system we wanted:

  • To answer “what’s better” product questions. This encompasses simple “experiment” vs. “control” and multi-class (a, b, c, …) tests.
  • To turn on or off a feature in the client for a defined set of users.
  • To identify key performance metrics related to an experiment.
  • To be able to control experiments based on a variety of characteristics. New users with no expectations likely behave differently than established users. Users with frequent activity might behave differently than casual users. And because our app is about local communities, users in a certain place might behave differently than users elsewhere.
  • To be robust in a failure situation — ideally an experiment framework should never cause downtime!
  • Note that at this stage, backend experimentation was not a design goal. More on this later.

The architecture we put in place met these needs. Here’s how it worked for clients:

  • Clients were deployed with code to fetch an experiment payload of configuration data on a schedule. This scheme included a “fallback payload” baked into the client for worst-case scenarios, as well as a robust mechanism to keep using a stale configuration if a new one failed to fetch. When working correctly, clients would request a new configuration payload about every 10 minutes, allowing for quick updates. To avoid startup latency on freshly installed clients, however, we would not block startup on the fetch; the new configuration would be applied on the next startup. (A sketch of this fetch-and-fallback loop follows this list.)
  • Clients consulted the experiment system’s configuration interface to determine behavior. This included boolean controls (enable a feature or not), string configuration (display text, URLs), integer configuration (e.g. how many posts to query), and so on. Clients had to ship with support for each configurable setting, so deploying a new class of experiments took some time, and only the newest client versions would respect a newly introduced experiment.
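To make the fetch-and-fallback behavior concrete, here is a minimal sketch in Go (the real clients were mobile apps; Go is used here only for consistency with the other examples). The Store type and string-keyed payload are assumptions; the baked-in fallback, the keep-stale-on-failure behavior, and the roughly 10-minute refresh interval are the parts taken from the description above.

package config

import (
    "encoding/json"
    "net/http"
    "time"
)

// fallbackPayload is baked into the client build and is used only until a
// fetched configuration is available.
var fallbackPayload = map[string]string{
    "DirectMessagingEnabled": "false",
}

// Store holds the most recently applied configuration payload.
type Store struct {
    current map[string]string
}

// NewStore starts from the baked-in fallback so a freshly installed client
// never blocks startup on a network fetch.
func NewStore() *Store {
    return &Store{current: fallbackPayload}
}

// Refresh fetches a new payload; on any error the stale payload is kept.
func (s *Store) Refresh(url string) {
    resp, err := http.Get(url)
    if err != nil {
        return // network failure: keep the stale configuration
    }
    defer resp.Body.Close()

    var payload map[string]string
    if err := json.NewDecoder(resp.Body).Decode(&payload); err != nil {
        return // bad payload: keep the stale configuration
    }
    s.current = payload
}

// Run re-fetches the configuration roughly every 10 minutes.
func (s *Store) Run(url string) {
    for range time.Tick(10 * time.Minute) {
        s.Refresh(url)
    }
}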

On the backend side, the server was charged with handling a configuration query, determining how to configure that client, and returning the appropriate payload.

Configurability

Underlying this system are two main configuration files. The first defines the experimental controls. Here’s an example of a control to enable the chat feature.

{
  "name": "DirectMessagingEnabled",
  "type": "bool",
  "fallback": "false",
  "description": "Is the Direct Messaging feature enabled?"
},
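As an illustration, a control definition like this maps naturally onto a small struct on the backend. The field names simply mirror the JSON keys above; the Control type and parseControls helper are hypothetical, not the actual Yik Yak code.

package experiments

import "encoding/json"

// Control mirrors one entry in the experimental-controls file shown above.
type Control struct {
    Name        string `json:"name"`
    Type        string `json:"type"`     // "bool", "string", "int", ...
    Fallback    string `json:"fallback"` // value applied when no rule matches
    Description string `json:"description"`
}

// parseControls decodes the controls file into a slice of definitions.
func parseControls(data []byte) ([]Control, error) {
    var controls []Control
    err := json.Unmarshal(data, &controls)
    return controls, err
}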

The second file defines the logic for how to set a parameter given a configuration request. These rules are evaluated linearly, and each rule is defined using a lisp-style expression (via https://github.com/zhemao/glisp). In this fictional example, two rules use a geospatial bounding box around Brazil or a locale of “pt-BR” or “pt-PT” to enable chat and set a special URL, together with a corresponding holdback.

{
  "ID": "EnableBrazilChat-Holdback",
  "SelectionExpression": "(and (or (in-area -14.1000 -54.2933 19.75 current-lat current-lng) (or (= \"pt-BR\" locale) (= \"pt-PT\" locale))) (in-sample-group \"BrazilChat\" 5.0 user-id))",
  "Config": {
    "DirectMessagingEnabled": "false"
  }
},
{
  "ID": "EnableBrazilChat",
  "SelectionExpression": "(or (in-area -14.1000 -54.2933 19.75 current-lat current-lng) (or (= \"pt-BR\" locale) (= \"pt-PT\" locale)))",
  "Config": {
    "DirectMessagingEnabled": "true",
    "PrivacyPolicyUrl": "https://www.yikyak.com/api/legal/privacy/br/pt"
  }
},

These mechanisms make the configuration scheme quite powerful. You could approach this in other ways, but to make it intuitive and avoid errors, we had the system skip any rule block where one of its configuration values had already been assigned. In the example above, we’d skip the second rule if a prior rule had already set DirectMessagingEnabled to false. This avoided “flip-flopping” a parameter, but it meant that the ordering of the rules was important and that adding configuration elements could have side effects.
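Here is a rough sketch of that evaluation order, assuming a Rule type mirroring the JSON above and a matches callback standing in for the glisp expression evaluator. It is illustrative only, not the production implementation: a rule is skipped entirely if any of its parameters was already assigned, and untouched parameters fall back to the defaults declared in the controls file.

package experiments

// Rule mirrors one entry in the rules file: a selection expression plus the
// configuration values it assigns when the expression matches.
type Rule struct {
    ID                  string            `json:"ID"`
    SelectionExpression string            `json:"SelectionExpression"`
    Config              map[string]string `json:"Config"`
}

// evaluate walks the rules in order. A rule is skipped entirely if any of its
// parameters was already assigned by an earlier rule, so ordering matters:
// the holdback rule above must come before the enabling rule.
func evaluate(rules []Rule, fallbacks map[string]string, matches func(expr string) bool) map[string]string {
    assigned := map[string]string{}
    for _, rule := range rules {
        alreadySet := false
        for name := range rule.Config {
            if _, ok := assigned[name]; ok {
                alreadySet = true
                break
            }
        }
        if alreadySet || !matches(rule.SelectionExpression) {
            continue
        }
        for name, value := range rule.Config {
            assigned[name] = value
        }
    }
    // Parameters no rule touched fall back to their declared defaults.
    for name, value := range fallbacks {
        if _, ok := assigned[name]; !ok {
            assigned[name] = value
        }
    }
    return assigned
}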

The backend system gets its data (such as user location, locale, etc.) from the client request and via backend lookups. For example, the user account creation date on the platform (and other info) comes from a lookup of the userid.

As built, our system can target individual users, sets of users (defined by a separate list), geospatial bounding boxes, locales, and other criteria such as user account creation date, client platform (e.g. Android), client version, etc.

Measurement

So we can control all of these parameters, but how do we know what impact an experiment had? In this architecture the control system was generally not directly involved in collecting or evaluating impact data. We used Amplitude to capture metric data and decorate it with experiment control settings. We also configured the clients to report joining or leaving an experiment, which they did by diffing the set of enrolled experiments. As a side effect of this approach, we were blind to changes in the configuration values inside an experiment.
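The join/leave reporting amounts to a set diff on the client. A hypothetical sketch (in Go for consistency with the other examples; the function name and types are assumptions):

package experiments

// diffEnrollment compares the previous and current sets of enrolled
// experiment IDs and returns the experiments joined and left, which the
// client would then report as analytics events.
func diffEnrollment(previous, current map[string]bool) (joined, left []string) {
    for id := range current {
        if !previous[id] {
            joined = append(joined, id)
        }
    }
    for id := range previous {
        if !current[id] {
            left = append(left, id)
        }
    }
    return joined, left
}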

Supporting Server Experiments

The system described so far worked pretty well for anything involving client behavior. However, as we migrated to our Golang-based backends, we encountered a need to run experiments that changed backend behavior (for example, experiments with feed tuning changes). We considered three approaches:

  1. Server side. On each query (for a feed, to create a post, etc.) hit the experiment system and fetch a server configuration set. As with the client strategy, we could cache the results for some period of time, but this would result in a cumbersome system. Another challenge was that this approach would bypass the client entirely, cutting out our existing scheme for correlating metrics with experiments.
  2. Client side, individually. Establish a set of server-configurable values passed through the client. The downside of this approach is that the client doesn’t actually need to know what these values mean, yet passing them through individually means pushing new client versions for something the client never has to comprehend.
  3. Client side, combined. Establish an opaque pass-through token, carried by the client, containing a set of configuration data for the server. The client wouldn’t be able to validate these values, however.

In evaluating these options, we felt that option 1 (along with solving the correlation issue) would be the best approach in the long term, but it would require spinning up a new cache service or making the experiment system more robust and performant, since it would sit in the critical path. We therefore chose option 3, as it was expedient and provided maximum flexibility. With this method it would also be easy to transition to a server-generated configuration approach later if that became necessary.

We used a simple opaque token scheme in which the client was configured to pass the experiment-configured parameter through as a URL parameter. This was nice because it meant we could change the nature of the payload if needed.

Here’s what a token would look like (this gets URL-encoded):

values:FeedRecencyWeight=0.2,FeedActivityWeight=0.8

We also built a second mechanism for referencing a named set of values. In that case the token passed would look like the following, and the backend decodes it into the underlying values.

sets:BalancedFeedRank,DynamicRadius

With this approach, if something changed in the backend systems (say, a default value), we could make the appropriate modification to the set’s parameters without touching clients.
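As an illustration of the backend side, here is a hypothetical decoder handling both the values: and sets: token forms shown above. Only the token formats come from this article; the namedSets contents and the function names are invented for the example.

package experiments

import (
    "net/url"
    "strings"
)

// namedSets maps a set name to the concrete parameters it expands to.
// These contents are illustrative assumptions, not real Yik Yak values.
var namedSets = map[string]map[string]string{
    "BalancedFeedRank": {"FeedRecencyWeight": "0.5", "FeedActivityWeight": "0.5"},
    "DynamicRadius":    {"FeedRadiusMode": "dynamic"},
}

// decodeToken parses the opaque, URL-encoded token passed through the client,
// supporting both the "values:" and "sets:" forms.
func decodeToken(token string) (map[string]string, error) {
    decoded, err := url.QueryUnescape(token)
    if err != nil {
        return nil, err
    }
    params := map[string]string{}
    switch {
    case strings.HasPrefix(decoded, "values:"):
        // e.g. values:FeedRecencyWeight=0.2,FeedActivityWeight=0.8
        for _, pair := range strings.Split(strings.TrimPrefix(decoded, "values:"), ",") {
            if k, v, ok := strings.Cut(pair, "="); ok {
                params[k] = v
            }
        }
    case strings.HasPrefix(decoded, "sets:"):
        // e.g. sets:BalancedFeedRank,DynamicRadius
        for _, name := range strings.Split(strings.TrimPrefix(decoded, "sets:"), ",") {
            for k, v := range namedSets[name] {
                params[k] = v
            }
        }
    }
    return params, nil
}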

Conclusion

Overall, while it is a bit awkward to pass server-side experiment parameters through the client, it has worked reasonably well. There are a few shortcomings to be aware of in this approach:

  • This model doesn’t work for all types of clients. For example, any client that doesn’t cache configuration, any operation not triggered by a client request, or any bulk operation that needs configuration for many users (such as a batch process) would require additional mechanics. This could be supported through a backend request for the same data (requiring robust availability and high performance), but we never used this pattern.
  • The separation of the “experiment name” (logged and recorded to Amplitude) from the configuration data provides flexibility, but also room for inadvertent error (accidentally changing the nature of the experiment). A nice addition would be to compute a short hash of the configuration alongside the experiment name to avoid this: e.g. DirectMessagingEnabled might be logged as DirectMessagingEnabled-a76fh2, such that the hash changes if the underlying configuration or targeting changes (sketched below).
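One way to compute such a suffix, as a hypothetical sketch: hash the selection expression together with the configuration values and append the first few characters of the digest to the experiment name. The hashing scheme and 6-character length are assumptions.

package experiments

import (
    "crypto/sha1"
    "encoding/hex"
    "fmt"
    "sort"
    "strings"
)

// taggedName appends a short hash of an experiment's configuration and
// targeting expression to its name, so the name logged to analytics changes
// whenever either one changes.
func taggedName(name, selectionExpr string, config map[string]string) string {
    // Serialize the configuration deterministically before hashing.
    keys := make([]string, 0, len(config))
    for k := range config {
        keys = append(keys, k)
    }
    sort.Strings(keys)

    var b strings.Builder
    b.WriteString(selectionExpr)
    for _, k := range keys {
        fmt.Fprintf(&b, "|%s=%s", k, config[k])
    }
    sum := sha1.Sum([]byte(b.String()))
    return name + "-" + hex.EncodeToString(sum[:])[:6]
}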
