On Developing Expo

Building Our HiPPO Hunting Tool

Photo Credit: 3dman_eu

Did you know that the hippo is one of the most dangerous animals in the world? In industry, and particularly around product decisions, the HiPPO, or in this case the Highest Paid Person’s Opinion, can be dangerous as well. One of the goals of building our A/B testing platform, Expo, has been to hunt HiPPOs and enable data-driven product decisions. In The Journey of A/B Testing at Walmart Labs, we discussed the A/B testing process at Walmart and our platform. In this series, we will dive deeper into what we built and why, how it was built, and how it is used. Whether you have an in-house platform or rely on a third party to fulfill your A/B testing needs, we hope that you’ll appreciate a look into how our platform works and that it inspires you to start or augment your own A/B testing program where you are.

The Motivation

Understanding the value of experimentation, we set out to make it simpler to run tests and analyze results. We originally formed a small team to augment the third-party tool we already had. As we explored how it worked, we recognized some shortcomings, evaluated other third-party tools, and decided to invest in building our own A/B testing platform, which we call Expo. A major advantage of building our own platform has been tight integration with other internal tooling, as well as with the platform on which the applications we’re experimenting with are built. Many of these tools and platforms were being developed at the same time as Expo, so we were able to take advantage of them and to guide some of the integrations to work better with our nascent platform. We integrated Expo closely with two of these tools: (1) our configuration management system, which provides feature flags; and (2) our content management system, which is used to define content for most of our site and apps.

By building our own platform, we’ve also been able to create customizations for our specific use cases here at Walmart Labs. This has allowed us to focus on features that are directly related to the needs of our testing program, including assignment based on anonymous or logged-in users, experiment versioning, and more customized reporting. It also enabled us to integrate with the tools we use for workflow, such as Jira and Slack, and to provide APIs that others can integrate with to obtain usage and performance metrics for the testing program.

Growing Pains

When it was time to deploy our new Expo tool in production, we had the advantage of being able to run it in parallel with the old tool, which gave us a way to verify against live traffic that it was working as expected. This reduced the risk of simply switching over. At the same time that we were switching all traffic over to Expo, the experimentation program at Walmart Labs was also at an inflection point. The engineering and business teams together were making a stronger commitment to A/B testing. A side effect of this was that our new platform was getting extra scrutiny, which was good, because it meant Expo would be tested well.

We did encounter a bit of healthy skepticism when first ramping up. We needed to build trust in the data that we were generating, both among ourselves and, more importantly, among our business users. We built easy-to-read, real-time graphical monitors to help show our sometimes skeptical business users that the A/B test data was correct and consistent. We did this in partnership with our product analytics team and ensured that the data Expo reported used the same sources and algorithms that they were using to report to our business users. In turn, this helped gain their trust in our new platform.

When we first started operating, we also discovered some drawbacks to our original assignment algorithm. While it was random, it was not deterministic: a visitor’s assignment could not be recomputed from the visitor’s identity, so visitors were not easily guaranteed a consistent assignment between visits. More notably, because we depended on more or less permanent cookies to maintain assignment state, the engine could not “ramp down” correctly. Users assigned to a variation kept that variation for the lifetime of the experiment, even if experiment traffic was reconfigured down. Over time, this skewed the actual traffic assigned to an experiment compared to what was configured.

We were able to reimplement the assignment algorithm and do it without impacting downstream integrations and dependencies. We will discuss the assignment algorithm in more detail in a future article.
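
To make the difference between random and deterministic assignment concrete, here is a minimal sketch of one common approach: hashing the visitor and experiment identifiers. This is not Expo’s actual algorithm, which is left for the future article. The function names, the bucket count, and the key format are all hypothetical and chosen only for illustration.

```python
import hashlib

BUCKETS = 10_000  # hypothetical resolution: each bucket is 0.01% of traffic


def _stable_hash(key: str) -> int:
    """Return the same integer for the same key on every machine and visit."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")


def assign(visitor_id, experiment_id, traffic, variations):
    """Return a variation name, or None if the visitor is not in the experiment.

    traffic is the fraction of visitors (0.0 to 1.0) exposed to the experiment.
    """
    # One hash decides whether the visitor is in the experiment at all.
    bucket = _stable_hash(f"{experiment_id}:traffic:{visitor_id}") % BUCKETS
    if bucket >= round(traffic * BUCKETS):
        return None
    # A second, independent hash picks the variation, so changing the
    # traffic fraction never moves a visitor from one variation to another.
    index = _stable_hash(f"{experiment_id}:variation:{visitor_id}") % len(variations)
    return variations[index]


# Usage: the result depends only on the inputs, not on a stored cookie.
first_visit = assign("visitor-42", "exp-1", 0.5, ["control", "treatment"])
second_visit = assign("visitor-42", "exp-1", 0.5, ["control", "treatment"])
```

Because nothing is stored, lowering `traffic` in this sketch immediately drops the visitors whose bucket falls outside the new range, while everyone who remains keeps the same variation. That is the “ramp down” behavior a cookie-based assignment cannot provide.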

Ramping It Up

Building an A/B testing platform from scratch seemed like such a straightforward undertaking to our naive minds. In some ways it was, but we discovered quite a bit along the way, and are still learning more. Of course, in the end, there’s no point in building an A/B testing platform if nobody wants to run A/B tests. The increased focus on the testing program helped to accelerate the development and adoption of Expo. As our main consumer, the product group has helped to drive the Expo platform development roadmap and determine what features we should build into the platform to best support their needs. Over a span of a couple of years, we’ve gone from 70 experiments a year to almost 400, and the number is trending higher. Expo now supports A/B testing on both the browser website and native apps, as well as transactional e-mail, and we continue to onboard more Walmart businesses and channels. In the process, we’re helping to promote a culture of experimentation that helps our business users make more informed product decisions, and we’re providing the data and insights into customer behavior that help guide future product research.

We will discuss these features in more detail in subsequent articles in this series, as well as specific experimentation use cases and outcomes that provided interesting learnings. We hope that the ensuing articles will provide more insight into our platform and program and help you on your own A/B testing journey. Let’s go HiPPO hunting.