The Journey of A/B Testing at WalmartLabs

Image Credit: holdentrils

If someone told you that a change in your e-commerce website would result in a lift of potentially millions of dollars a year in revenue, would you care? If that change resulted in a 0.5% drop in revenue, would you want to know? Would you want to know before that impact was made across all the visitors to your site? At Walmart, through A/B testing, we are able to answer these questions with low risk by testing new features and changes against a subset of the full site traffic.

The journey we have taken with the A/B testing program at Walmart the past few years has had its twists and turns. It ultimately involved a cultural shift to drive testing, the development of Expo, our own internal A/B testing platform, and developing a strong process for experimentation.

A Cultural Shift From the Top Down

While A/B testing has always been done at Walmart, it wasn’t really part of the core of product development. Making experiments part of the development lifecycle was a major key to the success of the A/B testing program, and that cultural shift at WalmartLabs would not have taken hold without executive support.

As part of this shift, the A/B testing process and the product development lifecycle have been integrated. Test plans are created and reviewed as part of the process. Product managers and analysts work together to define the traffic required, the metrics to measure, and the criteria for success. Developers work with QA to ensure the treatments are implemented and working as expected for all variations of an experiment. To keep the process organized, a test manager coordinates these activities and configures and schedules each test.

In addition, product development now ensures that all new features are launched via the A/B testing process. Developers make sure their code can be run as part of an experiment and the application platform is integrated with the A/B testing platform. There is a team dedicated to managing experiments and ensuring they are defined, executed, and analyzed successfully. In essence, A/B testing is a first-class citizen.

A Platform to Call Our Own

Another key to the success of our A/B testing program is our in-house platform, Expo. Why did we end up building an in-house platform? It is actually not uncommon: numerous companies have their own in-house experimentation platforms, such as Netflix, Google, and LinkedIn, and some, such as Facebook and Intuit, have even open sourced theirs. WalmartLabs had been using a third-party system, but it didn’t integrate well with the application platform, and as the new testing process was developed, it became less of a fit. By building Expo, we were able to customize it for our needs, develop tight integration with platform components to facilitate test development and setup, and provide more timely and customized support.

Expo Experiment Management UI

Expo is a server-side A/B testing platform deeply integrated with our application framework. It supports both web-app tests and native mobile apps. The platform includes a UI that lets test managers set up and monitor tests as well as view results.
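Server-side assignment like this typically works by deterministically hashing a visitor ID together with an experiment ID into a bucket, so a returning visitor always lands in the same variation. The sketch below illustrates that general technique; the function, variation names, and 50/50 split are invented for illustration, not Expo’s actual implementation.

```python
import hashlib

def assign_variation(visitor_id: str, experiment_id: str,
                     variations=("control", "treatment"),
                     weights=(50, 50)) -> str:
    """Deterministically map a visitor to a variation.

    Hashing visitor_id together with experiment_id gives each visitor
    a stable bucket per experiment, so repeat visits see the same
    treatment without any server-side state.
    """
    digest = hashlib.sha256(f"{experiment_id}:{visitor_id}".encode()).hexdigest()
    bucket = int(digest, 16) % 100  # a number in 0..99
    threshold = 0
    for variation, weight in zip(variations, weights):
        threshold += weight
        if bucket < threshold:
            return variation
    return variations[-1]

# The same visitor always gets the same variation for a given experiment:
assert assign_variation("v123", "exp-42") == assign_variation("v123", "exp-42")
```

Keying the hash on both IDs means the same visitor can fall into different buckets across different experiments, which keeps concurrent tests statistically independent.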

Expo’s integration with our application framework allows application developers to implement features without giving special consideration to Expo. For example, treatments can be implemented as a feature switch through our configuration management system, since the override mechanism that supplies alternate values based on experiment variation lives within the configuration management system itself. Another treatment type is exposed through our content management system, which requires no developer interaction at all, allowing site curators to configure different content to associate with experiments. These integrations provide several advantages:

  1. Development is not complicated with custom A/B test platform code for every test.
  2. There is potentially no code cleanup after a test.
  3. The winning treatment can be activated independent of a code change or deployment.
  4. Activating the winning treatment for all traffic can be done without requiring the experiment to run continuously.
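A variation-aware configuration layer like the one described can be pictured as a lookup with per-variation overrides. This is a hypothetical sketch of that idea; the config key, experiment ID, and value names are all invented for illustration.

```python
# Hypothetical config store: each key has a default value plus
# optional per-variation overrides tied to an experiment.
CONFIG = {
    "search.ranker": {
        "default": "legacy",
        "experiment": "exp-ranker-v2",
        "overrides": {"treatment": "ml-ranker"},
    },
}

def get_config(key: str, active_variations: dict) -> str:
    """Resolve a config value, applying an experiment override when the
    session's assigned variation matches one defined for this key."""
    entry = CONFIG[key]
    variation = active_variations.get(entry.get("experiment"))
    return entry.get("overrides", {}).get(variation, entry["default"])

# A session assigned to the treatment sees the new value; everyone
# else, including sessions in no experiment, sees the default.
print(get_config("search.ranker", {"exp-ranker-v2": "treatment"}))  # ml-ranker
print(get_config("search.ranker", {}))                              # legacy
```

Because feature code only ever reads the resolved value, launching the winner is just promoting the override to the default, with no code change or cleanup, which is exactly the advantage the list above describes.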

While some level of investment was required to build an in-house platform, it has proven to be an advantage for us. It has allowed us to build out features that fit our A/B testing process and helped embed experimentation in the development lifecycle.

It’s All About the Data

When someone runs an A/B test, they expect results, and whether those results are positive or negative, they need to be correct. One challenge we faced in building our own platform was building trust in the results. With the support of a strong product analytics team and a revamped data pipeline, we were able to achieve that.

Experiment Results in Expo UI

Of note, Expo doesn’t produce results reporting on its own. Our platform relies on results data processed through the system owned by the product analytics team. Expo provides source data to that pipeline: a mapping of sessions and visitors to the experiment variations they were assigned to and qualified for. It also displays the results through its UI to give users a unified experience from test setup to results analysis. This separation of concerns also relieved the Expo development team of running and maintaining data pipelines outside the scope of A/B testing.
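The source data described above can be pictured as one assignment record per session per experiment, handed to the analytics pipeline. The record shape below is a hypothetical sketch; the field names are invented, not Expo’s actual schema.

```python
import json
import time

def assignment_event(session_id: str, visitor_id: str,
                     experiment_id: str, variation: str,
                     qualified: bool = True) -> dict:
    """Build one assignment record for the analytics pipeline: which
    variation a session/visitor was assigned to, and whether the
    session qualified (i.e. actually saw the treated experience)."""
    return {
        "session_id": session_id,
        "visitor_id": visitor_id,
        "experiment_id": experiment_id,
        "variation": variation,
        "qualified": qualified,
        "ts": int(time.time() * 1000),  # event time in epoch millis
    }

event = assignment_event("s-001", "v123", "exp-42", "treatment")
print(json.dumps(event))
```

Separating "assigned" from "qualified" matters downstream: a visitor may be bucketed into a treatment but never reach the page under test, and counting them would dilute the measured effect.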

Lessons Learned

The journey we have taken so far has not always been a smooth road, but we have been able to learn quite a bit from the bumps along the way.

The first lesson was that executive support is key, and that we needed to make A/B testing a central part of the product development lifecycle. While some changes can be made bottom-up, instilling a culture of experimentation required a top-down approach.

The second lesson was to have a well-documented process and stick to it. We very rarely allow exceptions to the process, which forces everyone to do things in a consistent, well-defined manner. A major aspect of this process is communication of the test plan, expectations, and results.

The third lesson was to double down on reporting, because without results there is no purpose for testing. As part of this, we spent extra effort validating the results being produced to ensure the credibility of the platform and its data. For the process to work and the program to be successful, the tests and their results need to be reliable and trustworthy; otherwise the program is undermined.
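Validating reported lifts usually comes down to standard significance checks. As one example of the kind of sanity check an analytics team might run, here is a minimal two-proportion z-test for a difference in conversion rates; the numbers are made up, and this is illustrative, not the team’s actual methodology.

```python
from math import erf, sqrt

def two_proportion_z(conv_a: int, n_a: int, conv_b: int, n_b: int):
    """Two-sided z-test for a difference in conversion rates between
    variation A (control) and variation B (treatment).
    Returns (z statistic, p-value)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pool = (conv_a + conv_b) / (n_a + n_b)   # pooled rate under H0
    se = sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    # Two-sided p-value via the standard normal CDF, Phi(x) = (1 + erf(x/sqrt(2))) / 2.
    p_value = 2 * (1 - 0.5 * (1 + erf(abs(z) / sqrt(2))))
    return z, p_value

# 5.0% vs 5.6% conversion on 20k sessions each (hypothetical numbers):
z, p = two_proportion_z(conv_a=1000, n_a=20000, conv_b=1120, n_b=20000)
print(f"z={z:.2f}, p={p:.4f}")
```

Re-deriving a headline result with an independent check like this is one cheap way to catch pipeline bugs before they erode trust in the platform.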

The final lesson is to develop the platform against clear requirements so that we focus on features specific to our needs, which is one reason we built an in-house platform to begin with. A problem or requirement may also be resolved through process rather than a change to the platform, and recognizing that helps streamline what we spend time developing.

The A/B testing program at WalmartLabs continues to evolve as we refine the process and platform and champion A/B testing across more of the organization.





Anthony Tang

Leading the experimentation platform team at WalmartLabs building Expo, the A|B testing platform for Walmart.
