The Fine Art of Visualizing Experiment Results

Allen Chang
Published in disney-streaming · Apr 25, 2022

Introduction

At Disney Streaming, we have been rapidly expanding the use of experimentation to aid decision making across the organization. In a previous post, we outlined how our experimentation platform helps users effectively set up experiments using a power analysis tool. In this post, we’ll look at the other end of the experimentation loop by examining how we help teams understand the results of their tests through data visualization. We’ll cover specific visualizations we use to represent statistical uncertainty and the technology that powers them.

We aim to address three distinct use cases for visualization:

  1. creating visualizations in ad-hoc analyses
  2. using dashboards to get feedback from users
  3. implementing visualizations at scale within our experimentation platform.

To build the tools that power our visualizations, we prioritized software that gave us a high degree of customizability. Our final solution is a combination of a standalone Python library for ad-hoc data visualization, a Dash app for rapid prototyping, and automated in-tool visualization.

Why create a library for ad-hoc visualization?

The process of creating ad-hoc visualizations has a few key challenges.

  1. It can lack consistency. If you give any two people the same data, they may choose completely different ways to visualize the results. It was important for us to maintain consistency so that any time a stakeholder views experiment results, they see the same chart types, colors, and layout.
  2. It can be tedious to repeat across analyses. Even copying and pasting code can be cumbersome. We wanted experimenters to spend less time worrying about tweaking code and more time showcasing their results.

To help remedy these challenges, we developed our own internal plotting library in Python, explot. We used explot to standardize and automate how experiment results are visualized when analyzed outside of our experimentation platform. Explot serves as a companion to our experimentation analysis library, expan (learn more about expan in our previous post), and was built to take results from expan and create visualizations with a single command.
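To make that workflow concrete, here is a rough sketch of what a single-command call might look like. The function names (run_delta_analysis, plot_relative_change) and the experiment_data variable are hypothetical stand-ins for illustration, not the actual expan or explot APIs.

```python
# Hypothetical illustration of the single-command workflow described above.
# expan and explot are internal libraries; the function names below are stand-ins.
import expan
import explot

# experiment_data is assumed to be already loaded (e.g., from the data warehouse).
# Run the statistical analysis on the raw experiment data with expan.
results = expan.run_delta_analysis(experiment_data)  # hypothetical call

# Hand the results object to explot and get a standardized, styled figure back.
fig = explot.plot_relative_change(results)  # hypothetical call
fig.show()
```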

Under the hood, explot is powered by the Plotly visualization library. We chose Plotly because of its straightforward API and its diverse set of customization options. Within explot, we created a standardized set of plots and tables that experimenters can use to generate results visualizations (see figures below for an example). In addition, we worked closely with the UX design lead for our experimentation platform to ensure that all the visualization details for our plots (color, font, marker size, etc.) were consistent across explot and our experimentation platform. These options are stored within Plotly templates that are automatically loaded in the library. Experimenters can easily install explot into their Python compute environments (at Disney Streaming we use Databricks notebook environments) using a pip command and an Artifactory address. Using explot, we’ve streamlined the process of experiment visualization and have helped ensure visual consistency across experimentation product areas.
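As an illustration of the template approach, the snippet below shows how shared styling can be registered with Plotly so that every figure picks it up automatically. The template name, fonts, and colors here are placeholders, not the actual values explot ships with.

```python
# Sketch: bake shared styling into a Plotly template so every chart picks up
# the same colors and fonts automatically. All style values are illustrative.
import plotly.graph_objects as go
import plotly.io as pio

pio.templates["experimentation"] = go.layout.Template(
    layout=dict(
        font=dict(family="Avenir, sans-serif", size=14),
        colorway=["#8a8a8a", "#2e7d32", "#c62828"],  # neutral, positive, negative
    )
)

# Make the template the default so callers never have to pass it explicitly.
pio.templates.default = "experimentation"
```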

Figure 1: Ad-hoc metric results table. All data is simulated.
Figure 2: Ad-hoc metric relative change plot. All data is simulated.

The above figures showcase a few of the standardized plotting objects we’ve created within explot. Figure 1 displays the numerical value of results for the experiment in a table. Each row of the table displays a metric name. The table columns are arranged hierarchically, with categories on the first level and variants on the second level. The metric denominator reflects the number of participants for each variant that were passed into the statistical test. The metric value reflects the group mean for each variant. The relative change reflects the relative difference between the test variant and the control variant.
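To make the column layout concrete, the snippet below builds a toy version of such a table with a pandas MultiIndex. The metric names and numbers are invented purely for illustration; the real table is produced by explot from expan results.

```python
# Toy reconstruction of the hierarchical results table described above:
# categories on the first column level, variants on the second.
import pandas as pd

columns = pd.MultiIndex.from_product(
    [["Denominator", "Metric value", "Relative change"], ["Control", "Treatment"]],
    names=["category", "variant"],
)

table = pd.DataFrame(
    [
        [10_000, 10_050, 0.121, 0.128, None, 0.058],   # made-up click-through rate
        [10_000, 10_050, 4.31, 4.22, None, -0.021],    # made-up average sessions
    ],
    index=pd.Index(["click_through_rate", "avg_sessions"], name="metric"),
    columns=columns,
)

print(table)
```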

Figure 2 displays the relative change for all metrics for the pairwise comparison between the treatment variant and the control variant. This example contains a metric that is not statistically significant (colored gray), a metric with a positive relative change that is statistically significant (colored green), and a metric with a negative relative change that is statistically significant (colored red). Each metric contains metadata indicating whether an increased magnitude of the metric is a favored or not favored outcome (e.g., higher values of churn are not favored). The plotting library and our platform use that metadata to determine how to color results. One of the benefits of relative change plots is that they allow multiple metrics to be plotted side by side, even if they have different units.
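The coloring rule can be summarized in a few lines. The sketch below is a simplified, hypothetical version of that logic, not the code used in explot or the platform.

```python
# Simplified sketch of the coloring rule: gray for non-significant results,
# green when a significant change moves the metric in its favored direction,
# red otherwise. The function and its arguments are illustrative.
def result_color(relative_change: float, is_significant: bool, higher_is_better: bool) -> str:
    if not is_significant:
        return "gray"
    improved = (relative_change > 0) == higher_is_better
    return "green" if improved else "red"


# Example: a significant 3% drop in churn, where lower churn is favored, is colored green.
print(result_color(-0.03, True, higher_is_better=False))  # -> "green"
```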

How do we test out new visualizations and get feedback?

With explot, we developed a flexible plotting library that let us create standardized data visualizations for experiment results. As part of our development process, we wanted to find a way to quickly test new plots on a wide range of experiments and to get feedback from our stakeholders. With the help of our engineering services team, we set up a simple dashboard in Dash that reads in live experiment results from our data warehouse and visualizes them using explot. Using Dash, we could easily preview visualizations by serving our app locally. Once we were satisfied with the look and feel of our plots, we pushed the changes to the live app to show our stakeholders and get feedback. Together, explot and the accompanying Dash app enabled us to rapidly prototype visualizations that we could pass off to our engineering team for implementation. Moving our visualizations from the Dash app into our experimentation platform, though, brought up new challenges that we had to overcome.
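A minimal sketch of such a prototyping app is shown below, assuming the explot output is a standard Plotly figure. A placeholder figure stands in for results read from the data warehouse and passed through explot.

```python
# Minimal Dash app sketch for previewing experiment visualizations locally.
import plotly.graph_objects as go
from dash import Dash, dcc, html

# In the real app, results are read from the data warehouse and rendered with
# explot; here an empty Plotly figure stands in for that output.
figure = go.Figure()

app = Dash(__name__)
app.layout = html.Div(
    [
        html.H2("Experiment results preview"),
        dcc.Graph(figure=figure),  # dcc.Graph renders any Plotly figure
    ]
)

if __name__ == "__main__":
    app.run(debug=True)  # serve locally to preview before pushing to the shared app
```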

How are visualizations implemented in the experimentation platform?

When we proceeded to implement visualizations within our experimentation platform, we had to balance two important objectives.

  1. The visualizations had to be easily customizable.
  2. The visualizations had to be fast and not impair the performance of our platform.

We tested a number of different plotting libraries. While we appreciated the ease with which many of these libraries were able to get visualizations up and running, our needs on the platform required more customization than we felt these libraries offered. Ultimately, we were happy to find a library that offered the best of both worlds, accessibility out of the box and deep customization: VisX. VisX is an open-source library from Airbnb that blends the power of D3 with the joy of React. What drew us to VisX was that it’s a library built around visualization primitives, which allowed us to extend and optimize visualization UX for our unique use cases. Using VisX in combination with UX designs and prototypes from our data science team, we have built out a set of high-level in-house React visualizations that present data consistently and give us a full range of customization options.

Figure 3: In-tool results table. All data is simulated.
Figure 4: In-tool relative change plot. All data is simulated.

The above figures showcase how visualizations appear within our in-house experimentation platform. Experimenters can toggle between chart and table views, and each visualization has its own interactive features. For tables, each row has a small color indicator showing whether a metric is statistically significant and in which direction. For charts, hovering over each metric reveals its confidence interval range as well as a description of the metric.

Summary

Data visualization plays an important role in experiment evaluation. The quality of our data visualizations allows our stakeholders to easily identify issues, spot new trends, and learn new insights. Without this, we risk frustrating experimenters or, worse, losing their confidence in the data presented. In this post, we highlighted how we developed solutions for automated visualization at the ad-hoc and platform levels. We continue to add new visualizations, in partnership with our stakeholders, that provide a digestible way for them to gain deeper insights into experiment results.

Thanks for reading and feel free to reach out if you have any questions or comments! We are also hiring for many roles in our organization, so please feel free to browse current openings.

Acknowledgements

Developing the experimentation platform is a large cross-functional effort — thanks to everyone who contributed to this feature and the blog post: Mark Harrison, Diana Jerman, Robin Cox, Phil Hauser, Michael Ramm, Kilian Scheltat, Doug Fertig, and many other members of the Experimentation-X team.
