Book Summary: “Trustworthy Online Controlled Experiments” [Part II.]

Weonhyeok Chung
Oct 13, 2022

(Preview) In this part, I learned about the types of metrics used in online AB testing, how to combine multiple metrics, and why doing so is useful for a firm. I also learned why speed matters in experiments and which ethical issues can arise.

This series is my summary of the book on AB testing, “Trustworthy Online Controlled Experiments” (by Ron Kohavi, Diane Tang, and Ya Xu)

Link to other parts of the series:

Part I. Introductory Topics for Everyone

Part III. Complementary and Alternative Techniques to Controlled Experiments

Part IV. Advanced Topics for Building an Experimentation Platform

Part V. Advanced Topics for Analyzing Experiments

Part II. Selected Topics for Everyone

Photo by Artur Aldyrkhanov on Unsplash

Ch 05. Speed Matters: An End-to-End Case Study

Summary: The speed that users experience affects metrics such as revenue per user. In Amazon’s experiment, a 100 ms slowdown decreased sales by 1%. Bing found similar results in its own experiment: every 100 ms speedup increased revenue by 0.6%. When evaluating the performance of a website, one needs to treat the chunks of information sent by the server differently depending on their order. The first chunk that users receive affects their response the most, and later chunks matter progressively less. In addition, the user’s experience differs depending on the type of page. In the case of Twitter, how quickly the user sees the first Tweet can be critical to the overall experience.
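To make the numbers above concrete, here is a minimal sketch of the arithmetic. It assumes the relationship is roughly linear near the current page load time, which is only an approximation, and the function name and default value are my own, not from the book.

```python
def estimated_revenue_change_pct(latency_delta_ms: float,
                                 pct_per_100ms: float = 0.6) -> float:
    """Linear estimate of the revenue change (in percent).

    latency_delta_ms: change in page load time; negative means a speedup.
    pct_per_100ms: assumed revenue sensitivity per 100 ms (0.6 is the Bing
                   figure quoted above; 1.0 would match the Amazon figure).
    """
    # A speedup (negative delta) gives a positive revenue change.
    return -latency_delta_ms / 100.0 * pct_per_100ms


# A 100 ms speedup with the Bing sensitivity
speedup_effect = estimated_revenue_change_pct(-100)

# A 250 ms slowdown with the Amazon sensitivity
slowdown_effect = estimated_revenue_change_pct(250, pct_per_100ms=1.0)

print(speedup_effect, slowdown_effect)
```

The estimate is only meant for small changes around today's load time; far from that point the linear assumption may not hold.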

New or curious concept (or questions): Is there any evidence that the relationship between page load time and a metric such as revenue is curved, as in Figure 5.1 of the book (as page load time increases, revenue decreases, but the rate of change gets smaller)?

Ch 06. Organizational Metrics

Summary: The most common types of metrics are goal metrics, driver metrics, and guardrail metrics. A goal metric is tied to the mission of the company. A driver metric is an indirect metric that indicates whether the company is moving toward the goal metric; users’ happiness with the product is an example. A guardrail metric guards against violating important constraints while pursuing the goal. In addition, asset metrics, engagement metrics, business metrics, and operational metrics are also important. Two examples of business metrics are revenue per user and DAU (daily active users). These metrics should be concise, stable, related to the firm’s goals, and achievable. They can also be revised after continuous evaluation.
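As a small illustration of the two business metrics mentioned above, the sketch below computes DAU and revenue per user from a toy event log. The data, the record layout, and the function names are all made up for this example.

```python
from collections import defaultdict
from datetime import date

# Hypothetical event log: (user_id, day, revenue from that event)
events = [
    ("u1", date(2022, 10, 1), 5.0),
    ("u2", date(2022, 10, 1), 0.0),
    ("u1", date(2022, 10, 1), 2.5),
    ("u3", date(2022, 10, 2), 4.0),
    ("u1", date(2022, 10, 2), 0.0),
]


def daily_active_users(events):
    """Count distinct users per day."""
    users_by_day = defaultdict(set)
    for user_id, day, _ in events:
        users_by_day[day].add(user_id)
    return {day: len(users) for day, users in users_by_day.items()}


def revenue_per_user(events):
    """Total revenue divided by the number of distinct users."""
    revenue_by_user = defaultdict(float)
    for user_id, _, amount in events:
        revenue_by_user[user_id] += amount
    if not revenue_by_user:
        return 0.0
    return sum(revenue_by_user.values()) / len(revenue_by_user)


dau = daily_active_users(events)
rpu = revenue_per_user(events)
print(dau, rpu)
```

Note that a user who is active on two days counts once per day in DAU but only once in the revenue-per-user denominator.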

New or curious concept (or questions): Is DAU treated as an important metric inside firms?

Ch 07. Metrics for Experimentation and the Overall Evaluation Criterion

Summary: When we define an OEC (Overall Evaluation Criterion), we need to incorporate multiple metrics. Rather than looking at isolated parts of the product, we need to evaluate changes against the overall business goal. Still, we should not use too many metrics. We also need both long-term and short-term metrics. Finally, we need to remember that correlation in the data does not imply causation.

New or curious concept (or questions): I did not fully understand how to combine multiple metrics into a single one.
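One possible way to combine several metrics, sketched here under my own assumptions, is to normalize each metric to a 0 to 1 range and take a weighted sum. The metric names, ranges, and weights below are hypothetical, and choosing them well is the hard part in practice.

```python
def normalize(value: float, low: float, high: float) -> float:
    """Scale a metric to [0, 1] using a predefined range, clamping outliers."""
    if high <= low:
        raise ValueError("high must be greater than low")
    return min(1.0, max(0.0, (value - low) / (high - low)))


def weighted_oec(metrics: dict, ranges: dict, weights: dict) -> float:
    """Weighted average of normalized metrics (weights need not sum to 1)."""
    total_weight = sum(weights.values())
    weighted_sum = sum(
        weights[name] * normalize(metrics[name], *ranges[name])
        for name in weights
    )
    return weighted_sum / total_weight


# Hypothetical metrics for one experiment variant
metrics = {"sessions_per_user": 6.0, "revenue_per_user": 2.0}
ranges = {"sessions_per_user": (0.0, 10.0), "revenue_per_user": (0.0, 5.0)}
weights = {"sessions_per_user": 0.5, "revenue_per_user": 0.5}

score = weighted_oec(metrics, ranges, weights)
print(score)
```

Computing this score for both control and treatment gives a single number to compare, although the individual metrics are still worth inspecting on their own.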

Ch 08. Institutional Memory and Meta-Analysis

Summary: We need to record the results of previous experiments: which changes had a positive effect and which did not. In addition, we can run follow-up experiments for the ones with sensitive results. The history of experiment results helps new employees understand the company better. Records of experiments also encourage innovation.

New or curious concept (or questions): What is the ‘beta ramp phase’ in the experiment? Note: LinkedIn built auto-ramp-related features.

Ch 09. Ethics in Controlled Experiments

Summary: Experiments, including online experiments, affect real users. When running experiments, we need to consider the risks to users, the benefits, and the choices users have. In addition, users’ privacy should be handled carefully. It is helpful to check IRB (Institutional Review Board) criteria.

New or curious concept (or questions): I was wondering how online tech firms take care of these ethical issues.

Three takeaways (from Part II.):

(1) Content affects users differently depending on when it arrives, and the first content matters the most.

(2) We need to evaluate metrics overall, and consider not only the short-term metrics but also the long-term metrics.

(3) Firms need to record the results of experiments to pass the knowledge on to successors.

Please feel free to leave any comments or questions! Thank you for reading my post.
