Network Effects in Experimental Design

Published in

Analytics Vidhya

4 min read · Dec 16, 2020

Experiments are designed to measure the effectiveness of proposed changes. The most basic assumption in any experimental design is SUTVA (the Stable Unit Treatment Value Assumption): each unit's outcome depends only on its own treatment assignment and not on the assignment of any other unit. In practice, this means the control and treatment units must be independent of each other and assigned completely at random, so that every unit in treatment is affected only by the treatment and not by other factors.

However, on a social platform, users are connected to one another. When they are exposed to an experiment, they are influenced not only by the treatment but also by the other users in the experiment. This is known as the network effect, and it violates SUTVA.

Source: Anna Lindh Foundation (A network of networks)

Examples:

Consider an experiment that measures conversion rates on a website using either a “Buy Now” or a “Shop Now” CTA button. This can be tested with a standard A/B test. Treatment and control can be assigned randomly because one visitor's choice of button has little influence on another visitor. The effect of the treatment can be measured using typical testing methods.

But consider a test where you are about to launch a “Facebook Pay” feature in the Marketplace. Here, the treatment and control units cannot simply be assigned at random. The feature is a two-way application with a sender and a receiver, so choosing treatment units randomly might create groups that have senders without any receivers, and vice versa.
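The sender/receiver problem can be illustrated with a small simulation. This is only a sketch under stated assumptions: the user population, the payment pairs, and the 50/50 split are hypothetical and not taken from any real experiment. It counts how many sender-receiver pairs end up with the feature on both sides when users are randomized individually.

```python
import random

def usable_pair_fraction(pairs, treated):
    """Share of sender-receiver pairs in which both users have the feature."""
    usable = sum(1 for sender, receiver in pairs
                 if sender in treated and receiver in treated)
    return usable / len(pairs)

rng = random.Random(0)          # fixed seed so the run is repeatable
n_users = 1000
users = list(range(n_users))

# Hypothetical payment relationships: each user pays one other user.
pairs = []
for sender in users:
    receiver = rng.randrange(n_users - 1)
    if receiver >= sender:      # skip over the sender so nobody pays themselves
        receiver += 1
    pairs.append((sender, receiver))

# Unit-level randomization: half of the users get the feature.
treated = set(rng.sample(users, n_users // 2))

frac = usable_pair_fraction(pairs, treated)
print(f"Pairs where both sides are treated: {frac:.1%}")
```

With a 50/50 split, roughly a quarter of the pairs can be expected to have the feature on both ends, so most treated senders would have nobody to pay.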

Network effects are also present in many other social applications, such as Skype, Google Docs, Airbnb, Uber, and Facebook, and in features such as newsfeeds, content ranking models, ad auctions, and any other product in a social setting. In all these cases, users are related to one another, and the outcome of the experiment depends on this connectivity, which violates SUTVA. Overlooking network effects introduces bias and increases variance, both of which affect the results of the experiment.

To detect network effects, the paper on detecting network effects listed in the references explains the following method: run the same experiment under both cluster-level randomization and unit-level randomization of customers. If the treatment effect estimates from the two designs differ significantly, this is evidence of network effects within the samples.
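The comparison described above can be sketched as a simple test on the difference between the two estimates. This is an assumption for illustration, not the exact procedure from the referenced paper: it uses a generic z-test for two independent estimates, and the effect sizes and standard errors are made-up numbers.

```python
import math

def network_effect_z(est_unit, se_unit, est_cluster, se_cluster):
    """z-statistic for the difference between two independent estimates."""
    diff = est_cluster - est_unit
    se_diff = math.sqrt(se_unit ** 2 + se_cluster ** 2)
    return diff / se_diff

def two_sided_p(z):
    """Two-sided p-value under a standard normal distribution."""
    return math.erfc(abs(z) / math.sqrt(2))

# Hypothetical estimates of the lift under each design.
z = network_effect_z(est_unit=0.020, se_unit=0.004,
                     est_cluster=0.035, se_cluster=0.006)
p = two_sided_p(z)
print(f"z = {z:.2f}, p = {p:.3f}")
```

A small p-value suggests that the two designs disagree, which is what one would expect if interference between users is distorting the unit-randomized estimate.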

Impact of Network Effects:

One of the main impacts of network effects is the spillover effect. If an offer is tested in the treatment group, customers in the treatment group may communicate with customers in the control group. Customers in the control group can then take advantage of the offer, which diminishes the measured lift relative to the true lift of the experiment.
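A back-of-the-envelope calculation shows how spillover shrinks the measured lift. The sketch below assumes a deliberately simple linear model, in which a fixed share of control customers receives the full benefit of the offer; the function name and all numbers are hypothetical.

```python
def observed_lift(base_rate, true_lift, spillover_share):
    """Measured treatment-minus-control difference under simple spillover.

    spillover_share is the assumed fraction of control customers who end up
    using the offer because they heard about it from treated customers.
    """
    treatment_rate = base_rate + true_lift
    control_rate = base_rate + spillover_share * true_lift
    return treatment_rate - control_rate

# Hypothetical numbers: 10% baseline conversion, 5-point true lift,
# and 30% of the control group picking up the offer.
lift = observed_lift(base_rate=0.10, true_lift=0.05, spillover_share=0.30)
print(f"Observed lift: {lift:.3f} (true lift: 0.050)")
```

Under these assumptions the experiment would report a lift of about 3.5 points instead of 5, even though the offer itself works exactly as intended.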

Possible Solutions:

The best way to deal with network effects is to group customers who are similar or closely connected into clusters, assign treatment at the cluster level, and measure the aggregated difference between the treatment and control clusters. Because connected customers tend to land in the same group, there is less spillover between treatment and control. Two methods for forming the clusters are k-means clustering and spectral clustering.

K-means clustering groups the customers into k similar groups based on centroids. It is an unsupervised algorithm: each point is allocated to the cluster with the nearest centroid, and the centroids are updated repeatedly to reduce the sum of squared distances between the points and their centroids.
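A minimal sketch of the k-means loop in plain Python is shown below. The customer features (for example, orders per month and average basket size) and the six data points are hypothetical, and a production setup would more likely use a library implementation.

```python
import random

def kmeans(points, k, iters=20, seed=0):
    """Plain k-means: returns (centroids, labels) for a list of tuples."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)       # start from k distinct points
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: each point goes to its nearest centroid.
        for i, p in enumerate(points):
            labels[i] = min(
                range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(p, centroids[c])),
            )
        # Update step: each centroid moves to the mean of its points.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:                     # keep the old centroid if empty
                centroids[c] = tuple(sum(col) / len(members)
                                     for col in zip(*members))
    return centroids, labels

# Hypothetical customers described by two numeric features.
customers = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.5),
             (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]
centroids, labels = kmeans(customers, k=2)
print(labels)
```

Each resulting cluster, rather than each customer, would then be assigned to treatment or control.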

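Spectral clustering is one way to form the clusters used in graph cluster randomization, and it is based on the spectral theorem. The Laplacian matrix of the user graph is the main tool: its eigenvalues and eigenvectors are used to create graph cuts that split the population into well-connected clusters, which are then assigned as whole clusters to the treatment and control groups.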
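The simplest form of this idea is spectral bisection: split the graph by the sign of the eigenvector that belongs to the second-smallest eigenvalue of the Laplacian (often called the Fiedler vector). The sketch below is an illustration under assumptions, not the method from the references: it approximates that eigenvector with power iteration in plain Python, and the six-user graph is hypothetical.

```python
def spectral_bisection(n, edges, iters=500):
    """Split an undirected graph in two using the sign of the Fiedler vector.

    The vector is approximated by power iteration on (shift*I - L), where L
    is the graph Laplacian, while removing the constant component.
    """
    adj = [[] for _ in range(n)]
    for u, v in edges:
        adj[u].append(v)
        adj[v].append(u)
    deg = [len(neighbors) for neighbors in adj]
    shift = 2 * max(deg)        # upper bound on the largest eigenvalue of L

    x = [float(i) for i in range(n)]        # deterministic starting vector
    for _ in range(iters):
        mean = sum(x) / n
        x = [xi - mean for xi in x]         # project out the constant vector
        # (L x)_i = deg_i * x_i - sum of x_j over neighbors j of i
        y = [shift * x[i] - (deg[i] * x[i] - sum(x[j] for j in adj[i]))
             for i in range(n)]
        norm = sum(v * v for v in y) ** 0.5
        x = [v / norm for v in y]
    return [0 if v < 0 else 1 for v in x]

# Hypothetical network: two friend triangles joined by a single edge (2-3).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
clusters = spectral_bisection(6, edges)
print(clusters)
```

The cut falls on the single edge between the two triangles, so assigning one triangle to treatment and the other to control leaves only one connection across the groups.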

Things to keep in mind:

· In a network setting, the behavior of a user is influenced not only by the treatment but also by the behavior of other users. This results in two models: treatment interference and behavior interference. In the former, the goal is to minimize the variance between the clusters; in the latter, the goal is to balance the bias-variance trade-off. The right method has to be chosen based on the application.

· Watch out for users who have few or no connections to others. Making several graph cuts can isolate these sparsely connected users into separate clusters

· Time zone and server latency of the treatment and control groups can also impact results

· Ensure everyone in the treatment group receives the treatment

· Network exposure = the percentage of a user and that user's friends who are exposed to the treatment
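The network exposure measure in the last point can be computed directly from a friend list. The sketch below follows the definition given above; the user names, friendships, and treatment assignment are hypothetical.

```python
def network_exposure(user, friends, treated):
    """Share of a user and that user's friends who are in treatment."""
    group = [user] + list(friends.get(user, []))
    return sum(1 for member in group if member in treated) / len(group)

# Hypothetical friendship lists and treatment assignment.
friends = {
    "ana": ["bo", "cy", "di"],
    "di": ["ana", "bo"],
    "eli": [],
}
treated = {"ana", "bo"}

for name in friends:
    print(name, f"{network_exposure(name, friends, treated):.0%}")
```

A user with no friends has an exposure of either 0% or 100%, depending only on their own assignment.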

Conclusion

The network effect is a widespread problem in several industries today. I have consolidated the resources I came across to give a basic idea of the problem and have outlined some possible solutions. Graph cluster randomization is by far the most effective one. The references below explain this concept in detail.

References:

· Experiments with Network Effects

· IMS-Microsoft Research Workshop: Foundations of Data Science, Graph Cluster Randomization

· Designing A/B tests in a collaboration network

· Detecting Network Effects

Graph Cluster Algorithms:

· Graph Cluster Algorithms

· Finding Clusters in Graphs

Written by Sadhave S R