Dynamic A/B Testing with Fastly

Alberto González
Softonic Engineering
10 min read · May 21, 2024

A flexible approach

What is A/B testing

A/B testing is an essential method for optimizing websites and applications. By dividing your audience into groups and presenting them with different versions of your content, you can collect important data to identify which version is most effective.

What is Fastly

Fastly is an HTTP accelerator: a custom Varnish implementation with TLS/SSL termination.

When Fastly sits in front of your web servers, it caches those resources that are eligible (see Fastly’s caching best practices: https://docs.fastly.com/en/guides/caching-best-practices). This means multiple users might receive the same resource generated by just one request to your origin server.

Fastly/VCL limitations

A/B testing with Fastly often involves static configurations and frequent VCL (Varnish Configuration Language) updates. In this article, we’ll explore a more dynamic approach that empowers developers to create and manage A/B tests without constantly modifying VCL code.

Dynamic A/B testing

The motivation for titling this article “Dynamic A/B Testing with Fastly” comes from the need to avoid the static approach described in Fastly’s documentation. Instead of frequently updating VCL code and releasing new service versions for every A/B test configuration change, this dynamic method enables web application developers to create and manage variant assignments directly through a flexible configuration.

A Dynamic Solution: Separating Concerns

Our approach involves separating the responsibilities of your web server and Fastly:

  • Web Server Role: The web server delivers different content variations based on a dynamic configuration. It uses a query parameter to identify the requested variant and informs Fastly about the available variations and their distribution.
  • Fastly’s Role: Fastly, as the first point of contact for user requests, assigns each request to a variant based on the dynamic configuration and user attributes. It then caches the responses for subsequent requests.

This separation allows developers to manage A/B test configurations independently from Fastly’s VCL code, streamlining the testing process.

Important Note: Since Fastly caches responses, not every user request reaches the web server. This means the web server has limited knowledge about individual users (only what’s in the Vary header, which should ideally be kept minimal).

Cloaking and content abuse

To prevent cloaking (showing different content to search engines than to users) and adhere to Google’s A/B testing best practices, a temporary (302) redirect is issued whenever a user is assigned a test variant. This redirect appends a query string parameter to the URL, distinguishing the variant from the base version.

As mentioned earlier, this query string parameter serves as a signal for the web server, indicating which variant to generate in response to the request. Developers use this information to customize the content accordingly, ensuring a consistent experience for both users and search engines.
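On the web server side, honoring the parameter can be as small as the following sketch. It is illustrative only: the `KNOWN_VARIANTS` set and the `render` function are hypothetical names, and the parameter name `ab` is taken from the VCL later in this article. Unknown values fall back to the base version, so a stale or hand-typed URL never produces an undefined page.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

# Hypothetical set of variant IDs the origin currently knows how to render.
KNOWN_VARIANTS = {"TEST-1", "TEST-2", "TEST-3"}


def requested_variant(url: str) -> Optional[str]:
    """Return the variant ID carried in the `ab` query parameter, or None for the base version."""
    values = parse_qs(urlsplit(url).query).get("ab", [])
    if values and values[0] in KNOWN_VARIANTS:
        return values[0]
    return None


def render(url: str) -> str:
    """Stand-in for the real page rendering: pick the content by variant."""
    variant = requested_variant(url)
    if variant is None:
        return "base content"
    return f"content for {variant}"
```

Because the variant is derived only from the URL, the same URL always yields the same content, which is what allows Fastly to cache each variant safely.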

The solution

To understand how this works in practice, let’s look at how a set of variants for a resource is defined.

Each variant is defined by a combination of these properties:

  • ID: A unique identifier for the variant, used as a query parameter value to inform the web server which variant to generate (e.g., “TEST-1”).
  • AUDIENCE-RANGE: An inclusive range of user bucket numbers (buckets run from 0 to 99), which sets the percentage of users who will see this variant (e.g., “0-9” means the first 10% of users).
  • AUDIENCE-COUNTRIES: One or more two-letter ISO country codes, separated by “|”, to target users from specific countries (e.g., “US|CA”). Optional.
  • AUDIENCE-PLATFORMS: One or more device platforms, separated by “|” (e.g., “windows|mac”). Optional.

You can use many other properties to target your audience, as outlined in Fastly’s documentation: https://www.fastly.com/documentation/reference/vcl/variables/client-request/

The first step is to pass this variant information from the web server to Fastly. This is achieved by adding custom headers to the resource response. This approach ensures that resources without variants don’t incur any performance penalties, as the normal flow continues if no variant information is present in the headers.

When variants are available for a resource, the web server includes headers like these in the response:

x-variant-1: id=TEST-1;audience-range=0-9;audience-countries=US|CA;audience-platforms=windows|mac
x-variant-2: id=TEST-2;audience-range=10-19;audience-countries=US
x-variant-3: id=TEST-3;audience-range=20-29

This configuration translates to the following:

  • TEST-1: Shown to the first 10% of users from the US or Canada using Windows or MacOS devices.
  • TEST-2: Shown to the next 10% of users from the US.
  • TEST-3: Shown to the next 10% of users.
  • Base Variant: Shown to the remaining 70% of users or those who don’t match the criteria for TEST-1 or TEST-2.
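As a rough sketch of how a web application might produce these headers from its own configuration, consider the following. The `Variant` class and `variant_headers` function are hypothetical names, not part of any framework; only the header format is taken from the example above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Variant:
    id: str
    audience_range: Tuple[int, int]  # inclusive bucket range within 0..99
    countries: List[str] = field(default_factory=list)
    platforms: List[str] = field(default_factory=list)

    def to_header_value(self) -> str:
        """Serialize as 'id=...;audience-range=a-b[;audience-countries=...][;audience-platforms=...]'."""
        start, end = self.audience_range
        parts = [f"id={self.id}", f"audience-range={start}-{end}"]
        if self.countries:
            parts.append("audience-countries=" + "|".join(self.countries))
        if self.platforms:
            parts.append("audience-platforms=" + "|".join(self.platforms))
        return ";".join(parts)


def variant_headers(variants: List[Variant]) -> Dict[str, str]:
    """Number the headers x-variant-1, x-variant-2, ... in configuration order."""
    return {f"x-variant-{i}": v.to_header_value() for i, v in enumerate(variants, start=1)}
```

Order matters: the VCL below picks the first suitable variant, so the configuration order doubles as a priority order.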

With this understanding of variant information, let’s examine the VCL implementation that brings this dynamic A/B testing approach to life.

sub recv_collect_audience_properties {
  set req.http.x-country-code = client.geo.country_code;
  # limit the number of possible values that come from device atlas
  if (client.os.name ~ "^Windows") {
    set req.http.x-platform-name = "windows";
  } else if (client.os.name ~ "^(Mac OS|OS X)$") {
    set req.http.x-platform-name = "mac";
  } else if (client.os.name == "Android") {
    set req.http.x-platform-name = "android";
  } else if (client.os.name == "iOS") {
    set req.http.x-platform-name = "ios";
  } else {
    set req.http.x-platform-name = "other";
  }
}

sub check_variant {
  # only process headers that carry an id and an audience range;
  # the remaining properties are optional, so nothing is required after the range
  if (req.http.x-variant ~ "^id=([^;]+);audience-range=(\d+)-(\d+)(;.*)?$") {
    declare local var.idVariant STRING;
    declare local var.rangeStart INTEGER;
    declare local var.rangeEnd INTEGER;
    declare local var.isSuitable BOOL;
    # extract the id and the range before the next regex overwrites re.group.*
    set var.idVariant = re.group.1;
    set var.rangeStart = std.atoi(re.group.2);
    set var.rangeEnd = std.atoi(re.group.3);
    # assume the variant is suitable until an audience property rules it out
    set var.isSuitable = true;
    # no trailing ";" is required, so the last property of a header also matches
    if (req.http.x-variant ~ ";audience-platforms=([^;]+)") {
      declare local var.platforms STRING;
      set var.platforms = re.group.1;
      # we can't create dynamic regex, so we wrap the list of values with the same separator to find out
      if (!std.strstr("|" + var.platforms + "|", "|" + req.http.x-platform-name + "|")) {
        set var.isSuitable = false;
      }
    }
    if (req.http.x-variant ~ ";audience-countries=([^;]+)") {
      declare local var.countries STRING;
      set var.countries = re.group.1;
      if (!std.strstr("|" + var.countries + "|", "|" + req.http.x-country-code + "|")) {
        set var.isSuitable = false;
      }
    }
    if (var.isSuitable) {
      # get the bucket number as int (set in deliver_select_active_variant)
      declare local var.bucket INTEGER;
      set var.bucket = std.atoi(req.http.x-bucket);
      if (var.bucket >= var.rangeStart && var.bucket <= var.rangeEnd) {
        # set the suitable variant as the 1st suitable variant
        if (!req.http.x-suitable-variant) {
          set req.http.x-suitable-variant = var.idVariant;
        }
        if (req.http.x-session-variant-intent == var.idVariant) {
          set req.http.x-session-variant = var.idVariant;
        }
      }
    }
  }
}

sub deliver_select_active_variant {
  # if at the edge, the 1st execution and not a bot
  if (fastly.ff.visits_this_service == 0 && req.restarts == 0 && !client.class.bot) {
    # get or assign the user a bucket
    if (req.http.Cookie:user_bucket ~ "^(\d+)$") {
      set req.http.x-bucket = re.group.1;
    } else {
      set req.http.x-bucket = randomint(0, 99);
    }
    # session variant intent
    # note: this pattern only accepts IDs shaped like NAME-1.0; adapt it to your own ID format
    # (the TEST-1 style IDs used in the examples above would not match)
    if (req.http.Cookie:user_variant ~ "^([^-]+-\d+\.\d+)$") {
      set req.http.x-session-variant-intent = re.group.1;
    }
    # find a suitable variant within the 5 possible ones from backend
    set req.http.x-variant = resp.http.x-variant-1;
    call check_variant;
    set req.http.x-variant = resp.http.x-variant-2;
    call check_variant;
    set req.http.x-variant = resp.http.x-variant-3;
    call check_variant;
    set req.http.x-variant = resp.http.x-variant-4;
    call check_variant;
    set req.http.x-variant = resp.http.x-variant-5;
    call check_variant;
    # repeat above for any number of simultaneous variants you want to support in a single resource
    # Check if either a session or a suitable variant has been found
    if (req.http.x-session-variant) {
      set req.http.x-active-variant = req.http.x-session-variant;
    } else if (req.http.x-suitable-variant) {
      set req.http.x-active-variant = req.http.x-suitable-variant;
    }
    # clean up temp headers
    unset req.http.x-variant;
    unset req.http.x-session-variant-intent;
    unset req.http.x-session-variant;
    unset req.http.x-suitable-variant;
    # decide if we need to restart the request
    # The != and !~ operators always evaluate to true when either operand is not set.
    # A not set value is converted to an empty string when assigned to a STRING variable,
    # so both sides are normalized to "none" before they are compared.
    declare local var.abValue STRING;
    declare local var.activeVariantValue STRING;
    set var.abValue = querystring.get(req.url, "ab");
    set var.activeVariantValue = req.http.x-active-variant;
    set var.abValue = if(std.strlen(var.abValue) > 0, var.abValue, "none");
    set var.activeVariantValue = if(std.strlen(var.activeVariantValue) > 0, var.activeVariantValue, "none");
    if (var.abValue != var.activeVariantValue) {
      # this is the header that recv_redirect_to_active_variant checks after the restart
      set req.http.x-experiment-redirect = "yes";
      restart;
    }
  }
  # persist the bucket and the variant in the user state
  if (req.http.x-bucket) {
    add resp.http.Set-Cookie = "user_bucket=" req.http.x-bucket "; domain=" req.http.host "; max-age=1296000; path=/; secure";
  }
  if (req.http.x-active-variant) {
    add resp.http.Set-Cookie = "user_variant=" req.http.x-active-variant "; domain=" req.http.host "; max-age=1800; path=/; secure";
  }
}

sub recv_redirect_to_active_variant {
  # we only act on the edge, never on the shield
  if (fastly.ff.visits_this_service == 0) {
    if (req.restarts == 0) {
      # do not allow these control headers from the client request
      unset req.http.x-active-variant;
      unset req.http.x-experiment-redirect;
    }
    if (req.http.x-experiment-redirect == "yes") {
      if (req.http.x-active-variant) {
        set req.url = querystring.set(req.url, "ab", req.http.x-active-variant);
      } else {
        set req.url = querystring.filter(req.url, "ab");
      }
      error 655 "Redirect to variant";
    }
  }
}

sub error_redirect_to_variant {
  if (obj.status == 655) {
    set obj.http.Location = "https://" req.http.host req.url;
    set obj.status = 302;
    set obj.response = "Found";
    return (deliver);
  }
}

sub vcl_recv {
  #FASTLY recv
  call recv_collect_audience_properties;
  call recv_redirect_to_active_variant;
}

sub vcl_hash {
  set req.hash += req.http.host;
  set req.hash += req.url.path;
  set req.hash += querystring.get(req.url, "ab");
  #FASTLY hash
  return(hash);
}

sub vcl_error {
  #FASTLY error
  call error_redirect_to_variant;
  return(deliver);
}

sub vcl_deliver {
  #FASTLY deliver
  call deliver_select_active_variant;
  return(deliver);
}

Key points of the VCL implementation

Despite the intricacies of VCL, the process itself is straightforward. After receiving a user request and obtaining the corresponding response (from cache or from the origin server), the VCL code examines the variant headers, if any, to determine whether a variant should be applied to that request. Because VCL has no loop constructs, the check is written out once per header, so the code above supports up to five simultaneous variants per resource. You can raise that limit by repeating the relevant lines for additional headers, but be mindful of Fastly’s header count limits.
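If you prefer not to maintain the repeated lines by hand, a small build-time script can generate them. This is only a convenience sketch: `variant_check_lines` is a hypothetical helper, and its output is meant to be pasted or templated into the `deliver_select_active_variant` subroutine.

```python
def variant_check_lines(count: int) -> str:
    """Emit the 'set ... / call check_variant;' pair once per supported variant header."""
    lines = []
    for n in range(1, count + 1):
        lines.append(f"set req.http.x-variant = resp.http.x-variant-{n};")
        lines.append("call check_variant;")
    return "\n".join(lines)


if __name__ == "__main__":
    # Print the lines needed to support eight simultaneous variants.
    print(variant_check_lines(8))
```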

To distribute users proportionally across variants, a random “bucket” number (0 to 99) is assigned to each user during their first visit. This bucket number is then used in subsequent visits to identify the user’s audience segment. Additional factors like the user’s country and device platform are also taken into account.
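The selection rule is easier to reason about outside VCL. The sketch below is a simplified Python model of it, useful for checking a configuration before deploying it. It is not the edge code itself: the function names are hypothetical, and it leaves out the session cookie (`user_variant`) handling.

```python
import random
from typing import Dict, Iterable, Optional


def parse_variant(value: str) -> Dict[str, str]:
    """Split 'id=TEST-1;audience-range=0-9;...' into a dict of its fields."""
    return dict(part.split("=", 1) for part in value.split(";") if part)


def select_variant(header_values: Iterable[str], bucket: int, country: str, platform: str) -> Optional[str]:
    """Return the ID of the first variant whose audience matches, or None for the base version."""
    for value in header_values:
        fields = parse_variant(value)
        start, end = (int(n) for n in fields["audience-range"].split("-"))
        # Optional properties only restrict the audience when they are present.
        if "audience-countries" in fields and country not in fields["audience-countries"].split("|"):
            continue
        if "audience-platforms" in fields and platform not in fields["audience-platforms"].split("|"):
            continue
        if start <= bucket <= end:  # the range is inclusive on both ends
            return fields["id"]
    return None


def assign_bucket(rng: random.Random) -> int:
    """Model of randomint(0, 99): both ends are inclusive."""
    return rng.randint(0, 99)
```

With the three example headers shown earlier, a US Windows user in bucket 5 gets TEST-1, while a US Android user in the same bucket gets the base version: they are inside the TEST-1 range but outside its platform list, and no other range covers bucket 5.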

By combining this user information with the data in the variant headers, the VCL code can decide whether or not to assign a variant to the current request. If a variant is assigned, it’s stored in a cookie, ensuring consistency across different resources and user sessions. This enables the creation of personalized user journeys throughout the website.

When a new variant is assigned to a request, a redirect is issued to the same resource, but with the ab query parameter added (and its value set to the variant ID). This redirect occurs only if the ab value differs from the current one, preventing infinite loops. The query parameter becomes part of Fastly's cache key, ensuring that the request for the specific variant is forwarded to the web server.
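The redirect decision can be modeled in a few lines of Python. This is a sketch of the comparison only, under the assumption that a missing `ab` parameter and a missing active variant both mean "base version"; `redirect_location` is a hypothetical name, and the position of `ab` in the rewritten query string may differ from what Fastly's `querystring.set` produces.

```python
from typing import Optional
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def redirect_location(url: str, active_variant: Optional[str]) -> Optional[str]:
    """Return the URL to redirect to, or None when `ab` already matches the active variant."""
    parts = urlsplit(url)
    pairs = parse_qsl(parts.query, keep_blank_values=True)
    current = next((value for key, value in pairs if key == "ab"), None)
    # Treat a missing or empty value as "none" on both sides, as the VCL does.
    if (current or "none") == (active_variant or "none"):
        return None
    # Drop any existing `ab`, then add the active variant if there is one.
    pairs = [(key, value) for key, value in pairs if key != "ab"]
    if active_variant:
        pairs.append(("ab", active_variant))
    return urlunsplit(parts._replace(query=urlencode(pairs)))
```

Note that the rule works in both directions: a user who arrives on a variant URL without being assigned to that variant is redirected back to the base URL.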

While the core logic is relatively simple, there are nuances to consider when working with VCL and Fastly’s implementation. The inline comments in the code are provided to clarify them.

Managing cache and preventing infinite redirects

While the dynamic A/B testing approach simplifies the process, there are important considerations for managing cached content. A key principle of A/B testing is that variants are temporary (they last only until a winning variation is chosen). Therefore, it’s crucial to proactively manage the cache when enabling, disabling, or modifying A/B test configurations to avoid unexpected behaviour.

For instance, when introducing a new A/B test configuration for a resource, the cache for that resource must be cleared. This allows Fastly to fetch the updated headers containing the new configuration. Similarly, when an A/B test concludes, clearing the cache is essential to remove any cached responses associated with the old configuration. If multiple variants of the same resource have conflicting A/B test configurations cached in Fastly, it could lead to infinite redirect loops.

To address this, leveraging surrogate keys (as described in Fastly’s documentation: https://www.fastly.com/documentation/guides/concepts/edge-state/cache/purging/) is highly recommended. Surrogate keys offer granular control over cache invalidation, allowing you to target specific resources for purging. Ideally, this process should be automated to coincide with A/B test configuration changes, ensuring a seamless and error-free testing experience.
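A minimal automation sketch follows, assuming the origin tags every variant of a resource with a shared `Surrogate-Key` response header and that a Fastly API token with purge permission is available. The key name and the function names are hypothetical; the endpoint is Fastly's purge-by-surrogate-key API. Only the Python standard library is used.

```python
import urllib.request


def build_purge_request(service_id: str, surrogate_key: str, api_token: str) -> urllib.request.Request:
    """Build (but do not send) a purge-by-surrogate-key request for the Fastly API."""
    url = f"https://api.fastly.com/service/{service_id}/purge/{surrogate_key}"
    return urllib.request.Request(
        url,
        method="POST",
        headers={"Fastly-Key": api_token, "Accept": "application/json"},
    )


def purge(service_id: str, surrogate_key: str, api_token: str) -> int:
    """Send the purge and return the HTTP status code; raises on network or HTTP errors."""
    request = build_purge_request(service_id, surrogate_key, api_token)
    with urllib.request.urlopen(request, timeout=10) as response:
        return response.status
```

Calling `purge` from the same code path that saves an A/B test configuration keeps the cached variant headers and the live configuration from drifting apart.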

Conclusion

This article has explored a dynamic approach to A/B testing using Fastly, addressing the limitations of traditional static configurations. By separating the responsibilities of the web server and Fastly, we’ve empowered developers to manage A/B test configurations independently, leading to increased flexibility and scalability.

The detailed implementation, including the VCL code, demonstrates how variant information is communicated via custom headers, user assignments are made, and redirects are handled to prevent cloaking and ensure a seamless user experience. Additionally, we’ve highlighted crucial considerations like surrogate key utilization and cache management to ensure optimal performance and avoid issues like infinite redirects.

This dynamic A/B testing strategy with Fastly offers a robust and adaptable solution for optimizing your web content and making data-driven decisions to enhance user engagement and conversions.
