Performance-optimised TUI Musement’s A/B testing solution on top of AWS infrastructure.

Przemyslaw Falowski
Published in TUI Tech Blog, Jan 22, 2021

Agenda:

Introduction:

  1. What is A/B testing?
  2. Initial design
  3. Frontend AWS infrastructure

A/B testing with AWS Lambda@edge:

  1. Architecture
  2. Implementation
  3. Performance Impact
  4. Costs

TL;DR:
If you want to implement an A/B testing solution that supports SSR and does not hurt the CloudFront cache, use a Viewer Request Lambda@edge to intercept each request before it reaches the CloudFront cache and add a cookie with the selected variant to the request headers. On the server, render the correct HTML based on that cookie, then attach a Set-Cookie header with the selected variant to the response to save it in the user’s browser.

TL;But I can read:
If you already know about A/B testing, CloudFront and Lambda@edge, go directly to A/B testing with AWS Lambda@edge.

What is A/B testing?

A/B testing is a UX research methodology focused on comparing user engagement with two different versions of a website. The main purpose of A/B testing is to select the version with the greater return on KPIs (e.g. the number of user interactions per visit or the conversion rate).
Instead of releasing a completely new design to all users, with A/B testing you can show the new version of a component to a selected group of visitors, e.g. 20% of all users. In doing so, you can compare a brand new UI with the old version to determine which of the two variants is more effective and will have the greater impact on your business.

Example of the component redesign.

The scenario above can be represented by the following JSON file:

{
  "experimentName": "ActivityCardRedesign",
  "variantPercentage": 20,
  "control": "OldActivityCard",
  "variant": "NewActivityCard"
}

Initial design

What do we need?

Let’s try to summarize everything and prepare a list of requirements. To implement an A/B testing system, we require the following:

  • A/B test configuration
    The configuration can be represented by a simple JSON object. To support multiple experiments at the same time, let’s group them into an array:
    {
      "experiments": [
        {
          "experimentName": "ExperimentA",
          "variantPercentage": 20,
          "control": "ControlNameForExperimentA",
          "variant": "VariantNameForExperimentA"
        },
        {
          "experimentName": "ExperimentB",
          "variantPercentage": 20,
          "control": "ControlNameForExperimentB",
          "variant": "VariantNameForExperimentB"
        }
      ]
    }
  • A function that assigns each user to a test group (version A or version B)
    Because an A/B test configuration will be assigned to every user, we can split the traffic with a simple Math.random(). With a large enough test group, the split will be approximately equal to the percentage set in the config file.
  • A mechanism to show the user the assigned version. Refreshing a page or opening a subpage in a new tab should not change the assigned test.
    To fulfil that requirement, we have two choices: window.localStorage or an HTTP cookie.
    Unlike cookies, values saved in localStorage are visible only on the client side (in the user’s browser) and are never sent to the server. This might be a problem when an element under A/B test needs to support server-side rendering (SSR). Because of this, cookies are the better choice.
  • A tracking system to compare A/B testing results.
    This depends on the tracking system used by your company. You can use custom dimensions in Google Analytics as well as an internal tracking system implemented by your BI team.
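A minimal TypeScript sketch of the assignment function, assuming the config shape shown above. The names assignVariant and assignAll are hypothetical, and the random parameter is injectable only to make the function easy to test:

```typescript
// One experiment, mirroring the JSON config above.
interface Experiment {
  experimentName: string;
  variantPercentage: number; // 0-100: share of users who should see the variant
  control: string;
  variant: string;
}

type Assignment = "control" | "variant";

// Picks control or variant for one experiment. `random` defaults to
// Math.random() and is a parameter only so the function is easy to test.
function assignVariant(
  experiment: Experiment,
  random: () => number = Math.random,
): Assignment {
  // Math.random() is uniform in [0, 1), so this is true with
  // probability variantPercentage / 100.
  return random() * 100 < experiment.variantPercentage ? "variant" : "control";
}

// Assigns every experiment in the config, keyed by experiment name.
function assignAll(
  experiments: Experiment[],
  random: () => number = Math.random,
): Record<string, Assignment> {
  const assignments: Record<string, Assignment> = {};
  for (const experiment of experiments) {
    assignments[experiment.experimentName] = assignVariant(experiment, random);
  }
  return assignments;
}

// Usage: simulate many visitors; roughly 20% should land in the variant.
const experimentA: Experiment = {
  experimentName: "ExperimentA",
  variantPercentage: 20,
  control: "ControlNameForExperimentA",
  variant: "VariantNameForExperimentA",
};
const visitors = 100000;
let variantCount = 0;
for (let i = 0; i < visitors; i++) {
  if (assignVariant(experimentA) === "variant") variantCount++;
}
console.log(`variant share: ${((100 * variantCount) / visitors).toFixed(1)}%`);
```

Because each user is assigned independently, the split is only approximate for small groups; the more users, the closer the observed share gets to variantPercentage.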

Client-side or server-side?

In the previous point, we wrote that some applications might need to support server-side rendering. The same applies to A/B testing solutions, which can be divided into two groups: client-side and server-side.

In client-side A/B tests, your application calculates in the user’s browser which variant needs to be displayed.

Client-side A/B testing

If your application does not support SSR, this might be a desirable choice. However, if your component also needs to be rendered on the server side, you might need additional workarounds (e.g. hiding the component with display: none until the browser calculates which variant should be displayed to the user). Unfortunately, all of these “hacks” come with limitations like component flickering or an increased Time to Interactive.

In server-side A/B tests, the user receives HTML from the server with the variant already selected.

Server-side A/B testing

In such a scenario, the HTML that arrives at the client already includes the variant selected for that user.
The user config calculated by the server can be passed to the browser and stored there using the Set-Cookie response header.
The first idea could be to implement the A/B testing logic directly in the code that runs on your Node.js server. Unfortunately, this could be a pretty expensive mistake that may lead to a significant performance impact.
To understand why let’s look at the infrastructure of the typical frontend application.

AWS infrastructure

The diagram above represents the simplified infrastructure of a frontend application with server-side rendering. CloudFront, in the middle of the diagram, is a content delivery network (CDN) provided by Amazon. The purpose of a CDN is to provide a better user experience by delivering the requested content faster, as well as to reduce traffic to the Node.js server.

To simplify: when 20 users open www.musement.com, CloudFront will pass the first request to the Node.js server and save the received response; for the next 19 users, it will use that cached value to immediately return the content of musement.com (assuming all requests hit the same CloudFront instance).

A response served directly from the CloudFront cache can be identified by the x-cache response header:

  • If CloudFront called Node.js Server: x-cache: Miss from cloudfront
  • If CloudFront used cached response: x-cache: Hit from cloudfront

But what exactly is the problem with the solution described in the previous section?
Well, the main problem is splitting users correctly into the groups defined by the variantPercentage parameter. If the first user who visits our website is assigned the variant, CloudFront will cache that response and serve the variant to all users whose requests hit that CloudFront instance.
It is, of course, possible to modify the CloudFront configuration so that it does not cache requests that do not include a cookie with the user’s A/B test configuration (AWS docs). In this case, the correct A/B test configuration can be calculated and assigned by Node.js for every user, but doing so would forward all new users to the application server, even if the response for their variant or control had already been cached by CloudFront. This has two consequences: an increase in server costs (because of the increased traffic) and, even more importantly, a decrease in performance for users (new visitors would not benefit from the CDN).
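To make the caching problem concrete, here is a toy model in TypeScript (not how CloudFront is implemented): a cache keyed on the path plus an optional list of cookie names, loosely like CloudFront’s cookie whitelist. ToyCdnCache, makeServer and alternating are hypothetical helpers; alternating() stands in for Math.random() so the outcome is deterministic:

```typescript
// Toy model of a CDN cache, only to illustrate the problem above.
type Cookies = Record<string, string>;
interface PageRequest {
  path: string;
  cookies: Cookies;
}

const AB_COOKIE = "abtests-user-config";

class ToyCdnCache {
  private readonly entries = new Map<string, string>();
  private readonly keyCookies: string[]; // cookies that are part of the cache key
  private readonly render: (req: PageRequest) => string;
  originCalls = 0; // how many requests reached the Node.js server

  constructor(keyCookies: string[], render: (req: PageRequest) => string) {
    this.keyCookies = keyCookies;
    this.render = render;
  }

  get(req: PageRequest): string {
    const cookiePart = this.keyCookies.map((c) => `${c}=${req.cookies[c] ?? ""}`);
    const key = [req.path, ...cookiePart].join("|");
    const cached = this.entries.get(key);
    if (cached !== undefined) return cached; // "Hit from cloudfront"
    this.originCalls++; // "Miss from cloudfront": forward to the server
    const html = this.render(req);
    this.entries.set(key, html);
    return html;
  }
}

// Deterministic stand-in for Math.random(): variant, control, variant, ...
function alternating(): () => string {
  let i = 0;
  return () => (i++ % 2 === 0 ? "variant" : "control");
}

// Server renders from the cookie, or assigns a variant itself if it is missing.
function makeServer(assign: () => string): (req: PageRequest) => string {
  return (req) => `<card>${req.cookies[AB_COOKIE] ?? assign()}</card>`;
}

// 1) Server-side assignment, cookie not part of the cache key:
//    every new visitor gets whatever the first visitor was assigned.
const naiveCdn = new ToyCdnCache([], makeServer(alternating()));
const naivePages: string[] = [];
for (let u = 0; u < 10; u++) naivePages.push(naiveCdn.get({ path: "/", cookies: {} }));
console.log(new Set(naivePages).size, naiveCdn.originCalls); // 1 1

// 2) Variant assigned before the cache (viewer request) and the cookie is
//    part of the cache key: one cached entry per variant.
const assignAtEdge = alternating();
const edgeCdn = new ToyCdnCache([AB_COOKIE], makeServer(alternating()));
const edgePages: string[] = [];
for (let u = 0; u < 10; u++) {
  edgePages.push(edgeCdn.get({ path: "/", cookies: { [AB_COOKIE]: assignAtEdge() } }));
}
console.log(new Set(edgePages).size, edgeCdn.originCalls); // 2 2
```

In the second scenario the server is reached only once per variant, which is the behaviour the Lambda@edge design aims for.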

The best solution to these problems would be to calculate the A/B tests assigned to the user before requests hit CloudFront. Luckily, this is possible with Lambda@edge, which lets you run custom code at four different points (from AWS docs):

  • After CloudFront receives a request from a viewer (viewer request)
  • Before CloudFront forwards the request to the origin (origin request)
  • After CloudFront receives the response from the origin (origin response)
  • Before CloudFront forwards the response to the viewer (viewer response)

A/B testing with AWS Lambda@edge:

Architecture

To fully support server-side rendering without hurting cache performance, our solution needs to assign a variant to every user before the request reaches the CloudFront cache. AWS makes this possible with Lambda@edge, a feature of AWS CloudFront that lets you attach a Lambda function to your CloudFront instance and run custom code that modifies the request or the response (depending on the stage at which the Lambda@edge function is attached).

The architecture of our A/B testing solution is represented by the diagram below:

The architecture of A/B testing solution using Lambda@edge

The data flow in this diagram is as follows:

  1. When the user opens www.musement.com, the Lambda@edge function attached to the Viewer Request event is invoked. Its purpose is to check the cookies saved in the user’s browser. When the request includes a cookie with a valid A/B test configuration, it is passed to CloudFront without any modifications. Otherwise, the function calculates which variant will be displayed to that user and adds the selected variant to the Cookie request header, so for CloudFront the request looks the same as if a valid cookie had already been present in the browser.
  2. The request arrives at the CloudFront instance. Based on various parameters (e.g. the request URL, cookies or other request headers selected by you) and the cache expiration time, CloudFront decides whether the user can receive one of the previously cached server responses.
    If a valid response is present in the CloudFront cache, the CDN immediately returns it without forwarding the request to the Node.js server.
    Otherwise, CloudFront forwards the request to the Node.js server. When your server returns the rendered HTML, it is cached and used for the following requests.
  3. Assuming that CloudFront did not have a valid response in its cache, the request arrives at the origin, in our case the Node.js server.
    On the server, we can assume that every request includes a valid cookie with the selected variant (as we said before, adding the correct cookie to every request is the job of the Viewer Request Lambda@edge). Knowing this, we can read that value and display the correct version of the currently tested element.
  4. Origin Response Lambda@edge: before the response arrives at the CloudFront instance and is cached for the specified period of time, we need to add one more thing. In the first point, I wrote that the Viewer Request Lambda@edge checks the cookie saved in the user’s browser; however, so far there is no mechanism that saves the cookie added to the request in the first step. To cover this, we need an additional Lambda@edge function attached to the Origin Response event, which adds a Set-Cookie header to the response generated by the server. This saves the cookie in the user’s browser, so that on the next visit we can provide a consistent user interface. This part can also be implemented directly on the Node.js server, in which case the additional Lambda is not needed. The important part is that the response has the correct Set-Cookie header before it arrives at the CloudFront instance.

Implementation

The first item to implement is the A/B test config. In our case, we decided to keep it in a simple abtestConfig.json file stored in an S3 bucket. A timestamp key in the file describes the version of the config.

Another option would be a real database such as DynamoDB. However, we noticed that calling any external service significantly increases the duration of the Lambda function: fetching the config from S3 took ~250–400ms, compared to ~2–5ms when the configuration was hardcoded inside the Lambda@edge function. For that reason, we implemented an additional internal cache, so that the choice of storage service does not have much impact on the performance of the Lambda function.

The fetchConfig function used by the Viewer Request Lambda@edge gets the configuration from the S3 bucket and caches the result. The cached result is used for 30 minutes, or until the userTimestamp passed as a parameter is newer than the currently cached config. This can happen while abtestConfig.json is being updated: one Lambda@edge instance has already downloaded the new config and assigned it to the user, and after a page refresh, the user’s request arrives at another Lambda@edge instance that still holds the previously cached version.
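A minimal sketch of such a fetchConfig in TypeScript, not the original code. It assumes the config’s timestamp is a number and that reading abtestConfig.json from S3 is wrapped in a hypothetical loadConfig function (in practice, for example, an S3 GetObject call through the AWS SDK) passed in as a parameter. isStale is likewise a hypothetical helper that isolates the two refresh rules:

```typescript
// Shapes assumed for abtestConfig.json (hypothetical: a numeric timestamp
// describes the config version).
interface Experiment {
  experimentName: string;
  variantPercentage: number;
  control: string;
  variant: string;
}
interface AbTestConfig {
  timestamp: number;
  experiments: Experiment[];
}

const CONFIG_TTL_MS = 30 * 60 * 1000; // reuse the cached config for 30 minutes

// Module-level state survives between invocations handled by the same warm
// Lambda@edge instance; a cold start begins with an empty cache.
let cachedConfig: AbTestConfig | undefined;
let cachedAt = 0;

// True when a config loaded at `loadedAt` must not be reused any more.
function isStale(
  config: AbTestConfig,
  loadedAt: number,
  now: number,
  userTimestamp?: number,
): boolean {
  if (now - loadedAt >= CONFIG_TTL_MS) return true;
  // Another instance has already given this user a newer config version.
  return userTimestamp !== undefined && userTimestamp > config.timestamp;
}

// `loadConfig` is a hypothetical stand-in for the S3 read.
async function fetchConfig(
  loadConfig: () => Promise<AbTestConfig>,
  userTimestamp?: number,
  now: number = Date.now(),
): Promise<AbTestConfig> {
  const current = cachedConfig;
  if (current !== undefined && !isStale(current, cachedAt, now, userTimestamp)) {
    return current; // fast path: no S3 round trip
  }
  const fresh = await loadConfig();
  cachedConfig = fresh;
  cachedAt = now;
  return fresh;
}
```

In the viewer request handler this would be called with the timestamp read from the user’s cookie, so a user who already holds a newer version forces the instance to refresh its cache.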

⚠️ Lambda@edge functions can be created only in the US East (N. Virginia) AWS region, so keep this in mind when creating one. Of course, this does not mean that a Lambda created in this region will only run on servers located in US East: after you attach it to a CloudFront instance, it will be deployed to edge locations around the globe.

The next step is implementing the Viewer Request Lambda@edge. This function, which will be called on every request to your page, should:

  1. Call the fetchConfig function.
  2. If the user has an A/B test config with a timestamp equal to the one in abtestConfig.json, pass the request through without any modifications.
  3. If the user does not have an assigned A/B test configuration, roll the dice for every experiment present in abtestConfig.json.
  4. If the user has an A/B test config with a timestamp older than the one in abtestConfig.json, roll the dice for all new experiments, delete the finished experiments (those no longer present in abtestConfig.json) and keep the previously assigned experiments that are still present in the config file.
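The four steps above can be sketched in TypeScript as follows. This is a minimal sketch, not our production code: it assumes the cookie stores a hypothetical URI-encoded JSON payload { timestamp, assignments }, models only the subset of the Lambda@edge event it needs (event.Records[0].cf.request, where lower-case header names map to lists of { key, value }), and takes the config loader as a parameter, for example the cached fetchConfig. All function names are hypothetical:

```typescript
// Minimal subset of the Lambda@edge viewer request event: header names are
// lower-case keys, each mapping to a list of { key, value } entries.
interface CfHeader { key?: string; value: string }
type CfHeaders = Record<string, CfHeader[]>;
interface CfRequest { uri: string; headers: CfHeaders }
interface ViewerRequestEvent { Records: { cf: { request: CfRequest } }[] }

interface Experiment { experimentName: string; variantPercentage: number; control: string; variant: string }
interface AbTestConfig { timestamp: number; experiments: Experiment[] }
type Assignment = "control" | "variant";
// Hypothetical cookie payload: config version plus one entry per experiment.
interface UserConfig { timestamp: number; assignments: Record<string, Assignment> }

const COOKIE_NAME = "abtests-user-config";

// Splits every Cookie header into trimmed "name=value" parts.
function cookieParts(headers: CfHeaders): string[] {
  const parts: string[] = [];
  for (const header of headers["cookie"] ?? []) {
    for (const part of header.value.split(";")) {
      if (part.trim() !== "") parts.push(part.trim());
    }
  }
  return parts;
}

function readCookie(headers: CfHeaders, name: string): string | undefined {
  const prefix = `${name}=`;
  const found = cookieParts(headers).find((p) => p.startsWith(prefix));
  return found === undefined ? undefined : found.slice(prefix.length);
}

function parseUserConfig(raw: string | undefined): UserConfig | undefined {
  if (raw === undefined) return undefined;
  try {
    return JSON.parse(decodeURIComponent(raw)) as UserConfig;
  } catch {
    return undefined; // malformed cookie: treat the user as new
  }
}

// Steps 3 and 4: keep assignments of experiments that are still running and
// roll the dice for new ones; finished ones are dropped by not copying them.
function reconcile(
  config: AbTestConfig,
  previous: UserConfig | undefined,
  random: () => number = Math.random,
): UserConfig {
  const assignments: Record<string, Assignment> = {};
  for (const exp of config.experiments) {
    const kept = previous?.assignments?.[exp.experimentName];
    assignments[exp.experimentName] =
      kept ?? (random() * 100 < exp.variantPercentage ? "variant" : "control");
  }
  return { timestamp: config.timestamp, assignments };
}

// Step 2 (pass through if up to date); otherwise rewrite the Cookie header
// so that CloudFront and the server both see the assigned variants.
function applyAbTestCookie(
  request: CfRequest,
  config: AbTestConfig,
  random: () => number = Math.random,
): CfRequest {
  const user = parseUserConfig(readCookie(request.headers, COOKIE_NAME));
  if (user !== undefined && user.timestamp === config.timestamp) return request;
  const updated = reconcile(config, user, random);
  const others = cookieParts(request.headers).filter((p) => !p.startsWith(`${COOKIE_NAME}=`));
  const abCookie = `${COOKIE_NAME}=${encodeURIComponent(JSON.stringify(updated))}`;
  request.headers["cookie"] = [{ key: "Cookie", value: [...others, abCookie].join("; ") }];
  return request;
}

// Step 1: entry point; getConfig would typically be the cached fetchConfig.
async function viewerRequestHandler(
  event: ViewerRequestEvent,
  getConfig: () => Promise<AbTestConfig>,
): Promise<CfRequest> {
  return applyAbTestCookie(event.Records[0].cf.request, await getConfig());
}
```

In a deployed function, the entry point would be exported with the config loader bound in. Returning the (possibly modified) request lets CloudFront continue with the cache lookup.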

With this function implemented, you can attach the Lambda@edge to your CloudFront instance. However, besides adding an abtests-user-config cookie to the request header, we need to tell CloudFront that this cookie should be used as a cache parameter. Without that, CloudFront will not forward the cookie to the Node.js server: even if everything looks fine at first, the newly created value will not be present on the server.
To do so, modify the CloudFront cache behaviour used for your pages by adding abtests-user-config to the “Whitelist Cookies” field of the “Forward Cookies” option.

The last step is to create the Origin Response Lambda@edge that attaches the missing Set-Cookie header. As we wrote before, this function simply reads the value of the abtests-user-config cookie and sets the same value in Set-Cookie. Remember that this part does not always require a separate Lambda@edge.
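A minimal TypeScript sketch of the Origin Response function, modelling only the parts of the event it uses (event.Records[0].cf.request and event.Records[0].cf.response). The function names and the cookie attributes (Path, a 30-day Max-Age, Secure, SameSite=Lax) are illustrative assumptions, not values from our setup:

```typescript
// Minimal subset of the Lambda@edge origin response event: both the request
// sent to the origin and the origin's response are available.
interface CfHeader { key?: string; value: string }
type CfHeaders = Record<string, CfHeader[]>;
interface CfResponse { status: string; headers: CfHeaders }
interface OriginResponseEvent {
  Records: { cf: { request: { headers: CfHeaders }; response: CfResponse } }[];
}

const COOKIE_NAME = "abtests-user-config";
// Hypothetical lifetime for the assignment cookie (30 days).
const COOKIE_MAX_AGE_S = 30 * 24 * 60 * 60;

function readCookie(headers: CfHeaders, name: string): string | undefined {
  for (const header of headers["cookie"] ?? []) {
    for (const part of header.value.split(";")) {
      const trimmed = part.trim();
      if (trimmed.startsWith(`${name}=`)) return trimmed.slice(name.length + 1);
    }
  }
  return undefined;
}

// Copies the cookie added by the viewer request function into Set-Cookie,
// keeping any Set-Cookie headers the server already produced.
function addAbTestSetCookie(requestHeaders: CfHeaders, response: CfResponse): CfResponse {
  const value = readCookie(requestHeaders, COOKIE_NAME);
  if (value === undefined) return response;
  const setCookie = response.headers["set-cookie"] ?? [];
  setCookie.push({
    key: "Set-Cookie",
    value: `${COOKIE_NAME}=${value}; Path=/; Max-Age=${COOKIE_MAX_AGE_S}; Secure; SameSite=Lax`,
  });
  response.headers["set-cookie"] = setCookie;
  return response;
}

// Lambda@edge entry point (async, so the runtime receives a promise).
async function originResponseHandler(event: OriginResponseEvent): Promise<CfResponse> {
  const { request, response } = event.Records[0].cf;
  return addAbTestSetCookie(request.headers, response);
}
```

If the Node.js server already sets this cookie itself, this function is unnecessary and would only produce a duplicate Set-Cookie header.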

⚠️ It is also possible to set the correct header directly in Node.js instead of creating an additional Lambda@edge; in our case, we used universal-cookie-nuxt, which was already present in our codebase.⚠️

With the A/B testing solution in place, the server can render one of the available variants of each tested component; in our case, the A/B tested components are rendered in Vue.js.
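As a framework-agnostic sketch (not our Vue.js code), assuming the user’s assignments have already been parsed from the abtests-user-config cookie into a hypothetical UserConfig object, a hypothetical resolveComponent helper can turn an experiment name into the name of the component to render. In Vue.js, that name could then be passed to a dynamic <component :is="..."> element:

```typescript
interface Experiment { experimentName: string; variantPercentage: number; control: string; variant: string }
type Assignment = "control" | "variant";
interface UserConfig { timestamp: number; assignments: Record<string, Assignment> }

// Returns the name of the component to render for one experiment.
// Falls back to the control when the user has no assignment, and returns
// undefined when the experiment is not in the config at all.
function resolveComponent(
  experiments: Experiment[],
  user: UserConfig | undefined,
  experimentName: string,
): string | undefined {
  const experiment = experiments.find((e) => e.experimentName === experimentName);
  if (experiment === undefined) return undefined;
  return user?.assignments[experimentName] === "variant" ? experiment.variant : experiment.control;
}

const experiments: Experiment[] = [
  { experimentName: "ActivityCardRedesign", variantPercentage: 20, control: "OldActivityCard", variant: "NewActivityCard" },
];
const user: UserConfig = { timestamp: 1, assignments: { ActivityCardRedesign: "variant" } };
console.log(resolveComponent(experiments, user, "ActivityCardRedesign")); // NewActivityCard
```

Falling back to the control keeps the page renderable even if the cookie is missing or malformed.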

Performance Impact

Request Latency: because an additional node was added to our infrastructure, we expected the time required to fetch musement.com from CloudFront to increase by the duration of the Lambda@edge function. In the main regions where we sell activities (Europe and the USA), the average duration of the Viewer Request Lambda@edge was ~2–10ms, which is unnoticeable for the end customer. However, in some situations the duration is much longer. The first occurs when the internal config cache expires (once every 30 minutes): at this point, fetching the config increases the Lambda duration to ~250–400ms.
The second situation is the so-called “cold start”, when a new instance of your Lambda function is invoked for the first time. In this case, we saw the Lambda@edge duration increase to up to 1.2s.

CloudFront Cache: monitoring the CloudFront cache statistics, we noticed a small impact on the number of viewer requests served from a CloudFront edge cache. On the day the A/B test config is modified, the number of “Hits” drops by 2–5%. This is expected behaviour, because modifying the config timestamp immediately invalidates the previously cached responses. However, on the following day, the number of Hits from CloudFront returned to its previous stable value.

Costs

The infrastructure cost of implementing an A/B testing solution on top of Lambda@edge depends directly on the traffic to your website. Current prices can be found in the AWS docs. As a rough example, assuming ~10 million visits per month and an average Lambda@edge duration below 50ms, you will pay approximately $10 per month.
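As a rough check of that figure, here is a worked estimate. It assumes Lambda@edge prices of $0.60 per million requests and $0.00005001 per GB-second (verify these against the current AWS pricing page), 128 MB of memory, every invocation billed as 50ms, and one Viewer Request invocation per visit; Origin Response invocations, which only happen on cache misses, are ignored:

```latex
\begin{aligned}
C_{\text{requests}} &= 10^{7} \times \frac{\$0.60}{10^{6}} = \$6.00 \\
C_{\text{duration}} &= 10^{7} \times 0.05\,\text{s} \times 0.125\,\text{GB} \times \$0.00005001/\text{GB-s} \\
                    &= 62{,}500\,\text{GB-s} \times \$0.00005001/\text{GB-s} \approx \$3.13 \\
C_{\text{total}}    &\approx \$6.00 + \$3.13 = \$9.13 \text{ per month}
\end{aligned}
```

Under these assumptions, the result is consistent with the ~$10 per month estimate; the request charge dominates, so the total scales roughly linearly with traffic.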

Born in Poland, living in Italy. Software Engineer mostly focused on frontend technologies. Tech Lead Frontend Developer @ TUI Musement. www.przemkow.dev