Server-side rendering IKEA.com with Edge Computing

Gustaf Nilsson Kotte
Published in Flat Pack Tech
10 min read · Jun 20, 2024


In my previous blog post, I wrote about the problems with static content.

What we want to achieve is rapid and reliable feedback that our changes to IKEA.com are also changing our metrics — and do not change other metrics negatively (side-effects). The practice of A/B testing (also known as randomized control trials) is a reliable and scalable method to introduce causality between changes and effects.

We also want to achieve a more relevant website for our visitors, by returning different experiences for them depending on their identified needs/context — a practice called personalisation.

The conclusion of the previous blog post is that implementing A/B testing and personalisation with client-side JavaScript on top of static content introduces problems with web performance and experimentation data quality (including attrition bias, which was not mentioned in the previous post). Instead, we have started a transformation of IKEA.com towards dynamic content using server-side rendering. In order to achieve server-side rendering at scale — both globally and by multiple teams — we’re using Edge Computing.

Note that we are relatively early in the transformation process from static web to dynamic web. That being said, we hope this post will still provide some valuable insights.

Table of contents

  • Server-side rendering
  • Edge Computing
  • Data for the variants
  • Caching with Edge Variant Service
  • Local testing

Server-side rendering

With server-side rendering, we generate HTML on demand in a web server. While static content is technically also generated server-side, it isn’t driven by browser navigations — static content is push, while server-side rendering is pull. Rendering on the web is a great article that lays out the terminology of web rendering.

IKEA.com is using Edge Side Includes (ESI) to implement modularity in the web architecture, decoupling the teams’ deliveries from each other and thus enabling Continuous Delivery for the many teams. ESI composes the page template and its dependencies (fragments) into a finished page, which we call the composition. We call requests for page templates and fragments internal HTML requests.

What we want is the ability to return different HTML for page template and fragment routes depending on A/B testing buckets and customer data. Let’s set aside for now how we get that data and just assume it’s available when rendering happens. Then we could write code like the examples below (I use Jinja-like pseudocode, for simplicity).

{% if experiments['navigation'] == 'next-nav' %}
<next-nav>
{% else %}
<nav>
{% endif %}

or

{% if customer.isReturningVisitor %}
<esi:include src="[welcome-back-url]" />
{% else %}
<esi:include src="[welcome-new-url]" />
{% endif %}

or

Welcome back {{ name }}

We call the different HTML responses for the same route variants, because they vary over one or several variables. In the cases of A/B testing and soft personalisation, the number of variants for a single response type is relatively small, while the number of variants for individual personalisation is large.

A diagram showing two actors: New visitor and Returning visitor. A box labelled Variant A data has an arrow to a box Variant A, which has an arrow to New visitor. A box labelled Variant B data has an arrow to a box Variant B, which has an arrow to Returning visitor.
Different visitor types get different variants, based on different variant data

Assuming a high cache hit ratio without variants, the cache hit ratios for internal HTML requests have an inverse relationship to the number of variants: when the number of variants is low, cache hit ratio is likely high, and vice versa.

After the HTML response is rendered by the browser, a couple of things can happen. Either:

  • Nothing, since the interactive elements in the HTML are links or forms
  • One or several JavaScript applications get rehydrated (see Rendering on the web for details). ESI together with rehydration (possibly lazy rehydration) is similar to an Islands architecture.
  • (In theory, a hypermedia JavaScript library like htmx could also be used instead of rehydration)

IKEA.com serves customers from 50 countries around the world and we want each of our customers to have a great experience, which in turn requires great web performance. However, when starting to server-side render HTML, we need to think about where we do that. The simplest approach for an engineer would be to render in a single server location, but that would degrade web performance for customers far away from that location. The optimal distribution is to render server-side as close to our customers as possible. Thus, we want both variants and great web performance. Enter edge computing.

Edge computing

My colleague Debapriya has previously written about edge computing. In the context of IKEA.com, it means to be able to distribute compute close to our customers.

However, acquiring the capability of edge computing introduces questions around how we do that:

  • Should our teams build edge computing themselves — distributed build?
  • Should we instead build a central edge computing platform in-house — central build?
  • Should we buy the capability from a vendor — central buy?

Leaving some details aside: today we offer a central edge computing platform to our teams, reducing their cognitive load so that they can focus on solving customer problems.

This means that our teams can treat the network of compute locations as a single global serverless solution. For example, when they deploy code, it gets deployed globally. And when they observe metrics about their application, the metrics too are global by default (but can be drilled down into for example regions).

In summary, enabling our teams to deploy their server-side rendering code to an edge computing platform creates the capability to render the different variants close to our customers, which in turn gives us what we want: variants and great web performance.

But this introduces another challenge: where does the data for our variants come from?

Data for the variants

If we rendered variants at the edge but kept our data in a central location, we would gain very little, since the edge servers would have to wait for the central API requests to return before they could render the HTML. So it’s crucial that the necessary data is also available close to our customers. Depending on the data and use-cases, there are different trade-offs and patterns. The CAP theorem (or PACELC theorem) comes into play, along with architectural patterns like CQRS and eventing/messaging. A more in-depth description is unfortunately out of scope for this blog post.

For both A/B testing and personalisation, there needs to be an identifier of the visitor or customer, like a session cookie. If there’s no existing identifier, it needs to be created server-side, before any dependent functionality is executed — otherwise the customer would experience inconsistencies between their first page view and subsequent navigations.
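As a minimal sketch of that first step, the server can reuse an existing cookie value or mint a new one before any bucketing or personalisation runs. The function and cookie names here are illustrative, not our actual implementation:

```typescript
import { randomUUID } from "node:crypto";

// Reuse the visitor identifier if one exists; otherwise create it
// server-side so the very first page view is already bucketed
// consistently with all subsequent navigations.
function ensureIdentifier(
  cookies: Record<string, string>,
): { id: string; isNew: boolean } {
  const existing = cookies["session-id"]; // illustrative cookie name
  if (existing) {
    return { id: existing, isNew: false };
  }
  // No identifier yet: create one now, before any dependent
  // functionality (bucketing, personalisation) executes.
  return { id: randomUUID(), isNew: true };
}
```

When `isNew` is true, the response would also set the cookie so the browser sends it on the next request.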

A/B testing

A good hash is hard to find is a great article on how A/B testing bucketing can work. Given an identifier from a cookie, the only additional data we need is configuration data about the experiments themselves. Thus, the data set for A/B testing bucketing is relatively small and easily cacheable — great! Also, if the number of simultaneous A/B tests for a module is low, the number of variants will be low and thus cache hit ratio will be high for the internal HTML requests.

A labelled arrow Identifier goes to a box labelled A/B bucketing from the left. From that box, there are two arrows. A downward arrow labelled fetch goes to a cylinder labelled Configuration and a rightward arrow goes to a box labelled Variant data.
Based on an identifier, the A/B bucketing reads configuration, performs the bucketing and returns variant data
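To make the bucketing step concrete, here is a minimal sketch of deterministic hash-based bucketing in the style described in A good hash is hard to find. All names (`bucketVisitor`, `ExperimentConfig`) are illustrative, and FNV-1a stands in for whatever stable hash function is actually used:

```typescript
interface ExperimentConfig {
  name: string;       // e.g. "navigation"
  variants: string[]; // e.g. ["control", "next-nav"]
}

// Small non-cryptographic 32-bit hash (FNV-1a). Any stable hash with a
// reasonably uniform distribution works for bucketing.
function fnv1a(input: string): number {
  let hash = 0x811c9dc5;
  for (let i = 0; i < input.length; i++) {
    hash ^= input.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// The same identifier always lands in the same bucket for a given
// experiment, so no per-visitor assignment needs to be stored — only
// the small, cacheable experiment configuration.
function bucketVisitor(identifier: string, experiment: ExperimentConfig): string {
  const hash = fnv1a(`${experiment.name}:${identifier}`);
  return experiment.variants[hash % experiment.variants.length];
}
```

Including the experiment name in the hash input keeps assignments independent across experiments for the same visitor.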

However, if experiments can also depend on customer data, that data needs to be available close to the customer.

Soft personalisation

In soft personalisation, we personalise based on customer groups. Members in a group share one or more properties. Some of these properties are derived properties and can be calculated offline and in batch. Properties can be persisted on smaller and distributed customer objects — similar to the CQRS pattern. Regardless, the customer data set is large, and not cacheable between customers.

Thus, in order to have the necessary data for creating the HTML variants for soft personalisation, we must request the customer object containing this data — there’s no way around this. But these requests can be cached within a session, making the system a bit more efficient.

A labelled arrow Identifier goes to a box labelled Personalisation service from the left. From that box, there are two arrows. A downward arrow labelled fetch goes to a cylinder labelled Customer data and a rightward arrow goes to a box labelled Variant data.
Based on an identifier, a personalisation service reads the customer data, performs the personalisation and returns variant data
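The within-session caching of customer object requests mentioned above could be sketched like this. The shapes and names (`CustomerData`, `getCustomerData`) are hypothetical; in practice the cache would also need eviction and a TTL:

```typescript
type CustomerData = { isReturningVisitor: boolean; segments: string[] };

// Per-session cache: customer data is not cacheable between customers,
// but repeated internal HTML requests within one session can share it.
const sessionCache = new Map<string, CustomerData>();

// fetchCustomer stands in for the real request to the customer data store.
function getCustomerData(
  sessionId: string,
  fetchCustomer: (id: string) => CustomerData,
): CustomerData {
  const cached = sessionCache.get(sessionId);
  if (cached) {
    return cached; // cache hit: no extra round-trip this session
  }
  const data = fetchCustomer(sessionId);
  sessionCache.set(sessionId, data);
  return data;
}
```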

The good news is that if the number of properties that we personalise on is low, we have the same situation as for A/B testing with high cache hit ratio for the internal HTML requests.

Individual personalisation

Individual personalisation shares some of the concepts from soft personalisation, but the main differences from our perspective are that the variants are not cacheable between customers and that there’s often more compute involved within the request chain, for example recommendation engines.

It’s outside the scope of this blog post to further expand on individual personalisation. Instead, let’s dive deeper into caching.

Caching with Edge Variant Service

From a cache key perspective, we need to find a way to collapse the cache key space from individual (i.e. session identifier) to variants. We also need to figure out where to cache.

Edge Side Includes and Caching

In a web architecture with static delivery, caching the ESI composition works well. However, this model breaks down when introducing variants for the internal HTML requests. Three years ago, we disabled caching of the ESI composition, while keeping the cache for the internal HTML requests.

Looking at ESI and caching of variants, we saw an opportunity: if the layer that produces the ESI composition is not cacheable, what if we could also do the cache key collapsing here? Then we could have a single uncacheable layer and still let the underlying layers be cacheable (except for the customer data requests).

We started to develop a service that we call the Edge Variant Service (EVS). EVS is responsible for forwarding the necessary information to the HTML rendering edge services. To know what data to forward, we rely on configuration. For example, the configuration tells us which A/B testing projects belong to which internal routes and what other data they need, e.g. customer data properties and cookies. This way, the HTML rendering services only get a minimal amount of data and we can maintain as high a cache hit ratio as possible.

It’s important to note that the ESI resolution and configuration look-ups need to be tightly integrated: before a fragment is requested, it needs to be decorated with additional context. The context comes from A/B testing bucketing and/or soft personalisation bucketing, which we can generalise to variant data generators.

A labelled arrow Page template goes into a box labelled ESI execution. An arrow labelled Fragment requests is going out of the ESI execution box. A smaller box labelled Hook is inside the ESI execution box. From the Hook box, there’s an arrow going to a box labelled Variant data generator.
Before an ESI fragment is fetched, the variant data generator is called and additional context is added to the request
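The hook idea can be sketched as a function that runs each variant data generator and merges the result into the fragment request. All names here (`FragmentRequest`, `VariantDataGenerator`, `decorateFragmentRequest`) are illustrative, not the real EVS API:

```typescript
interface FragmentRequest {
  url: string;
  params: Record<string, string>;
}

// A generator maps a visitor identifier to the variant data a fragment
// needs, e.g. A/B buckets or soft personalisation properties.
type VariantDataGenerator = (identifier: string) => Record<string, string>;

// Called from the ESI-execution hook: decorate the fragment request with
// variant context before it is fetched, so the rendering service receives
// only the data it needs.
function decorateFragmentRequest(
  request: FragmentRequest,
  identifier: string,
  generators: VariantDataGenerator[],
): FragmentRequest {
  const params = { ...request.params };
  for (const generate of generators) {
    Object.assign(params, generate(identifier)); // add variant context
  }
  return { ...request, params };
}
```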

Cache key design: Translate data to query strings

When caching HTTP requests in a CDN or cloud, the simplest and default approach is that the URL is the cache key (with or without query string). Therefore, we translate all information that should be passed to the underlying HTML rendering services to query string parameters, i.e.

  • A/B testing buckets
  • Customer data properties
  • Cookies
  • Request headers

An example internal HTML request query string could look like this:

  • ?navigation=next-nav
  • ?isReturningVisitor=true
  • ?customerName=Gustaf (not cacheable between customers)

This way, we can continue to have a simple cache configuration at the cost of more complicated mapping logic. This is a good design trade-off, because the mapping code is easier to test and requires less specialised competence than cache configuration. However, the length of the URL needs to be monitored so that it doesn’t exceed platform limits.
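A minimal sketch of that mapping, under the assumption that sorting the keys is how the cache key is kept canonical (so the same variant data always produces the same URL regardless of insertion order). The function name and length limit are illustrative:

```typescript
const MAX_URL_LENGTH = 2048; // illustrative platform limit, to be monitored

// Translate variant data (A/B buckets, customer properties, cookies,
// headers) into a query string that doubles as the cache key.
function toCacheableQueryString(variantData: Record<string, string>): string {
  const query = Object.keys(variantData)
    .sort() // canonical key order -> stable cache key
    .map((k) => `${encodeURIComponent(k)}=${encodeURIComponent(variantData[k])}`)
    .join("&");
  return query ? `?${query}` : "";
}

function assertWithinLimit(url: string): void {
  if (url.length > MAX_URL_LENGTH) {
    throw new Error(`URL length ${url.length} exceeds platform limit`);
  }
}
```

With this shape, `{ b: "2", a: "1" }` and `{ a: "1", b: "2" }` map to the same query string and therefore the same cache entry.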

Local testing

In order to get a fast feedback loop for our engineers, we need support for local development, i.e. that an engineer can run a service on their own computer, including EVS. Local development support is an important non-functional requirement for the central edge computing platform.

Testing in a microfrontends architecture shares similar challenges with a microservices architecture. Unless we re-introduce coordination of releases (and we don’t want to do that), there’s no guarantee that the system you’re testing is the exact system you’ll find in production, because things might change between when you test and when you release. Instead, we acknowledge that there’s a trade-off between speed and thoroughness (see the ETTO principle), that local testing gives us speed, and that the only environment that ultimately matters is production. There’s an excellent article related to this topic: Testing in Production, the safe way by Cindy Sridharan.

Since EVS is a key component for ESI composition, A/B testing and personalisation, we needed to figure out a way to distribute it for local development. We also needed to distribute configuration tailored to local development needs, so that an engineer can test their code together with our internal test site (or even the production site). Today, we distribute EVS and its configuration as an internal npm package. It’s also possible for engineers to override configuration entries if multiple modules are being worked on or are needed for other reasons.

Summary

In order to solve the challenges with static content described in my previous blog post, we’ve started to server-side render HTML variants using Edge Computing. The variants need data, which comes from A/B testing configuration and customer data. Fetching customer data can easily become a performance issue, and here the CQRS pattern can help.

To cache the variants, we have built a service called Edge Variant Service (EVS), which also is responsible for ESI resolution.

Local testing is important and supported in the platform, together with a distribution of EVS and configuration for local testing needs.

Big thanks to Lukas Vermeer (Vista), Martin Gudmundsson, Robin Whittleton, and Mats Johansson for reviewing this blog post and providing valuable feedback.
