[Part 3] Accelerating Load Times: A Materialized View and Server-side Composition Case Study

Yedidya Schwartz
OwnID Engineering
Apr 19, 2023 · 7 min read

In the two previous parts of this series, I introduced the product, the problem, and the design patterns that would be used to solve the problem. I wrote about the objectives that would define the solution as successful. Then, I started to describe the steps taken to apply the design patterns to our architecture: Redis as a resources cache layer, the baking-server, and CloudFront as the baked files CDN.

In this article, the last in the series, I will explain how we verified that the CDN solution works as expected, and then I will explain the last part of the architecture, where Redis pub/sub joins the party.

This is the third and final part of the series; I highly recommend reading part 1 and part 2 first.

How to Verify Our Architecture Withstands the Load

The main request load is handled by the CDN, so our backend doesn't suffer from any unusual peaks, even during rush hours.

To verify this, we used two tools: the CloudFront reports in the AWS console and Datadog distributed tracing.

I chose one baked file of a specific customer for this analysis, and started by checking its statistics in the CloudFront reports.

Figure 1: Popular objects page of the CloudFront reports in the AWS console

As can be seen in figure 1, the file was requested more than 1.8M times in the last seven days. The "Hits %" column shows the percentage of requests that were served from the CDN and never reached the origin (the baking-server). A hit ratio of 99.88% is a good result: only 253 requests missed the cache, meaning that in one week the baking-server received just 253 requests to bake this specific file.
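
As a side note, per-object statistics like the popular objects page are easiest to read in the console, but distribution-level request counts can also be pulled programmatically from CloudWatch, where CloudFront publishes its metrics (always in us-east-1). A minimal TypeScript sketch, with a made-up distribution ID:

```typescript
// A minimal sketch of pulling CloudFront request counts from CloudWatch.
// The distribution ID passed in below is a made-up placeholder.
import {
  CloudWatchClient,
  GetMetricStatisticsCommand,
} from "@aws-sdk/client-cloudwatch";

// CloudFront metrics are published to CloudWatch in us-east-1 only.
const cloudwatch = new CloudWatchClient({ region: "us-east-1" });

async function weeklyRequestCount(distributionId: string): Promise<number> {
  const now = new Date();
  const { Datapoints } = await cloudwatch.send(
    new GetMetricStatisticsCommand({
      Namespace: "AWS/CloudFront",
      MetricName: "Requests",
      Dimensions: [
        { Name: "DistributionId", Value: distributionId },
        { Name: "Region", Value: "Global" },
      ],
      StartTime: new Date(now.getTime() - 7 * 24 * 60 * 60 * 1000),
      EndTime: now,
      Period: 24 * 60 * 60, // one datapoint per day
      Statistics: ["Sum"],
    })
  );
  // Sum the daily datapoints into a single weekly total.
  return (Datapoints ?? []).reduce((total, dp) => total + (dp.Sum ?? 0), 0);
}

weeklyRequestCount("E1EXAMPLE123").then(console.log);
```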

As an extra check, I used our observability tool, Datadog, which collects traces for every request performed against our backend components, to see how many requests reached the baking-server for this specific customer.

Figure 2: Datadog traces, filtered on baking-server for a specific file

Indeed, very few requests for this specific file reached the baking-server in the same time window, and the count closely matches the CloudFront report: 249 traces versus 253 reported misses.

So, approximately 250 requests to the baking-server, compared to almost 2M requests in total.

This is exactly the verification we wanted, so we can move on to requirement #4 from the previous article: the response content should always contain the most up-to-date data.

Redis Pub/Sub as Invalidation Trigger

The last step in making the baking mechanism work properly is handling the CDN invalidation logic.

Each time the SDK code or the localization files are updated, the CDN needs to clear the current versions of all customers' baked files from its edge locations, so the next request for each file will fall back to the origin, the baking-server, and re-bake everything with the most up-to-date data.

Each time a customer updates their own app settings in the management console, the CDN needs to clear only that customer's specific file from its edge locations, so the next request for the file will re-bake it with the new settings, via the baking-server.

The invalidation process can thus be split into two types: a global invalidation and an individual invalidation. To implement this functionality, we chose an event-driven approach, using Redis pub/sub.

  • For the global invalidation: The CD processes of our SDK and localization projects publish a "global invalidation" event. The config-server subscribes to this event, and once it is received, the server sends an invalidation request to CloudFront for the pattern "/sdk/*", to clear all baked files.
  • For the individual invalidation: Once a customer updates their app settings in the management console, an "individual invalidation" event is published, with the customer's App ID as a variable. The config-server subscribes to this event as well, and once it is received, the server sends an invalidation request to CloudFront for the pattern "/sdk/<app_id>", to clear only that specific app's baked file from the edge locations. (A sketch of both flows follows figure 3 below.)

Figure 3: The architecture with the Redis pub/sub component, to handle CDN invalidations
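
To make the flow concrete, here is a minimal sketch of both flows, assuming ioredis for the pub/sub side and the AWS SDK v3 for the CloudFront call. The channel names, payload shape, and distribution ID are illustrative, not our actual implementation:

```typescript
import Redis from "ioredis";
import {
  CloudFrontClient,
  CreateInvalidationCommand,
} from "@aws-sdk/client-cloudfront";

// Publisher side: the SDK/localization CD pipelines publish the global
// event; the management console publishes an individual one per app.
const publisher = new Redis();

export const publishGlobalInvalidation = () =>
  publisher.publish("invalidation:global", "*");

export const publishIndividualInvalidation = (appId: string) =>
  publisher.publish("invalidation:individual", appId);

// Subscriber side: the config-server listens on both channels and
// translates each event into a CloudFront invalidation request.
const cloudfront = new CloudFrontClient({ region: "us-east-1" });
const subscriber = new Redis();
subscriber.subscribe("invalidation:global", "invalidation:individual");

subscriber.on("message", async (channel, appId) => {
  // A global event clears every baked file under /sdk/; an individual
  // event clears only the baked file of the app that was updated.
  const path =
    channel === "invalidation:global" ? "/sdk/*" : `/sdk/${appId}`;
  await cloudfront.send(
    new CreateInvalidationCommand({
      DistributionId: "E1EXAMPLE123", // illustrative placeholder
      InvalidationBatch: {
        CallerReference: `${channel}-${Date.now()}`, // must be unique per request
        Paths: { Quantity: 1, Items: [path] },
      },
    })
  );
});
```

CloudFront invalidations take a minute or two to propagate to all edge locations, which lines up with the delay allowed by requirement #4.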

This completes the solution for requirement #4: the response content should always contain the most up-to-date data, including the latest settings configured by the customer in the management console, with no more than one to two minutes of delay. (For example, if the widget color is changed from red to green, all end-users will start seeing a green widget within two minutes at most.)

So now that we've checked all the required boxes, let's test the result.

The Result

With the solution ready and working end-to-end, we can test the performance of the widget loading from the baked file. To test it, I updated the settings of one app in the management console, which invalidated its baked file on the CDN, and then loaded a page where the widget exists.

Figure 4: Network tab screenshot of the request being performed a few times for the widget load after the fix

As can be seen in figure 4, I loaded the page four times to compare the latency of identical requests. The first request took almost 200ms, since the baked file didn't exist on the CDN yet and the baking-server had to create it.

The next requests took 31, 14, and 17ms: much faster than the first request, since the file was now served from the CDN edge locations.

And if we compare this latency to the latency of the widget loading before the architectural change, the improvement is phenomenal: from more than 0.5 seconds to around 15ms, making the load more than 30 times faster.

The architectural change I described in these three articles significantly improved the loading experience of our widget on our customers' websites. We migrated our customers to the new mechanism gradually, to mitigate the risk in case we found any critical bugs.

After a few weeks in production, we can also vouch for the quality of the solution: we haven't experienced any performance issues so far.

Summary

In this article series I described the main implementation steps we took to boost the product's loading performance. The solution was inspired by two well-known design patterns: the Materialized View and Server-Side Composition. It incorporated aspects of both: gathering complex data from different sources and storing it in one place for efficient querying, as well as pre-building the UI on the backend rather than piecing together multiple sources on the frontend.

We started with Redis as a resources cache layer on the backend, to keep the files close, network-wise. This saved us from fetching the resources from S3 or other servers on every bake, which would increase the time required to read them.
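
For illustration, that read path can be sketched as a classic cache-aside lookup. This is a hypothetical sketch rather than our exact code; the bucket name, key scheme, and TTL are made up, and it assumes ioredis and the AWS SDK v3:

```typescript
import Redis from "ioredis";
import { S3Client, GetObjectCommand } from "@aws-sdk/client-s3";

const redis = new Redis();
const s3 = new S3Client({ region: "us-east-1" });

// Cache-aside read: serve the resource from Redis when possible, fall
// back to S3 on a miss, and populate the cache for the next reader.
async function getResource(key: string): Promise<string> {
  const cached = await redis.get(key);
  if (cached !== null) return cached;

  const { Body } = await s3.send(
    new GetObjectCommand({ Bucket: "resources-bucket", Key: key }) // made-up bucket
  );
  const content = await Body!.transformToString();
  await redis.set(key, content, "EX", 60 * 60); // 1-hour TTL, made up
  return content;
}
```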

Then I explained the baking server, the core service of the architecture, which is responsible for reading all the required resources and merging them into one JS file.
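
Conceptually, the baking itself is then a small server-side composition step on top of that read path. A hypothetical sketch, reusing the getResource helper from the previous snippet (the resource keys and the injected global names are made up):

```typescript
// Server-side composition, in spirit: merge the SDK code, the locale
// strings, and the customer's settings into one self-contained JS file.
async function bake(appId: string, locale: string): Promise<string> {
  const [sdkCode, localization, settings] = await Promise.all([
    getResource("sdk.js"),
    getResource(`locales/${locale}.json`),
    getResource(`settings/${appId}.json`),
  ]);

  // The JSON payloads are embedded as globals the SDK code reads on load.
  return [
    `window.__OWNID_CONFIG__ = ${settings};`,
    `window.__OWNID_STRINGS__ = ${localization};`,
    sdkCode,
  ].join("\n");
}
```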

The next step was the CDN, which acts as the baked files' cache layer. I mentioned two cool CloudFront behaviors, request collapsing and Origin Shield, and explained how to verify that the CDN acts as expected, using Datadog and the CloudFront reports in the AWS console.

Lastly, I introduced Redis pub/sub, which orchestrates the CDN invalidation events, to make sure the served files are always the most up-to-date ones.

Before and even during the implementation, we had a lot of questions about the best architectural decisions to make, in many parts of the solution. Some examples are:

  • Where should we trigger the CDN invalidation event?
  • Should the baking-server store the baked files in S3 to prevent overload of the server, or is the CDN good enough as the cache layer?
  • If we go with the S3 approach, there's a huge overhead whenever we need to invalidate all the files and upload them to S3 at once. Is this heavy operation worth it?
  • Should we use an extra cache layer, in addition to Redis, and store the resources in an in-memory cache in the baking server?
  • And many more…

In the end, we believe we made the right decisions under the circumstances of our existing architecture. One thing I can say for sure: the thinking process was very interesting. The main conclusion is that even when you are 100% sure you have the full solution in a diagram before you start the implementation, some things will change during the work. You just have to hope they won't completely upend your plans…

Did you learn something new from our use case? Feel free to share your thoughts and questions.

You are welcome to subscribe to the OwnID Engineering newsletter to get updates when new articles are published.
