Improving Traveloka Web Performance
Editor’s Note:
Fatih picks up the baton from the preceding article on Traveloka's web performance culture (penned by Ryan Nixon Salim) to share the accompanying technical optimization techniques behind Traveloka's site, which you can use as an implementation reference while accounting for the distinct constraints of your own environment.
Fatih Kalifa is a web infrastructure software engineer whose responsibilities include developing the web framework, improving web performance, and collaborating with the design team to enhance the design system, tooling, and workflows.
In the previous article, we discussed how we built a performance culture at Traveloka. This article is a technical follow-up on how we improved our web performance using various techniques.
Before going further, I'm going to lay out a few constraints we have at Traveloka that may affect how you approach and solve performance problems on your end. Different constraints can lead to different solutions, so make your decisions based on the constraints you currently have.
With that said, here are our four constraints:
- Server-Side Rendering (SSR), not static HTML. We render content in our Node.js service with data fetched from the backend.
- High volume of server-side API calls. Typical pages have at least six API calls.
- Backend-defined views. We have a few pages where the contents are defined exclusively in the API response.
- Multi-product monorepo. As of this writing, we have 30+ products deployed as 20+ separate services inside a single repository. Some of them are still deployed in a single monolith.
General Optimization
These are optimizations that improve performance across all of Traveloka's pages. As you'll see later, we have another set of optimizations that only applies to specific products or features.
The first optimization we made was switching to Brotli for a better compression algorithm. In our testing, Brotli-compressed files were 15–40% smaller than their gzip counterparts. We also saw a 15% average Lighthouse score improvement across our pages.
Optimizing compression also has a nice side effect on bandwidth usage in our CDN. We typically serve a few hundred terabytes of static assets through CloudFront each month, so with 23% less bandwidth, the cost saving is definitely noticeable.
Considering the implementation itself is fairly straightforward, it’s a nice win.
Brotli for Dynamic Data
We generate HTML at runtime instead of serving static files. We also have a separate API proxy service that forwards requests to different backends serving JSON data. We want these services to return small responses with acceptable latency.
Because HTML and JSON are dynamically generated, we have to carefully balance CPU usage against compression performance. Based on our experiments, the sweet spot for Brotli at runtime is quality=4 (out of 11) and lgwin=20 (out of 24). This configuration yields a 20% reduction in response size with a barely noticeable increase in CPU usage compared to gzip.
We use Express to serve this data. Unfortunately, since Express's compression library didn't support Brotli yet, we used an implementation from one of its open PRs.
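For illustration, here is a minimal sketch of runtime Brotli compression at those settings, using Node's built-in zlib Brotli support (available since Node 11.7) rather than the exact middleware we use:

```js
const express = require('express');
const zlib = require('zlib');

const app = express();

app.get('/api/data', (req, res) => {
  const payload = JSON.stringify({ hello: 'world' }); // dynamic data goes here

  // Fall back to plain JSON for clients without Brotli support.
  if (!(req.headers['accept-encoding'] || '').includes('br')) {
    return res.type('application/json').send(payload);
  }

  // quality=4 and lgwin=20: the runtime sweet spot described above.
  zlib.brotliCompress(
    Buffer.from(payload),
    {
      params: {
        [zlib.constants.BROTLI_PARAM_QUALITY]: 4,
        [zlib.constants.BROTLI_PARAM_LGWIN]: 20,
      },
    },
    (err, compressed) => {
      if (err) return res.sendStatus(500);
      res.set('Content-Encoding', 'br');
      res.type('application/json').send(compressed);
    }
  );
});

app.listen(3000);
```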
Brotli for Static Assets
There are various methods you can use to serve static assets with Brotli. If you're using AWS CloudFront, Brotli is now supported natively. In our case, however, due to the lack of official support when we implemented this, we use Lambda@Edge, CloudFront, and S3 to this day. Using a Lambda@Edge handler, we rewrite the request based on its headers to serve either Brotli or gzip. This matters because not all browsers support Brotli.
First, in our Continuous Integration (CI) pipeline, we create two versions of each static asset: normal (uncompressed) and Brotli-compressed (with a .br extension), using the highest settings of quality=11 and lgwin=24 since this compression happens at build time. Then, we upload them to an S3 bucket that is already configured to be served through CloudFront with the correct headers such as Content-Encoding.
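A minimal sketch of that build-time compression step, assuming the built assets live in a dist/ directory (the paths and naming here are illustrative):

```js
const fs = require('fs');
const path = require('path');
const zlib = require('zlib');

const distDir = path.join(__dirname, 'dist');

for (const file of fs.readdirSync(distDir)) {
  if (file.endsWith('.br')) continue; // skip already-compressed output
  const source = fs.readFileSync(path.join(distDir, file));
  // The highest settings are fine here: the CPU cost is paid once in CI.
  const compressed = zlib.brotliCompressSync(source, {
    params: {
      [zlib.constants.BROTLI_PARAM_QUALITY]: 11,
      [zlib.constants.BROTLI_PARAM_LGWIN]: 24,
    },
  });
  fs.writeFileSync(path.join(distDir, `${file}.br`), compressed);
}
```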
The next step is to set up Lambda@Edge to handle Origin Request events in CloudFront, add Accept-Encoding to the whitelisted headers, and set the cache behavior to follow the whitelisted headers.
The logic in the Lambda function itself just reads the current HTTP request, checks whether the Accept-Encoding header contains br, and appends .br to the request URL before forwarding it to S3.
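A minimal sketch of that handler (the real implementation may differ, e.g. with extension allowlists):

```js
exports.handler = (event, context, callback) => {
  const request = event.Records[0].cf.request;

  // CloudFront lowercases header names; Accept-Encoding must be whitelisted
  // for it to reach this Origin Request handler.
  const acceptEncoding = request.headers['accept-encoding'];
  const supportsBrotli =
    acceptEncoding && acceptEncoding[0].value.includes('br');

  if (supportsBrotli) {
    // Rewrite the URI so S3 serves the pre-compressed object.
    request.uri = `${request.uri}.br`;
  }

  callback(null, request);
};
```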
The cost of executing the Lambda function is practically zero because CloudFront serves cached assets without firing an Origin Request event, so the function only runs on cache misses. With this setup, we reduced the average file download size by up to 40%, with the greatest reduction in JavaScript assets.
Bundle Optimization
Another general optimization we implemented was at the bundler/package-manager level. This approach may differ slightly per company, as the tools we use have different challenges and quirks depending on how they are used. This optimization is purely focused on reducing bundle size.
For additional context, we use Yarn (over npm) as our package manager of choice for its superior support for monorepo workflow. We also use webpack and Babel as our bundler combo.
The most important step in optimizing bundle size is running bundle and coverage analysis to find unused code, using tools such as webpack-bundle-analyzer and the Chrome DevTools Coverage tab. In our experience, bundle problems mainly arise from either tree-shaking issues or premature asset loading.
Tree Shaking Optimization
One common tree-shaking issue is caused by non-major version mismatches between dependency requirements. In a monorepo using Yarn workspaces, instead of each package keeping its own copy of the same dependency, a single copy is hoisted (moved) to the repository root and shared. When this sharing doesn't work as intended, you'll see the same dependency loaded twice in the bundle analyzer.
To fix this duplication, you can use yarn-deduplicate to make sure the generated lockfile stays deduplicated. You can also add an extra safeguard by using require.resolve together with the resolve.alias option in your webpack config to resolve dependency paths before (not during) module traversal, so that every reference to the dependency resolves to the same node_modules path.
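Here is a minimal sketch of that safeguard, where some-shared-lib is a placeholder for the duplicated package:

```js
// webpack.config.js
const path = require('path');

module.exports = {
  resolve: {
    alias: {
      // Resolve the package directory once at config time, so every import
      // of 'some-shared-lib' maps to the same node_modules copy.
      'some-shared-lib': path.dirname(
        require.resolve('some-shared-lib/package.json')
      ),
    },
  },
};
```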
We also use this approach for dependencies that use different file names as entry points, such as an index.web.js that contains ES imports while index.js itself uses CommonJS. Even after we added the resolve.extensions option, webpack still failed to tree-shake this module correctly.
Those approaches might work to some extent, but as a last resort you can use Yarn resolutions to force dependencies inside your project to a single version. This approach can also be useful if you use private dependencies in multiple packages: you only need to specify a wildcard (*) as the dependency requirement in each package.json file and set the actual version in the resolutions field of the root package.json.
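For example, the root package.json could pin a hypothetical private package (which every consuming package declares with a * requirement) like this:

```json
{
  "resolutions": {
    "example-private-lib": "1.2.0"
  }
}
```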
Another thing you need to check is whether your dependencies set sideEffects: false in their package.json, a directive that lets webpack create a more optimal bundle. If they don't, and you notice that the bundled output for that package is suboptimal, you can force it via webpack config. You just need to be sure that it really is side-effect-free.
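Forcing it looks roughly like this in webpack (a sketch only; some-pure-lib is a placeholder, and this is safe only if the package truly has no import side effects):

```js
// webpack.config.js
module.exports = {
  module: {
    rules: [
      {
        test: /node_modules[\\/]some-pure-lib[\\/]/,
        // Tell webpack the modules in this package are side-effect-free,
        // so unused exports can be dropped.
        sideEffects: false,
      },
    ],
  },
};
```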
We also noticed that misconfiguring preset-env and node_modules transpilation in the webpack config could prevent modules from being tree-shaken properly. We carefully tested combinations of webpack include and exclude rules, and used the oneOf option instead of multiple rules to make sure each transpilation config targets specific module paths with no overlap. This configuration shaved up to 180kB, and 124kB on average, from our mobile web entry points.
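A sketch of what such non-overlapping rules can look like with oneOf, where only the first matching rule applies (paths and targets are illustrative):

```js
// webpack.config.js
module.exports = {
  module: {
    rules: [
      {
        test: /\.jsx?$/,
        oneOf: [
          {
            // Application code: transpile with our own browser targets.
            exclude: /node_modules/,
            use: {
              loader: 'babel-loader',
              options: { presets: [['@babel/preset-env', { modules: false }]] },
            },
          },
          {
            // Only the dependencies that ship modern syntax; everything
            // else in node_modules is left untouched.
            include: /node_modules[\\/]some-modern-dep[\\/]/,
            use: {
              loader: 'babel-loader',
              options: { presets: [['@babel/preset-env', { modules: false }]] },
            },
          },
        ],
      },
    ],
  },
};
```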
The last optimization we made was disabling the Buffer polyfill in the webpack config, which saved around 20kB.
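With webpack 4, this is a one-line config change (webpack 5 later removed these automatic Node polyfills altogether):

```js
// webpack.config.js
module.exports = {
  node: {
    // Don't inject the ~20kB Buffer polyfill into browser bundles.
    Buffer: false,
  },
};
```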
Resource Loading Optimization
This technique is commonly referred to as code splitting, but it's usually more nuanced than simply changing static imports to dynamic imports. The goal of this optimization is to strike a balance between a fast first load and quick future interactions. As with all previous approaches, it's best to run bundle and coverage analysis before doing any optimization.
The way you optimize asset loading depends on how you structure your app. Here are two examples from our use cases:
- We don’t fetch A/B test libraries (including the config from API) if no experiment is running. This is possible because we statically define experimentation in the route level.
- We fetch heavy client-side navigation logic in parallel with the upcoming Next.js route components. Leveraging the fact that a settled promise can be awaited again at any time, we create a single promise instance that is kicked off inside the Next.js routeChangeStart event and awaited during client-side navigation, as sketched below.
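Here is a minimal sketch of that shared-promise pattern; the heavy-navigation-logic module name is hypothetical:

```js
import Router from 'next/router';

let navLogicPromise = null;

// Start (or reuse) the fetch; dynamic import() returns a promise that can
// be awaited any number of times once it exists.
export function loadNavLogic() {
  if (!navLogicPromise) {
    navLogicPromise = import('./heavy-navigation-logic');
  }
  return navLogicPromise;
}

// Kick off the fetch as soon as a client-side navigation starts, in
// parallel with Next.js fetching the route component.
Router.events.on('routeChangeStart', loadNavLogic);

// Elsewhere, navigation code awaits the same instance:
// const navLogic = await loadNavLogic();
```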
These two use cases fixed some of the performance issues we found, but optimization is a continuous effort, and we're thankful to already have a good system in place to monitor web performance.
Framework-level Optimization
Sometimes, improving existing apps can only get you so far, and that was the case for us. Our legacy service, while working well, still had suboptimal performance even after various optimizations. Consequently, in addition to some other technical reasons, we decided to build an entirely new SDK for our customer-facing sites (codenamed TVLK5). With it, we observed a 30–60% improvement across all Lighthouse metrics on top of pages already optimized using the previous approaches.
There were a lot of changes in TVLK5 that may warrant another post, but here are four approaches that contributed to better overall performance:
No External CSS
We stopped using external CSS files via CSS Modules, as they are tricky to work with in some code-splitting scenarios due to how style rules are applied based on insertion order.
We replaced them with a single critical style generated at server-render time. This new approach allows us to code-split components with more confidence, even with JS disabled.
There’s no exact number of how much this method alone improves performance as we’re also rewriting our components using the new design system from scratch. Based on unscientific measurement, it contributes to a significant improvement in the Speed Index alone.
We’re also able to move away from extracting optional critical styles at build time as not only is it prone to flaky results, but it’s also hard to set up & maintain.
Native Web APIs Utilization
We promote the use of standard browser APIs instead of npm modules: for example, Intl.DateTimeFormat and Intl.NumberFormat over moment (or date-fns) and accounting respectively. We also add selective polyfills for the Intl APIs using polyfill.io.
To support these efforts, we guarantee interoperability between our API responses and browser API inputs by using BCP 47 language tags and ISO 4217 currency codes. We also make sure our docs show examples that use these standard APIs.
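For instance, formatting a price and a date with nothing but these standard APIs might look like this (the locale and values are illustrative):

```js
// 'id-ID' is a BCP 47 language tag; 'IDR' is an ISO 4217 currency code.
const price = new Intl.NumberFormat('id-ID', {
  style: 'currency',
  currency: 'IDR',
  minimumFractionDigits: 0,
  maximumFractionDigits: 0,
}).format(1250000); // e.g. "Rp 1.250.000"

const date = new Intl.DateTimeFormat('id-ID', {
  day: 'numeric',
  month: 'long',
  year: 'numeric',
}).format(new Date(2020, 5, 1)); // e.g. "1 Juni 2020"
```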
Framework-specific Bundle Optimization
In TVLK5, we provide alternative methods to improve web performance even further at the framework level, initially by adding the following two module loading patterns (a sketch of the first follows the list):
- Progressive Hydration. A server-side rendered component that is only loaded and hydrated once it becomes visible on screen, using the IntersectionObserver API.
- Static Subtree. A server-side rendered component that produces no client-side JS bundle, yet still works with client-side navigation.
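A minimal sketch of the progressive hydration idea (not our actual implementation; the element id and component are hypothetical):

```jsx
import React from 'react';
import ReactDOM from 'react-dom';

function hydrateWhenVisible(rootEl, loadComponent) {
  const observer = new IntersectionObserver(async (entries) => {
    if (entries.some((entry) => entry.isIntersecting)) {
      observer.disconnect();
      // Load the JS bundle only now, then hydrate the server-rendered HTML.
      const { default: Component } = await loadComponent();
      ReactDOM.hydrate(<Component />, rootEl);
    }
  });
  observer.observe(rootEl);
}

hydrateWhenVisible(
  document.getElementById('reviews-section'),
  () => import('./ReviewsSection')
);
```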
Another small change that shaped the way we develop TVLK5 was adding support for a .client.ts file extension. As our app is developed universally (for both server and client) using React components, having a special file extension for browser-only code makes previously tricky patterns manageable. In fact, both module loading patterns above are implemented using that special file extension.
API Call & SSR Logic Refactor
Previously, we performed all the API calls needed to server-render our site serially, partly due to how we leveraged self-contained Express middleware. Over time, our Time to First Byte (TTFB) worsened because of this pattern.
Figure 8. Previous critical rendering path blocked by series of API calls.
As we rebuilt our site from the ground up, we identified the APIs that could be fetched in parallel. We also deferred some APIs to be fetched on the client side to reduce server load even further.
Figure 9. Critical rendering path in TVLK5.
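In essence, the refactor turns sequential awaits into a parallel batch; the fetch helpers below are hypothetical stand-ins for real backend calls:

```js
// Hypothetical API helpers for illustration.
const fetchUser = (req) => Promise.resolve({ id: 1 });
const fetchConfig = (req) => Promise.resolve({ theme: 'light' });
const fetchContent = (req) => Promise.resolve({ sections: [] });

async function getServerSideData(req) {
  // Before: each call was awaited serially inside its own Express
  // middleware. After: independent calls start together.
  const [user, config, content] = await Promise.all([
    fetchUser(req),
    fetchConfig(req),
    fetchContent(req),
  ]);
  // Non-critical data is deferred and fetched on the client after hydration.
  return { user, config, content };
}
```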
As an additional measure, we also added a few Server-Timing headers for better visibility into our TTFB breakdown. We're also currently experimenting with API preloading so that critical client-side API calls don't have to wait for the JS load and React hydration before they start fetching.
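Emitting such a header from Express is straightforward; the durations below are hypothetical measurements:

```js
const express = require('express');
const app = express();

// Each Server-Timing entry shows up in the browser DevTools network panel,
// making the TTFB breakdown visible per response.
app.use((req, res, next) => {
  res.set(
    'Server-Timing',
    'api;dur=210;desc="Backend API calls", render;dur=95;desc="React SSR"'
  );
  next();
});
```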
Above the Fold Optimization
While the previous two sections discussed general and framework-level optimizations that mostly happen in the "backstage", we also have "frontstage" optimizations catered specifically to a particular product or feature.
This optimization involves our third constraint: backend-defined views (mentioned in the introduction). The most common use case is displaying an image carousel with different layout configurations.
As you have probably noticed, pages with this functionality (and layout) instantly become image-heavy. With images being a top contender for performance bottlenecks, you can imagine the UX degradation when they are used as above-the-fold content.
To make matters worse, this content is client-side rendered, which means users have to wait for the TTFB, the JS load, and the API call to finish before the image fetch even begins: four steps, each with its own bottlenecks.
“Smart” Server Rendering
The first change we made was to customize the rendering behavior. Switching to SSR is not as straightforward as you might think, because being developed client-side first means our usual way of implementing layout (based on the aspect ratio configured by the backend) assumes early access to the viewport width in JS.
In SSR, this assumption no longer works, so we rewrote those calculations to use the CSS calc function and abstracted an AspectRatioContainer component to polyfill the aspect-ratio CSS feature.
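A minimal sketch of that idea using the classic padding-top trick (padding percentages are relative to the element's width, so no viewport measurement is needed); the component API is hypothetical:

```jsx
import React from 'react';

function AspectRatioContainer({ width, height, children }) {
  return (
    <div
      style={{
        position: 'relative',
        // e.g. a 16:9 ratio becomes padding-top: calc(9 / 16 * 100%)
        paddingTop: `calc(${height} / ${width} * 100%)`,
      }}
    >
      <div style={{ position: 'absolute', top: 0, right: 0, bottom: 0, left: 0 }}>
        {children}
      </div>
    </div>
  );
}

// Usage: <AspectRatioContainer width={16} height={9}>…</AspectRatioContainer>
```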
Next is balancing slow API calls against the Largest Contentful Paint (LCP) and Speed Index metrics. If we wait until the API call resolves, the user might stare at a blank screen for a while. To prevent that, we put a timeout on server-side API calls, render a UI skeleton if the threshold is exceeded, and continue fetching and rendering on the client side.
This is something React Suspense for SSR could help with, but since it is not supported yet, we had to settle for some boilerplate.
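That boilerplate boils down to racing the data against a timer; fetchSectionData, renderSection, and renderSkeleton below are hypothetical helpers:

```js
function withTimeout(promise, ms) {
  const timer = new Promise((resolve) =>
    setTimeout(() => resolve({ timedOut: true }), ms)
  );
  return Promise.race([promise, timer]);
}

// Hypothetical helpers standing in for the real data fetch and renderers.
const fetchSectionData = () => Promise.resolve({ sections: [] });
const renderSection = (data) => '<section>…</section>';
const renderSkeleton = () => '<section class="skeleton"></section>';

async function renderAboveTheFold() {
  const result = await withTimeout(fetchSectionData(), 300);
  if (result.timedOut) {
    // Render the skeleton now; the client continues fetching after hydration.
    return renderSkeleton();
  }
  return renderSection(result);
}
```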
Image Optimization Checklist
It’s important if you have an image-heavy page that you use an image-serving proxy or CDN whose main benefits are resizing images on the fly (since the page is rendered dynamically based on screen width) and serving next-gen images without worrying about browser support. There’s a few paid services like Cloudinary, Imgix, or ImageKit. You can also host your own service.
A further consideration is the Device Pixel Ratio (DPR). An image-serving CDN can serve images at the best possible quality for a matching device without trading quality against file size. Of course, it's trickier with SSR, but in most cases you can get away with choosing a default DPR value based on the common device pattern you see in your analytics.
Adding preload hints also helps prioritize image loading. The key to preloading images is balancing the number of preloads against the network limit. In our case, we only preload the first two images of the first three sections of any page. These values are based on the common screen sizes and UI configurations typically served to our users.
To avoid double fetching, you have to make sure all preloaded images use the same DPR value (and any other image query parameters) as the rendered images.
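As an illustration, a preload list in a Next.js page head might look like this (the section/image shape is hypothetical):

```jsx
import React from 'react';
import Head from 'next/head';

function ImagePreloads({ sections }) {
  // First 2 images of the first 3 sections, mirroring the heuristic above.
  const urls = sections
    .slice(0, 3)
    .flatMap((section) => section.images.slice(0, 2).map((image) => image.url));
  return (
    <Head>
      {urls.map((url) => (
        // The URL (including the DPR query) must match the rendered <img>
        // exactly, otherwise the browser fetches the image twice.
        <link key={url} rel="preload" as="image" href={url} />
      ))}
    </Head>
  );
}
```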
Lastly, consider setting good defaults for your image component. For example, we set importance="low" and loading="lazy" by default, so we only need to configure them explicitly when we want a different loading strategy. In our observation, JS-based lazy loading performed relatively poorly, so we opted for the native API without a polyfill. But your mileage may vary in your own environment.
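A sketch of such defaults as a thin wrapper component (illustrative, not our actual component):

```jsx
import React from 'react';

function Image({ importance = 'low', loading = 'lazy', ...props }) {
  // Native lazy loading and low priority by default; call sites override
  // these only for images that need an eager strategy.
  return <img importance={importance} loading={loading} {...props} />;
}

// Usage: <Image src="/hotel.jpg" alt="Hotel" loading="eager" importance="high" />
```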
Other SSR tips for better LCP
When you’re doing server-side rendering with universal (or some people call isomorphic) approach, it’s possible to have different rendering methods for server and client, whose component depends on browser-specific API. However, if you’re not careful, you might end up with a lower performance score than before or worse, a broken or unintended rendering behavior.
In general, you should avoid relying on a typeof window check directly in render; use component lifecycle methods or effects instead to determine which environment the component renders in. The Perils of Rehydration explains perfectly the issues that arise when server and client markup mismatch.
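A common way to follow that advice is a small gate component that renders browser-only children only after mounting:

```jsx
import React, { useEffect, useState } from 'react';

function ClientOnly({ children, fallback = null }) {
  const [mounted, setMounted] = useState(false);
  // Effects never run during SSR, so the first client render matches the
  // server-rendered markup and rehydration stays consistent.
  useEffect(() => {
    setMounted(true);
  }, []);
  return mounted ? children : fallback;
}

// Usage: <ClientOnly fallback={<Spinner />}><MapWidget /></ClientOnly>
// (Spinner and MapWidget are hypothetical components.)
```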
We also stumbled upon a peculiar case (which is understandable given our unusual approach). In our app, we wrap the root component with the React.Suspense API because a few other APIs leverage Suspense for data fetching on the client side. The problem arose because our Next.js code is also used for server-side rendering, and Suspense is not yet supported in SSR.
Initially, we worked around this roadblock by using a special client extension that replaced the React.Suspense API with React.Fragment on the server. It turned out this approach caused a severe performance issue, albeit invisible in the UX (at least based on our testing): it made React trigger a repaint on hydration, which in turn caused the LCP timing to be recalculated, even though there were no visible changes in the screenshot timeline before and after LCP was measured (we even compared those two images pixel by pixel).
We finally fixed this issue by removing the top-level Suspense and wrapping the other Suspense boundaries with a react-no-ssr wrapper. This decreased our LCP timing from 4.8s to 0.8s: a massive improvement in the Lighthouse score, and yet no noticeable difference in the actual user experience.
User Experience > Lighthouse Score
Finally, I want to end this post with a note that UX triumphs over any metric, any day. The Lighthouse score should only be used as a proxy, not as the ultimate goal (achieved by sacrificing UX or, worse, by cheating the measurement).
Let’s revisit the previous case of Backend-defined Views performance improvement that can be seen below in Figure 12:
And compare it to the after-version in a similar time slice below in Figure 13:
Can you guess the score difference between the two? Not that much, actually. They have basically the same score.
But which one is actually better in terms of user experience? Due to the difference in perceived performance, I'd say the second one, because its contents are displayed faster.
With more tools available to measure, monitor, and improve web performance, we should always remember that what we ultimately need to improve is the user experience. Perceived performance is harder to measure, yet it reflects more of the actual experience of using your sites or apps.
While you rely on Continuous Integration (CI) to prevent web performance regressions, you also have to keep testing on real, commonly used devices to see how the experience actually feels.
Recap
Techniques to improve web performance vary depending on your constraints, but the fundamental principles stay the same: reduce the bundle size to the minimum, defer loading resources that are unused on first interaction, load the right resources just-in-time as they're needed, prepare an optimized loading strategy for further interactions, and utilize predictive loading to make screen transitions faster. Moreover, always think like a real user and prioritize the contents that matter most to them.
Besides UX and application architecture, you also need good knowledge of your bundler and its ecosystem. The flexibility and diversity of the JavaScript ecosystem mean it's hard to find tools that perform well in every scenario without adjustment. Sometimes you need to get your hands dirty debugging and configuring the bundler to get optimal results. Tools like webpack-bundle-analyzer, the Chrome DevTools Coverage tab, or source-map-explorer can help you find the spots with the most room for improvement.
The most important thing is to have a good culture in place, where engineers care about web performance and have the right tools to warn them of any regression in the metrics they care about. These are some of the ways you can keep improving your web performance as you develop more features or even new products.
If web performance or system improvement excites you, check out the opportunities on Traveloka's careers page. We're continuously looking for talent to help improve our user experience through performance optimization.